I am looking for a way to speed up calculating byte frequencies in a stream. The obvious approach is to read a byte and use it as an index into a 0..255 array to increment a counter. In the old days we used the xlat opcode.
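
For reference, the baseline I mean looks roughly like this in C. It is only a minimal sketch: the function name `byte_histogram` and the 64-bit counter type are my own placeholders, not taken from any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline: one table lookup and one increment per input byte.
 * The caller is assumed to have zeroed counts[] beforehand
 * (or to want the new counts added to the existing ones). */
void byte_histogram(const uint8_t *buf, size_t len, uint64_t counts[256])
{
    for (size_t i = 0; i < len; i++)
        counts[buf[i]]++;   /* the byte value itself is the array index */
}
```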

I know about MMX, SSE and AVX, but the instruction repertoire of modern Intel CPUs is huge.

Does anyone know a specific opcode, or a set of them, that can, for example, read 4 or 8 bytes into a register and speed up the byte counting by doing it in parallel?
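
To make the idea concrete, here is a rough sketch in plain C of what I have in mind, without any special opcode: load 8 bytes into a 64-bit register and spread the increments over several tables so that runs of equal bytes do not all hit the same counter. The name `byte_histogram_x8` and the choice of four tables are my own assumptions, and I have not measured whether this is actually faster on a given CPU.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch: process 8 bytes per iteration using four private tables.
 * Results are added to counts[], which the caller is assumed to have
 * initialized. Byte order within the word does not matter for counting. */
void byte_histogram_x8(const uint8_t *buf, size_t len, uint64_t counts[256])
{
    uint64_t t[4][256];
    memset(t, 0, sizeof t);

    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, buf + i, 8);           /* alignment-safe 8-byte load */
        t[0][(uint8_t)(w)]++;
        t[1][(uint8_t)(w >> 8)]++;
        t[2][(uint8_t)(w >> 16)]++;
        t[3][(uint8_t)(w >> 24)]++;
        t[0][(uint8_t)(w >> 32)]++;
        t[1][(uint8_t)(w >> 40)]++;
        t[2][(uint8_t)(w >> 48)]++;
        t[3][(uint8_t)(w >> 56)]++;
    }

    /* Leftover bytes (fewer than 8) are counted one at a time. */
    for (; i < len; i++)
        t[0][buf[i]]++;

    /* Merge the four private tables into the caller's table. */
    for (int v = 0; v < 256; v++)
        counts[v] += t[0][v] + t[1][v] + t[2][v] + t[3][v];
}
```

What I am asking is whether there are instructions that do better than this kind of manual unrolling.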

Thanks guys!