I've wished for decades I had a hex dumper that made various features of data pop out visually, so you could easily see strides, text-related stuff, and simple numeric regularities, but I never got around to writing one. The other day I realized how easy it could be if I just generate rudimentary html and let the browser handle the rendering, so I wrote a little Python program to do it, which is just a few dozen lines. It turns out to be even easier to use and cooler than I expected.
I call it atxd, which might stand for what I said in the subject line, or the Austin, Texas heX Dumper, or something else. It can't stand for the Amazing Technicolor heX Dumper, because Technicolor is a valuable trademark owned by Technicolor SA.
Anyhow, the basic shtik is to display each byte of a file as hex, like in a normal hex dumper, but to encode information about that value in two ways. Each byte gets a little colored box behind its hex digits, and the color of that box visually indicates the numeric value of the byte, with white for 0, black for 255, and grays for values in between.
The color of the hex characters for a byte tells you other things, namely:
(1) whether it's a common x86 opcode value (green)
(2) whether it's a space or an English letter (red) or digit (orange), or
(3) something else, in which case it's usually blue.
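To make the scheme concrete, here's a minimal sketch of that kind of dumper in Python. It is not atxd's actual code: the function names are made up, the set of "common x86 opcode values" is a small hypothetical stand-in (I picked values that don't collide with letters, digits, or space, so the order of the tests doesn't matter), and the HTML is about as rudimentary as it gets.

```python
# Hypothetical stand-in for "common x86 opcode values"; the real list isn't given here.
COMMON_X86_OPCODES = {0x0F, 0x89, 0x8B, 0x90, 0xC3, 0xE8, 0xE9, 0xFF}

def text_color(b):
    """Color of the hex digits for byte value b, following rules (1)-(3) above."""
    if b in COMMON_X86_OPCODES:
        return "green"
    if b == 0x20 or 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A:
        return "red"      # space or English letter
    if 0x30 <= b <= 0x39:
        return "orange"   # ASCII digit
    return "blue"         # anything else

def box_color(b):
    """Gray level of the box behind the digits: white for 0, black for 255."""
    g = 255 - b
    return f"#{g:02x}{g:02x}{g:02x}"

def dump_html(data, line_width=None):
    """Return an HTML page showing each byte of `data` as a colored hex pair.

    With line_width=None the browser does all the wrapping; otherwise a
    line break is forced after every `line_width` bytes.
    """
    parts = ['<html><body style="font-family: monospace">']
    for i, b in enumerate(data):
        if line_width and i > 0 and i % line_width == 0:
            parts.append("<br>\n")
        parts.append(
            f'<span style="background:{box_color(b)};color:{text_color(b)}">'
            f"{b:02x}</span> "
        )
    parts.append("</body></html>")
    return "".join(parts)
```

Write the returned string to a .html file and open it in a browser to see the picture.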
You can vary the line width in two ways, and that's very useful. If you pick a line width that's a multiple of several common strides, and you have strided data, the stride is usually apparent---e.g., if your line width is 60, strides of 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, and 60 will usually show up clearly (assuming they go on long enough) because the pattern will wrap around and line up under itself.
You can also usually see strides that are a little off from those, because you get diagonals. Something that repeats at an interval that's a little less than 60 will wrap around and fall just short of lining up under itself, so it will move left, and something that's a bit too long will overshoot and land under itself and to the right. When you see how many columns the pattern is from being lined up vertically, you can figure out the stride.
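The wrap-around arithmetic is simple enough to sketch in Python. The function names here are made up for illustration, and the sketch assumes an exactly periodic pattern; real data is messier.

```python
def drift_per_row(line_width, stride):
    """Columns a pattern with period `stride` shifts from one row to the next.

    Negative means it drifts left, positive means right, 0 means it lines
    up vertically. The smallest-magnitude shift is returned, since that is
    the diagonal the eye tends to follow.
    """
    shift = (-line_width) % stride
    if shift > stride / 2:
        shift -= stride
    return shift

def stride_from_drift(line_width, drift):
    """Invert the simple case where the pattern repeats about once per row."""
    return line_width + drift

print(drift_per_row(60, 58))      # -2: falls two columns short, drifts left
print(drift_per_row(60, 62))      # 2: overshoots, drifts right
print(drift_per_row(60, 20))      # 0: 20 divides 60, so it lines up
print(stride_from_drift(60, -2))  # 58
```

Note that the inverse is only a first guess: a pattern that repeats twice per row with a small drift gives the same kind of diagonal, so a divisor of the computed value is also a candidate stride.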
Here's an example, using sao, the star catalogue file from the Silesia Corpus. I picked a stretch of a few thousand KB at random, and plotted it with a line length (picture stride) of 60, my usual default for a first try. It looked like this:
At first I didn't see the stride, but after I looked at it a bit, I thought I could see some streakiness going downward to the left at a shallow angle, maybe a few bytes to the left each row down, which would indicate some stridiness in the mid-50's. I took off my glasses and backed up a few paces from my monitor, and it seemed a bit clearer---light bluish streaks with beige edges. (It may be clearer in the thumbnail above than it was on the screen full-size.)
I thought maybe that would be clearer at a shorter stride, so I tried 30, which is very easy---all you have to do is narrow your browser window until all the lines wrap in the very center. (The hexdump is just an html text file with a fixed-width font environment, so it wraps perfectly.)
That definitely made it clearer:
Now we have light-colored bluish streaks going down and to the left about two bytes per line. Looking closely, you can find values that repeat exactly that way, over and over (e.g., hex 15). So our stride clearly is two less than the picture stride of 30, i.e., 28.
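If you'd rather have a program confirm what the eye sees, one crude check is to count, for each candidate lag, how often a byte equals the byte that many positions earlier. This is a different technique from reading diagonals off the picture, and it is not part of atxd; the names and the synthetic demo data below are made up for illustration (the demo data is not the sao file).

```python
import random

def match_fraction(data, lag):
    """Fraction of positions where data[i] == data[i + lag]."""
    pairs = len(data) - lag
    if pairs <= 0:
        return 0.0
    return sum(a == b for a, b in zip(data, data[lag:])) / pairs

def rank_strides(data, max_stride=64, top=3):
    """Candidate strides 1..max_stride, best match fraction first."""
    lags = range(1, max_stride + 1)
    return sorted(lags, key=lambda lag: match_fraction(data, lag), reverse=True)[:top]

# Synthetic demo: 200 records of 28 bytes, each starting with the same
# 4 constant bytes followed by 24 pseudo-random bytes.
rng = random.Random(0)
records = [
    bytes([1, 2, 3, 4]) + bytes(rng.randrange(256) for _ in range(24))
    for _ in range(200)
]
sample = b"".join(records)

# 28 ranks first, because only that lag lines the constant bytes up.
print(rank_strides(sample, max_stride=40))
```

Multiples of the true stride score about as well as the stride itself, so if max_stride is large enough to include them, take the smallest of the top candidates.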
Rerunning atxd with a line length of 28, I got this:
From that picture you can see a lot of the same regularities I mentioned in an earlier thread, which I'd found using fv and simple filters. Either way, you can make a good guess that you're looking at mostly little-endian numbers, where the columns representing the high bytes of numbers are less variable than the ones representing the low bytes. This looks like 28-byte records that start with two 8-byte numbers, LSB first (the second one more variable than the first), and end with two 4-byte numbers, also LSB first. In between there are 4 more bytes to figure out.
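The "high bytes vary less than low bytes" observation can also be checked numerically once you have a stride. Here's a rough sketch that counts distinct byte values per column; the function name and the synthetic demo data are made up for illustration, and counting distinct values is just one simple stand-in for "variability".

```python
def column_variability(data, stride):
    """Number of distinct byte values in each column of `stride`-byte rows.

    A trailing partial row, if any, is ignored.
    """
    columns = [set() for _ in range(stride)]
    usable = len(data) - len(data) % stride
    for i in range(usable):
        columns[i % stride].add(data[i])
    return [len(seen) for seen in columns]

# Synthetic demo: 4-byte little-endian integers 0, 7, 14, ... below 3000.
demo = b"".join(n.to_bytes(4, "little") for n in range(0, 3000, 7))
counts = column_variability(demo, 4)

# The low byte (column 0) takes many values, the next byte only a few,
# and the two high bytes are constant.
print(counts)
```

In a picture, the columns with low counts are the ones that look like solid or slowly changing vertical stripes.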
With the dumper, the first two of those four in-between bytes are very easy---a bit to the right of the midline of the picture you have one column of red numbers on lightish gray, suggesting capital letters (lower-case letter boxes are darker because of their higher codes). Just to the right of it, you have another column of light gray, but with yellow hex digits, suggesting characters. Looking closely, you can see that the letter field values are all from 41 to 4B, or A-K. (At least in this sample.) The digits are hard to read because they're yellow on a similar gray---I need to change the color scheme a little, I guess---but they're within the range of 0-9 or they wouldn't all be yellow.
The other two columns to the right of that are likely a two-byte number, LSB first, so that the first one varies way more than the second one (the high byte).
There's more to glean from this picture. For example, when the slowly-changing columns of the first 8-byte number change, they always go up, suggesting that it's monotonically increasing, and likely the primary sort key for the data set.
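A guess like this is easy to test with Python's struct module. The sketch below encodes the layout guessed above (two 8-byte numbers, a letter byte, a digit byte, a 2-byte number, two 4-byte numbers, all little-endian) and checks whether the first field ever decreases. Treating the numeric fields as unsigned integers is purely an assumption (they could just as well be floats), the function names are made up, and the demo records are synthetic, not taken from sao.

```python
import struct

# Guessed 28-byte record, little-endian, no padding:
#   Q Q  two 8-byte numbers (assumed unsigned integers)
#   c c  one letter byte, one digit byte
#   H    one 2-byte number
#   I I  two 4-byte numbers
RECORD = struct.Struct("<QQccHII")

def parse_records(data):
    """Unpack as many whole records as fit in `data`."""
    usable = len(data) - len(data) % RECORD.size
    return [RECORD.unpack_from(data, off) for off in range(0, usable, RECORD.size)]

def first_field_never_decreases(records):
    """True if the first field is non-decreasing from record to record."""
    keys = [rec[0] for rec in records]
    return all(a <= b for a, b in zip(keys, keys[1:]))

# Synthetic demo records whose first field is sorted.
demo = b"".join(
    RECORD.pack(key, 1000 - key, b"A", b"5", 7, key * 2, key * 3)
    for key in (10, 20, 20, 35)
)
demo_records = parse_records(demo)
print(RECORD.size)                                # 28
print(first_field_never_decreases(demo_records))  # True
```

If the check fails on real data, either the field isn't the sort key or the guessed type or offset is wrong, which is useful to know either way.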
Next I'll try examining some machine code.