Hello all.
I'm a long-time reader of this forum and first-time poster, so I feel like I already know most of you from reading your posts over the years.
I want to identify the compression methods used by a target chip I am examining. The chipset is undocumented, so I'll be treating it as a black box, but that makes for an interesting problem and I'm curious about your thoughts on a potential solution.
My thinking is that each compression scheme has specific strengths and weaknesses (i.e., it compresses particular kinds of data in characteristic ways). So it should be possible to build a set of source data that, when fed to a compression engine, produces output that strongly suggests which type of algorithm is in use, even if it doesn't give it away outright.
So that's the crux of it: what kinds of data would you propose for a generic corpus that could be fed to various compression engines, so that examining the output and other side channels (output slowdown, heat generated by the MCU, etc.) would help me determine which methods are being used?
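A minimal Python sketch of that idea: run each probe through an engine and record its compression ratio, giving a per-engine "fingerprint". Here the standard library's `zlib`, `bz2` and `lzma` stand in as known reference engines; how you would actually feed data to the chip and read back its output is not specified, so the chip would simply be another hypothetical `compress` callable plugged into `fingerprint`.

```python
import bz2
import lzma
import os
import zlib


def ratio(compress, data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(compress(data)) / len(data)


def fingerprint(compress, probes: dict) -> dict:
    """Map each probe name to the ratio the engine achieves on it."""
    return {name: round(ratio(compress, data), 4) for name, data in probes.items()}


# Known engines used as reference points; the unknown chip would be one more entry.
REFERENCE_ENGINES = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}

if __name__ == "__main__":
    probes = {
        "zeros": bytes(65536),                                # one long run
        "random": os.urandom(65536),                          # incompressible
        "counter": bytes(i & 0xFF for i in range(65536)),     # 256-byte period
    }
    for name, engine in REFERENCE_ENGINES.items():
        print(name, fingerprint(engine, probes))
```

Comparing the unknown engine's fingerprint row against the reference rows is the whole method; the interesting work is in choosing probes that separate the candidate algorithms.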
Obviously, streams with runs of repeated symbols (bits or bytes) of various lengths could be used to detect simple RLE, while a slowly incrementing count compressing well would indicate that some form of ADE is in use. Extending this to pseudo-random data streams with repeated blocks injected at various intervals and lengths should then expose sliding-window schemes such as LZ77/LZSS, as well as dictionary coders such as LZW.
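A possible sketch of the run-length and incrementing-count probes described above. It assumes, purely for illustration, that the "ADE" stage behaves like a simple word-wise delta (first-difference) transform on 16-bit little-endian samples; the sample width and byte order are guesses, so in practice you would sweep `width` over 1, 2 and 4. `run_probe`, `counter_probe`, `delta_encode` and `max_run` are hypothetical helper names.

```python
def run_probe(run_lengths, symbols=b"\x00\xff"):
    """Alternating runs of repeated bytes, e.g. [3, 2] -> 00 00 00 FF FF."""
    out = bytearray()
    for i, n in enumerate(run_lengths):
        out += bytes([symbols[i % len(symbols)]]) * n
    return bytes(out)


def counter_probe(n, width=2):
    """n incrementing little-endian words; byte-level RLE/LZ find almost nothing here."""
    mask = (1 << (8 * width)) - 1
    return b"".join((i & mask).to_bytes(width, "little") for i in range(n))


def delta_encode(data: bytes, width: int = 2) -> bytes:
    """Word-wise first difference mod 2**(8*width); a trailing partial word is dropped."""
    mask = (1 << (8 * width)) - 1
    prev = 0
    out = bytearray()
    for off in range(0, len(data) - len(data) % width, width):
        word = int.from_bytes(data[off:off + width], "little")
        out += ((word - prev) & mask).to_bytes(width, "little")
        prev = word
    return bytes(out)


def max_run(data: bytes) -> int:
    """Length of the longest run of identical bytes (what an RLE stage can exploit)."""
    best = run = 0
    prev = None
    for b in data:
        run = run + 1 if b == prev else 1
        prev = b
        best = max(best, run)
    return best
```

The 16-bit counter has no byte run longer than 3 and almost no repeated byte strings, yet after `delta_encode` it becomes a constant stream. An engine that shrinks `run_probe` output but not `counter_probe` output looks like plain RLE; one that shrinks both hints at a delta stage at the matching width.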
The chip in question does real-time compression using an MCU core without any DSP capability, has limited buffer memory in the tens of megabytes, and runs at a clock speed below 500 MHz, so it isn't going to be anything especially complex or advanced like BWT or context mixing.
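Because the buffer memory bounds any sliding window, one way to estimate the window size is to repeat a random block at increasing distances and watch where the engine stops finding the repeat. A minimal sketch, with `zlib` standing in for the unknown engine because its window is known (DEFLATE matches reach back at most 32 KiB); `window_probe` and `repeat_savings` are hypothetical names, and the default 4 KiB block length is an arbitrary choice.

```python
import os
import zlib


def window_probe(distance: int, block_len: int = 4096) -> bytes:
    """Random data in which one random block reappears `distance` bytes after its first copy."""
    if distance < block_len:
        raise ValueError("distance must be >= block_len")
    block = os.urandom(block_len)
    filler = os.urandom(distance - block_len)
    return block + filler + block


def repeat_savings(compress, distance: int, block_len: int = 4096) -> float:
    """Fraction of the repeated block the engine removed (~1.0 = match found, ~0.0 = not)."""
    data = window_probe(distance, block_len)
    return (len(data) - len(compress(data))) / block_len


if __name__ == "__main__":
    # Sweep distances in powers of two; the drop-off marks the window size.
    for d in (8192, 16384, 32768, 65536, 1 << 20):
        print(d, round(repeat_savings(zlib.compress, d), 3))
```

Against the chip the same sweep would simply continue toward the tens-of-megabytes range; the distance at which the savings collapse to roughly zero is the effective window.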
Do any of you have suggestions for generating data to present to such an unknown engine that would help me detect other basic compression mechanisms?
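One other basic mechanism that can be probed separately is a pure entropy-coding stage (Huffman or arithmetic coding). A hedged sketch: feed independent symbols from a skewed distribution, which contain no long runs or structured repeats, and compare the engine's output size with the order-0 entropy bound. With the dyadic weights assumed here (1/2, 1/4, 1/8, 1/8) the bound is 1.75 bits per byte, about 21.9% of the input, and a Huffman coder reaches it exactly. `skewed_probe` and `order0_entropy` are hypothetical names, and the weights and seed are illustrative choices.

```python
import math
import random
from collections import Counter


def skewed_probe(n, weights=(0.5, 0.25, 0.125, 0.125), seed=1):
    """n independent bytes drawn from a small alphabet with a skewed distribution."""
    rng = random.Random(seed)
    symbols = list(range(0x41, 0x41 + len(weights)))  # b"A", b"B", b"C", b"D"
    return bytes(rng.choices(symbols, weights=weights, k=n))


def order0_entropy(data: bytes) -> float:
    """Empirical order-0 entropy in bits per byte."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

Usage: with `d = skewed_probe(100000)`, the ideal output is roughly `order0_entropy(d) * len(d) / 8` bytes. An engine landing close to that figure probably has an entropy-coding stage; one stuck well above it probably relies on matching alone.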
Thanks,
Luntik.