Hi guys, this is the first thread I've opened, and I'd like to introduce a simple project: the first version of Bench-Entropy.
The app natively incorporates ENT (Pseudorandom Number Sequence Test Program), which performs various statistical tests and reports the following:
1) Entropy
The density of information contained in a file, expressed in bits per byte. The maximum entropy is 8: a file with entropy close to 8 is either truly random or already compressed.
For example, take a bitmap: its entropy is 4.724502 bits per byte; converted to JPEG it becomes 7.938038 bits per byte; compressing the BMP with WinRAR gives 7.996259 bits per byte. That part is clear.
If I take a text file in which the same word is written a thousand times, its entropy is Entropy = 2.545246 bits per byte. Compressed with WinRAR we get Entropy = 6.747827, and with WinRAR at maximum compression Entropy = 6.756800.
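For anyone curious how that number comes out, here is a minimal sketch of a Shannon-entropy calculation in bits per byte. This is just the general formula, not Bench-Entropy's or ENT's actual code, and the file name is only an example:

```python
import math
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte distribution, at most 8.0 bits per byte."""
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

with open("sample.bmp", "rb") as f:  # example file name
    print(f"Entropy = {entropy_bits_per_byte(f.read()):.6f} bits per byte")
```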
2) Chi-square Test
Used for studying random data streams; applied to image files, for example, the result comes out as random data.
In practice, it measures how far the data stream deviates from a truly random sequence, and is reported as a percentage.
If the result is >99% or <1%, the data stream is almost certainly not random; if it is between 95% and 99%, or between 1% and 5%, it is only suspiciously random; intermediate values indicate random data.
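As an illustration only (again not the app's code), the chi-square statistic over the byte histogram can be computed like this; turning the statistic into ENT's percentage needs the chi-square distribution with 255 degrees of freedom, done here with SciPy:

```python
from scipy.stats import chi2  # only used to convert the statistic into a percentage

def chi_square_percent(data: bytes) -> float:
    """Chi-square of the byte histogram vs. a uniform distribution, returned as the
    probability (in %) that a truly random stream would exceed this value."""
    n = len(data)
    expected = n / 256
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return chi2.sf(stat, df=255) * 100  # >99% or <1% means "not random"
```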
3) Arithmetic Mean
All the bytes are summed and divided by the length, i.e. a simple arithmetic mean. The closer the result is to 127.5, the more random the data.
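In code this is nothing more than (a trivial sketch):

```python
def arithmetic_mean(data: bytes) -> float:
    """Mean byte value; about 127.5 for uniformly random data."""
    return sum(data) / len(data)
```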
4) Monte Carlo Pi Test
The closer the value is to pi (3.14159...), the more random/compressed the data stream is.
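ENT builds points from groups of six consecutive bytes (two 24-bit coordinates) and counts how many fall inside the circle inscribed in the square. A rough sketch of that idea, as I understand it, not the tool's exact code:

```python
def monte_carlo_pi(data: bytes) -> float:
    """Estimate pi from the fraction of 24-bit (x, y) points inside the inscribed circle."""
    radius = 256 ** 3 - 1            # largest 24-bit coordinate
    inside = total = 0
    for i in range(0, len(data) - 5, 6):
        x = int.from_bytes(data[i:i + 3], "big")
        y = int.from_bytes(data[i + 3:i + 6], "big")
        total += 1
        if x * x + y * y <= radius * radius:
            inside += 1
    return 4.0 * inside / total if total else 0.0
```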
5) Serial Correlation Coefficient
That is, how predictable each byte is from the previous one. The closer the value is to 1, the more predictable; the closer to 0, the more random.
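This is essentially the correlation between each byte and the one that follows it; a simple (non-circular) sketch of that calculation:

```python
import math

def serial_correlation(data: bytes) -> float:
    """Correlation of each byte with its successor: near 0 = random, near 1 = predictable."""
    xs, ys = data[:-1], data[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0
```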
(But you surely already knew all of this.)
Next, I embedded Inikep's lzbench. I thank him for his work; part of its code was changed to fit the way the app uses it.
In lzbench there is no file-size limit: even with a low amount of memory, the file is split into parts, the ratio of each part is measured, and the average over the parts gives the overall compression ratio and the reduced size.
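To show the idea, here is only my sketch of the averaging described above, not lzbench's or the app's actual code; `compress` stands for whatever codec is being benchmarked, and the chunk size is an arbitrary example:

```python
import zlib

def chunked_ratio(path: str, compress=zlib.compress, chunk_size: int = 64 * 1024 * 1024) -> float:
    """Average the compression ratio over fixed-size parts of a file, so the whole
    file never has to sit in memory; the average is reported as the overall ratio."""
    ratios = []
    with open(path, "rb") as f:
        while True:
            part = f.read(chunk_size)
            if not part:
                break
            ratios.append(len(compress(part)) / len(part))
    return sum(ratios) / len(ratios) if ratios else 1.0
```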
Separately, for those interested, I attach Bulat's "arc.groups" file, which is read during scanning and enables two more sections in the app: the classification of files based on the extensions listed in arc.groups, and the estimated creation of masked methods based on the entropy of the scanned files.
I do not speak English, so parts of the text may come across wrong; I hope you can understand me. I am at your disposal for any advice on improving the application.
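Just to make those two sections concrete, here is a rough, hypothetical sketch of what they could look like. The group masks and the entropy thresholds below are invented for the example; the real app reads the masks from Bulat's arc.groups, whose exact format I am not reproducing here:

```python
import fnmatch

# Hypothetical excerpt: in the real app the masks come from parsing arc.groups.
GROUPS = {"$text": ["*.txt", "*.log"], "$compressed": ["*.zip", "*.jpg", "*.7z"]}

def classify(filename: str) -> str:
    """Assign a file to the first group whose mask matches its name."""
    for group, masks in GROUPS.items():
        if any(fnmatch.fnmatch(filename.lower(), m) for m in masks):
            return group
    return "$default"

def estimated_method(entropy: float) -> str:
    """Invented thresholds: skip compression for high-entropy files, try harder on low-entropy ones."""
    return "store" if entropy > 7.9 else "fast" if entropy > 6.0 else "max"
```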
> BE_v1.0 Bench_Entropy.7z <
> Arc.groups arc.7z <