# Thread: Pseudorandom Number Sequence Test + Benchmark Compressors

1. ## Pseudorandom Number Sequence Test + Benchmark Compressors

Hi guys, this is the first thread I've opened, and I'd like to introduce a simple project: the first version of Bench-Entropy.

The app natively incorporates ENT (the Pseudorandom Number Sequence Test Program), which performs various statistical tests and reports the following results:

1) Entropy

The density of information contained in a file, expressed in bits per character. The maximum entropy is 8 bits per byte; when we find a file with entropy near 8, it is either practically random or already compressed.
For example, take a bitmap: its entropy is 4.724502 bits per byte; converted to JPEG it becomes 7.938038 bits per byte. If I compress the BMP with WinRAR I get 7.996259 bits per byte. This is clear.
If I take a text file containing the same word written a thousand times, its entropy is 2.545246 bits per byte. Compressed with WinRAR we get Entropy = 6.747827, and with WinRAR at maximum compression Entropy = 6.756800.
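As a rough illustration of the figures above, here is a minimal order-0 Shannon entropy calculation in the same bits-per-byte units ENT reports (a sketch, not the app's actual code):

```python
import math
import os
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Order-0 Shannon entropy in bits per byte: -sum(p * log2(p))
    over the frequency p of each byte value."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

# A repetitive file has low entropy; random bytes approach 8 bits/byte.
print(entropy_bits_per_byte(b"hello " * 1000))   # low, around 2.25
print(entropy_bits_per_byte(os.urandom(65536)))  # close to 8
```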

2) Test of Chi square

Used for studying random data streams. If we apply it to image files, the result is that they look like random data.
In practice, it measures the percentage by which the data stream deviates from a truly random sequence.
If the result is > 99% or < 1%, the data stream is not random. If it is between 95% and 99%, or between 1% and 5%, the stream is suspect; intermediate values indicate the data are random.
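A sketch of the underlying statistic: the raw chi-square value of the byte counts against a uniform distribution. (ENT then converts this to the percentage discussed above via the chi-square distribution with 255 degrees of freedom; that conversion is omitted here.)

```python
def chi_square_bytes(data: bytes) -> float:
    """Raw chi-square statistic of byte counts vs. a uniform distribution."""
    expected = len(data) / 256
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

# For truly random input the statistic hovers near 255 (its degrees of
# freedom); a perfectly uniform byte spread gives exactly 0.
print(chi_square_bytes(bytes(range(256)) * 16))  # 0.0
```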

3) Arithmetic Mean

Sums all the bytes and divides by the length: a simple arithmetic mean. The closer the result is to 127.5, the more random the data.
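The calculation is a one-liner; a quick sketch:

```python
def arithmetic_mean(data: bytes) -> float:
    """Sum of all byte values divided by the file length."""
    return sum(data) / len(data)

print(arithmetic_mean(bytes(range(256))))  # 127.5, the ideal 'random' mean
print(arithmetic_mean(b"AAAA"))            # 65.0, far from 127.5
```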

4) Test of Pi-Greco Montecarlo

The closer the value is to pi (3.14159...), the more random/compressed the data stream is.
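ENT's documentation describes this test as reading successive six-byte groups as 24-bit (x, y) coordinates in a square and counting how many points land inside the inscribed circle; the fraction estimates pi/4. A minimal sketch under that assumption:

```python
import os

def monte_carlo_pi(data: bytes) -> float:
    """Estimate pi by treating successive byte triples as 24-bit (x, y)
    coordinates and counting points inside the inscribed quarter circle."""
    r = 256 ** 3 - 1  # maximum 24-bit coordinate value
    inside = total = 0
    for i in range(0, len(data) - 5, 6):
        x = int.from_bytes(data[i:i + 3], "big")
        y = int.from_bytes(data[i + 3:i + 6], "big")
        total += 1
        if x * x + y * y <= r * r:
            inside += 1
    return 4.0 * inside / total if total else 0.0

print(monte_carlo_pi(os.urandom(600_000)))  # near 3.14 for random data
```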

5) Coefficient of Correlation

That is, how predictable a byte is given the previous one. The closer the value is to 1, the more predictable it is; the closer to 0, the more random.
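A minimal sketch of such a serial correlation, computed as the Pearson correlation between each byte and its predecessor (ENT's exact formula may differ slightly in the details):

```python
def serial_correlation(data: bytes) -> float:
    """Pearson correlation between each byte and the byte before it."""
    n = len(data) - 1  # number of (previous, current) pairs
    xs, ys = data[:-1], data[1:]
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# A ramp 0,1,2,...,255 is perfectly predictable from the previous byte:
print(serial_correlation(bytes(range(256))))  # 1.0
```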

(But you surely already knew all of this.)

Afterwards I embedded Inikep's lzbench; I thank him for his work, and I modified part of its code to fit the use of the app.

In lzbench there is no file-size limit: even when using a small amount of memory, the file is split into parts, the ratio is averaged over the number of parts, and the overall compression ratio and reduced size are obtained.

Separately, for those interested, I attach Bulat's "arc.groups" file, which is read during scanning and enables two more sections in the app: classification of files based on the extensions listed in arc.groups, and estimated creation of masked methods based on the entropy of the scanned files.

I don't speak English, so there may be misunderstandings in the text; I hope you understand me. I am at your disposal for any advice on improving the application.

> BE_v1.0 Bench_Entropy.7z <

> Arc.groups arc.7z <

2. How you define entropy is a challenging topic.

If it's just the frequency of each byte divided by the total number of bytes (via classic Shannon information theory), then we get the order-0 entropy of the stream; no correlations between symbols are considered, just their frequencies. Consider 256 bytes with byte values 0 to 255 in series. It's highly compressible, but every one of the 256 possible values occurs 1/256th of the time, giving 8 bits per byte of entropy. Thus I tend to also look at the order-1 entropy to see if there is any immediate correlation with the preceding byte.

That's not perfect either, of course, but it can be a useful starting point for analysis.

I attach a trivial entropy calculation for 8-bit, 16-bit and order-1 8-bit quantities.
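The attachment isn't reproduced here; a minimal sketch of the order-0 versus order-1 distinction (for 8-bit quantities only) might look like this:

```python
import math
from collections import Counter, defaultdict

def order0_entropy(data: bytes) -> float:
    """Plain Shannon entropy over byte frequencies, in bits per byte."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def order1_entropy(data: bytes) -> float:
    """Conditional entropy H(current byte | previous byte), bits per byte."""
    ctx = defaultdict(Counter)
    for prev, cur in zip(data, data[1:]):
        ctx[prev][cur] += 1
    total = len(data) - 1
    h = 0.0
    for counter in ctx.values():
        csum = sum(counter.values())
        for c in counter.values():
            h -= (c / total) * math.log2(c / csum)
    return h

ramp = bytes(range(256))  # the 0,1,2,...,255 example from above
print(order0_entropy(ramp))  # 8.0: every value equally frequent
print(order1_entropy(ramp))  # 0.0: each byte is determined by the previous one
```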

3. ent applies various tests to sequences of bytes stored in files and reports the results of those tests. The program is useful for evaluating pseudorandom number generators for encryption and statistical sampling applications, compression algorithms, and other applications where the information density of a file is of interest.
I don't know if you've tried the various options, but in addition to -f: "Fold upper-case letters to lower case before computing statistics. Folding is done in accordance with the ISO 8859-1 Latin-1 character set, with accented letters correctly processed."
There is also -c, which prints a table of the number of occurrences of each possible byte value from 0 to 255 (or bit, if the -b option is also specified), the count, and the fraction of the whole file made up of that value. Printable characters in the ISO 8859-1 Latin-1 character set are shown along with their decimal byte values. In non-terse output mode, values with zero occurrences are not printed.
And finally -b: the input is treated as a stream of bits instead of 8-bit bytes, and the reported statistics reflect the properties of the bitstream.

It remains a statistical estimate, not an exact certainty for every scanned file, but without going into too much detail about the tests performed, I think it's a good tool for examining the structure of a file and predicting whether its compression ratio will be high or low.

4. Bench_Entropy.7z - MALWARE DETECTED!

I'd like to report a malware detection - "Win32/Upatre" - in the file Bench_Entropy.7z.
I've been using AVAST FREE Antivirus (a version one year out of date) with the latest virus definition database.

I'm 100% sure it's a complete false positive and the file isn't harmful. The same thing happened when I tested the PerfectCompress archive with many PAQ algorithms - every PAQ executable was "infected".

It's better to investigate the source code rather than chase malware problems, although the file isn't harmful, I suppose.

As for prevention, it would be good if the archive were password protected. Then users could unpack the archive and verify it themselves, rather than having the download from encode.su blocked due to "viruses".

Thanks.
CompressMaster
