
Thread: Pseudorandom Number Sequence Test + Benchmark Compressors

  1. #1
    Member Samantha
    Join Date
    Apr 2016
    Thanked 7 Times in 4 Posts

    Pseudorandom Number Sequence Test + Benchmark Compressors

    Hi guys, this is the first thread I've opened, and I'd like to introduce a simple project of mine: the first version of Bench-Entropy.

    The app natively incorporates ENT (Pseudorandom Number Sequence Test Program), which performs various statistical tests and reports the following results:

    1) Entropy

    The density of information contained in a file, expressed in bits per character. The maximum entropy is 8 bits per byte; a file whose entropy is close to 8 is either truly random or already compressed.
    For example, a bitmap I tested has an entropy of 4.724502 bits per byte; converted to JPEG it becomes 7.938038 bits per byte, and compressing the BMP with WinRAR gives 7.996259 bits per byte.
    Similarly, a text file containing the same word written a thousand times has an entropy of 2.545246 bits per byte; compressed with WinRAR it rises to 6.747827, and with WinRAR at maximum compression to 6.756800.
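    The order-0 entropy ENT reports can be sketched in a few lines of Python (an illustration of the formula, not ENT's actual code):

```python
import math
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Order-0 Shannon entropy of a byte string, in bits per byte."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

print(entropy_bits_per_byte(b"the same word " * 1000))  # low: repetitive text
print(entropy_bits_per_byte(bytes(range(256))))         # 8.0: every value equally frequent
```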

    2) Test of Chi square

    Used to study random data streams. Applied to image files, it tends to classify them as random data.
    In practice, it measures how far the data stream deviates from a truly random sequence, expressed as a percentage.
    If the result is greater than 99% or less than 1%, the data stream is almost certainly not random. If it is between 95% and 99%, or between 1% and 5%, the stream is suspiciously random; intermediate values indicate the stream looks random.
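    The chi-square statistic itself is easy to compute; turning it into the percentage ENT prints additionally requires the chi-square distribution with 255 degrees of freedom. A minimal Python sketch of the statistic:

```python
def chi_square_stat(data: bytes) -> float:
    """Chi-square statistic of the byte histogram against a uniform
    distribution (expected count = len(data) / 256 per value).
    ENT converts this statistic to a percentage via the chi-square CDF."""
    expected = len(data) / 256
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return sum((c - expected) ** 2 / expected for c in counts)
```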

    3) Arithmetic Mean

    Sums all the bytes and divides by the file length: a simple arithmetic mean of the byte values. The closer the result is to 127.5, the more random the data.
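    This one is a one-liner; for uniformly random bytes the expected value is (0 + 255) / 2 = 127.5:

```python
def arithmetic_mean(data: bytes) -> float:
    """Mean byte value; close to 127.5 for uniformly random data."""
    return sum(data) / len(data)

print(arithmetic_mean(bytes(range(256))))  # 127.5
```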

    4) Monte Carlo Test for Pi

    The closer the value is to pi (3.14...), the more random (or compressed) the data stream is.
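    The idea can be sketched as follows: successive byte groups are read as (x, y) coordinates in a square, and the fraction of points landing inside the inscribed circle estimates pi / 4. A simplified Python sketch (ENT uses 24-bit coordinates like this, but its exact implementation may differ in details):

```python
def monte_carlo_pi(data: bytes) -> float:
    """Estimate pi from a byte stream: each 6-byte group gives a 24-bit
    (x, y) point in the unit square; the fraction of points inside the
    inscribed quarter circle approximates pi / 4."""
    scale = float(2 ** 24 - 1)
    inside = total = 0
    for i in range(0, len(data) - 5, 6):
        x = int.from_bytes(data[i:i + 3], "big") / scale
        y = int.from_bytes(data[i + 3:i + 6], "big") / scale
        total += 1
        inside += x * x + y * y <= 1.0
    return 4.0 * inside / total if total else 0.0
```

    Random data gives an estimate near 3.14; highly regular data drifts away from it.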

    5) Serial Correlation Coefficient

    I.e., how predictable each byte is from the previous one. The closer the value is to 1, the more predictable the stream; the closer it is to 0, the more random.
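    This is a plain correlation between each byte and its predecessor; a Python sketch (not ENT's streaming implementation):

```python
import math

def serial_correlation(data: bytes) -> float:
    """Correlation between each byte and the one before it:
    near 0 for random data, near +/-1 for predictable data."""
    xs, ys = data[:-1], data[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0
```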

    (But you surely know all of this already.)

    [Attached screenshots: Ent_bmp.png, Ent_jpg.png]

    Afterwards I embedded Inikep's lzbench; I thank him for his work, and note that part of its code was adapted to fit the use of the app.

    In LZBench there is no file-size limit: even with a low amount of memory, the file is processed in parts, and the average ratio over the parts gives the overall compression ratio and reduced size.
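    The chunked-ratio bookkeeping can be sketched like this, using zlib as a stand-in for the lzbench codecs (an illustration of the idea, not the app's actual code):

```python
import zlib

def chunked_ratio(data: bytes, chunk_size: int = 1 << 20) -> float:
    """Compress a stream in fixed-size parts (bounded memory) and
    report the overall ratio as compressed size / original size."""
    in_total = out_total = 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        in_total += len(chunk)
        out_total += len(zlib.compress(chunk))
    return out_total / in_total
```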

    [Attached screenshot: LZ_Bench_Test2.png]

    Separately, for those interested, I attach Bulat's "arc.groups" file, which is read during scanning and enables two more sections in the app: the classification of files based on the extensions listed in arc.groups, and the estimation of suitable compression methods (method masks) based on the entropy of the scanned files.

    I do not speak English, so there may be misunderstandings in the text; I hope you can understand me. I am at your disposal for any advice on improving the application.

    > BE_v1.0 Bench_Entropy.7z <

    > Arc.groups arc.7z <

  2. #2
    Join Date
    Dec 2011
    Cambridge, UK
    Thanked 187 Times in 128 Posts
    How you define entropy is a challenging topic.

    If it's just computed from the frequency of each byte divided by the total number of bytes (via classic Shannon information theory), then we get the order-0 entropy of the stream; no correlations between symbols are considered, just their frequencies. Consider 256 bytes with values 0 to 255 in series: it's highly compressible, but every one of the 256 possible values occurs 1/256th of the time, giving 8 bits per byte of entropy. So I tend to also look at the order-1 entropy, to see if there is any immediate correlation with the preceding byte.

    That's not perfect either, of course, but it can be a useful starting point for analysis.

    I attach a trivial entropy calculation for 8-bit, 16-bit and order-1 8-bit quantities.
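    The order-0 vs order-1 distinction looks like this in Python (a sketch of the idea, not the attached C code):

```python
import math
from collections import Counter, defaultdict

def order0_entropy(data: bytes) -> float:
    """Entropy from byte frequencies alone, in bits per byte."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def order1_entropy(data: bytes) -> float:
    """Average entropy of each byte given the preceding byte."""
    ctx = defaultdict(Counter)
    for prev, cur in zip(data, data[1:]):
        ctx[prev][cur] += 1
    n = len(data) - 1
    bits = 0.0
    for counts in ctx.values():
        total = sum(counts.values())
        bits -= sum(c * math.log2(c / total) for c in counts.values())
    return bits / n

seq = bytes(range(256)) * 4   # 0..255 repeated: order-0 says 8 bits/byte,
print(order0_entropy(seq))    # but order-1 sees it is fully predictable:
print(order1_entropy(seq))    # 0.0
```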
    Attached Files

  3. #3
    Member Samantha
    Join Date
    Apr 2016
    Thanked 7 Times in 4 Posts
    ENT applies various tests to sequences of bytes stored in files and reports the results. The program is useful for evaluating pseudorandom number generators for encryption and statistical sampling applications, compression algorithms, and other applications where the information density of a file is of interest.
    I don't know if you've tried the various options, but in addition there is the -f option: "Fold upper case to lower case before computing statistics. Folding is done in accordance with the ISO 8859-1 Latin-1 character set, with accented letters correctly processed."
    There is also the -c option, which prints a table of the number of occurrences of each possible byte value from 0 to 255 (or each bit, if the -b option is also specified), together with the fraction of the whole file made up of that value. Printable characters in the ISO 8859-1 Latin-1 character set are shown along with their decimal byte values. Unless terse output mode is selected, values with zero occurrences are not printed.
    And finally the -b option: the input is treated as a stream of bits instead of 8-bit bytes, and the reported statistics reflect the properties of the bitstream.

    It remains a statistical estimate, not an exact certainty for each scanned file, but without going into too much detail about the tests performed, I think it is a good tool for examining the structure of a file and predicting whether its compression ratio will be high or low.

  4. #4
    Member CompressMaster
    Join Date
    Jun 2018
    Lovinobana, Slovakia
    Thanked 18 Times in 18 Posts
    Bench_Entropy.7z - MALWARE DETECTED!

    I'd like to report a malware detection: "Win32/Upatre" in the file Bench_Entropy.7z.
    I'm using AVAST FREE Antivirus (a version about one year old) with the latest virus definition database.

    I'm 100% sure it's a false positive and the file isn't harmful. The same thing happened when I tested the PerfectCompress archive with its many PAQ algorithms: every PAQ executable was "infected".

    It's better to investigate the source code than to chase malware warnings, although the file isn't harmful, I suppose.

    As for prevention, it would be good if the archive were password protected. That way users could unpack the archive and verify it themselves rather than having the download blocked due to "viruses".


