Results 61 to 86 of 86

Thread: RH4 - Solid multifile compressor

  1. #61
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Strange that it crashes. Could you please try with -r0 before c? It means that the files will be sorted by path, so if it crashes at the end, there is an error caused by one of the files near the end. Otherwise, I do not know.

    Thanks for the idea. I will implement SHA256 (and potentially BLAKE2); it is further down the list.

    Right now, I am considering what to do about matches for binary/executable files. My minimum match length is 4 bytes right now, which misses many matches in binary files compared to LZ77. I tried dynamic minimums (3 for short distance, 4 for medium, 5 for long), but this was slightly worse in ratio (few o1 len3 matches, worsened lengths for len4 matches). I have also tried recent pairs (log-exp coding to repeat some non-context len2 match in the last 256/1024 bytes), but this was also not helpful.
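
The dynamic-minimum experiment described above can be sketched roughly as follows. This is only an illustration in Python, not RH4's code: the distance thresholds `SHORT_DISTANCE` and `MEDIUM_DISTANCE` and both function names are assumptions, since the post only gives the minimum lengths 3, 4 and 5.

```python
# Hypothetical distance thresholds; the post does not state the real ones.
SHORT_DISTANCE = 1 << 8
MEDIUM_DISTANCE = 1 << 16


def min_match_length(distance):
    """Return the shortest match worth coding at this distance."""
    if distance < SHORT_DISTANCE:
        return 3  # near matches are cheap to code, so accept short ones
    if distance < MEDIUM_DISTANCE:
        return 4
    return 5      # far matches cost more bits, so demand a longer match


def accept_match(length, distance):
    """True if a candidate match passes the distance-dependent minimum."""
    return length >= min_match_length(distance)
```

With a fixed minimum of 4, `accept_match` would simply ignore the distance, which is the behaviour the post says it kept.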

    I use log-exp coding for indicating match vs literal: 28 log bins for match lengths from 0 to 2k, so the alphabet is 256 + 28. Any remainder bits are encoded after that for longer lengths. I don't think adding more possibilities to the same alphabet will help anything.
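
A minimal sketch of how logarithmic length bins can share one alphabet with literals is shown below. It assumes plain power-of-two bins, which gives only 12 bins for lengths 0 to 2047, so it is not the 28-bin layout from the post. The names `length_to_symbol` and `symbol_to_length` are made up for illustration.

```python
NUM_LITERALS = 256  # literal byte values occupy symbols 0..255


def length_to_symbol(length):
    """Map a match length to (symbol, extra_bit_count, extra_bits)."""
    if length < 2:
        # Lengths 0 and 1 get their own bins and need no extra bits.
        return NUM_LITERALS + length, 0, 0
    exponent = length.bit_length() - 1  # floor(log2(length))
    # The bin records the exponent; the remainder goes out as raw bits.
    return NUM_LITERALS + exponent + 1, exponent, length - (1 << exponent)


def symbol_to_length(symbol, extra_bits):
    """Inverse of length_to_symbol for match symbols (symbol >= 256)."""
    bin_index = symbol - NUM_LITERALS
    if bin_index < 2:
        return bin_index
    return (1 << (bin_index - 1)) + extra_bits
```

A decoder reads one symbol from the entropy coder: values below 256 are literals, anything else selects a bin and tells it how many remainder bits to read next.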

  2. #62
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Tested RH4 c2 with -r0, -r1, -r2. Testing -r2 c6 now. http://mattmahoney.net/dc/10gb.html (system 4).
    Code:
     Size (bytes)  Ctime (s)  Dtime (s)  Sys  Program      Ver  Options
     3741983103         1702        515    4  RH4_x64.exe  v8   -r2 c2
     3751765805         1421        538    4  RH4_x64.exe  v8   -r0 c2
     3780469075         1635        562    4  RH4_x64.exe  v8   -r1 c2
     3785182357         1373        269    4  RH4_x64.exe  v6   c2
     3788426871         1843        530    4  RH4_x64.exe  v7   c2
     3832734358         2941        255    4  RH4_x64.exe  v5   c2

  3. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    cade (1st May 2014)

  4. #63
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    I should say that c6 is not worth it (compared to c5) for large files.

    For c1 to c6,
    Search depths are: 4, 8, 24, 64, 128, max
    Lazy search depths are: 0, 2, 4, 4, 4, max
    Tables are 4096 entries (max).

    Edit: To clarify, for search and lazy search, max means follow the whole hash chain, not testing all 4096 entries.
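
To illustrate what a search depth means here, the following Python sketch walks a hash chain for at most `max_depth` candidates, with `None` standing for "max" (follow the whole chain). It is an assumption-laden toy, not RH4's match finder: a dict keyed by the first 4 bytes stands in for the real hash table, `MIN_MATCH = 4` is taken from the earlier post, and lazy search is not modelled.

```python
MIN_MATCH = 4
# Search depths for c1..c6 as listed in the post; None means "max".
SEARCH_DEPTH = {1: 4, 2: 8, 3: 24, 4: 64, 5: 128, 6: None}


def insert_position(data, pos, head, prev):
    """Link pos into the chain of earlier positions with the same 4-byte key."""
    if pos + MIN_MATCH <= len(data):
        key = data[pos:pos + MIN_MATCH]
        prev[pos] = head.get(key, -1)
        head[key] = pos


def find_longest_match(data, pos, head, prev, max_depth):
    """Return (length, distance) of the best match within max_depth steps."""
    best_len, best_dist = 0, 0
    if pos + MIN_MATCH > len(data):
        return best_len, best_dist
    cand = head.get(data[pos:pos + MIN_MATCH], -1)
    depth = 0
    while cand >= 0 and (max_depth is None or depth < max_depth):
        length = 0
        while pos + length < len(data) and data[cand + length] == data[pos + length]:
            length += 1
        if length > best_len:
            best_len, best_dist = length, pos - cand
        cand = prev[cand]  # step to the next older candidate
        depth += 1
    return best_len, best_dist


data = b"abcdefgh_abcd_abcdefgh"
head, prev = {}, [-1] * len(data)
for p in range(14):
    insert_position(data, p, head, prev)
shallow = find_longest_match(data, 14, head, prev, 1)
full = find_longest_match(data, 14, head, prev, SEARCH_DEPTH[6])
```

In this toy input the most recent candidate only matches 4 bytes, while the older one matches 8, so a depth of 1 settles for the shorter match and the full chain walk finds the longer one.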
    Last edited by cade; 1st May 2014 at 05:21.

  5. #64
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    873
    Thanks
    462
    Thanked 175 Times in 85 Posts
    It also crashes with -r0 switch.

  6. #65
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Could you please give me a link to download the Sources test set?

  7. #66
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    873
    Thanks
    462
    Thanked 175 Times in 85 Posts

  8. #67
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    I get errors (11596 files failed checksums) when extracting that 7z archive:

    Code:
    Checksum error in \squeezechart_src\GIMP v2 3 1\ChangeLog. The file is corrupt
    Checksum error in \squeezechart_src\ZLIB v1 2 3\ChangeLog. The file is corrupt
    Checksum error in \squeezechart_src\Stellarium v0 8 2\intl\ChangeLog. The file is corrupt
    Checksum error in \squeezechart_src\OPTIPNG 0 5 4\lib\zlib\ChangeLog. The file is corrupt
    Checksum error in \squeezechart_src\Lazarus v0 9 20 (Compiler)\tools\install\debian_fpc-src\changelog. The file is corrupt
    Checksum error in \squeezechart_src\Lazarus v0 9 20 (Compiler)\debian\changelog. The file is corrupt
    Checksum error in \squeezechart_src\Lazarus v0 9 20 (Compiler)\tools\install\debian_fpc\changelog. The file is corrupt
    Checksum error in \squeezechart_src\Lazarus v0 9 20 (Compiler)\tools\install\cross_unix\debian_crosswin32\changelog. The file is corrupt
    Checksum error in \squeezechart_src\Stellarium v0 8 2\ChangeLog. The file is corrupt
    Checksum error in \squeezechart_src\ZLIB v1 2 3\contrib\minizip\ChangeLogUnzip. The file is corrupt
    Checksum error in \squeezechart_src\Lazarus v0 9 20 (Compiler)\components\synunihighlighter\CHANGES. The file is corrupt
    Checksum error in \squeezechart_src\Bonk Encoder v103\bin\lang\Changes. The file is corrupt
    Checksum error in \squeezechart_src\OPTIPNG 0 5 4\lib\libpng\CHANGES. The file is corrupt
    Checksum error in \squeezechart_src\CHM Decoder\chmdeco-popups. The file is corrupt
    Checksum error in \squeezechart_src\GIMP v2 3 1\plug-ins\gflare\gflares\Classic. The file is corrupt
    Checksum error in \squeezechart_src\CHM Decoder\debian\compat. The file is corrupt
    Checksum error in \squeezechart_src\Lazarus v0 9 20 (Compiler)\debian\compat. The file is corrupt
    Checksum error in \squeezechart_src\GIMP v2 3 1\compile. The file is corrupt
    Checksum error in \squeezechart_src\CHM Decoder\configure. The file is corrupt
    Checksum error in \squeezechart_src\GIMP v2 3 1\configure. The file is corrupt
    Checksum error in \squeezechart_src\OPTIPNG 0 5 4\lib\zlib\configure. The file is corrupt
    ...
    Some other file types too:
    Code:
    Checksum error in \squeezechart_src\Stellarium v0 8 2\textures\landscapes\apollo17.png. The file is corrupt
    Checksum error in \squeezechart_src\Lazarus v0 9 20 (Compiler)\images\22x22\appointment-new.png. The file is corrupt
    Checksum error in \squeezechart_src\Stellarium v0 8 2\textures\constellation-art\apus.png. The file is corrupt
    Checksum error in \squeezechart_src\Stellarium v0 8 2\textures\constellation-art\aquarius.png. The file is corrupt
    Checksum error in \squeezechart_src\Stellarium v0 8 2\textures\constellation-art\aquila.png. The file is corrupt
    Checksum error in \squeezechart_src\Stellarium v0 8 2\textures\constellation-art\ara.png. The file is corrupt
    Checksum error in \squeezechart_src\Stellarium v0 8 2\textures\constellation-art\argonavis.png. The file is corrupt
    Checksum error in \squeezechart_src\Stellarium v0 8 2\textures\ariel.png. The file is corrupt
    ...

  9. #68
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    I can confirm that issue (using p7zip Version 9.20).
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  10. #69
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    873
    Thanks
    462
    Thanked 175 Times in 85 Posts
    Thanks for reporting. Here is another try:

    http://www.squeezechart.com/TEST_SOURCES.7z

  11. #70
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    That one validates successfully, thanks.

  12. #71
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    There are non-ASCII characters in the file names, such as in /Wise Installer Unpacker:

    Code:
    FILE_ID.DE°
    FILE_ID.DEø
    FILE_ID.EN°
    FILE_ID.ENø
    I don't have a fix for this in the current design, but I probably will in the next one. The reason is that in a signed char, byte values above 127 (non-ASCII) become negative.
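
The sign problem can be reproduced in a few lines of Python. This is a sketch under assumptions: the names are taken to be single-byte Latin-1 strings (the actual encoding on the test machine is not stated), and `as_signed_char` and `as_unsigned` are hypothetical helpers that mimic reading a byte through a signed 8-bit type and masking it back.

```python
import struct


def as_signed_char(byte_value):
    """Reinterpret a byte (0..255) as a signed 8-bit value."""
    return struct.unpack("b", bytes([byte_value]))[0]


def as_unsigned(signed_value):
    """Undo the sign extension by keeping only the low 8 bits."""
    return signed_value & 0xFF


# Assumption: the file name is stored as Latin-1, one byte per character.
name = "FILE_ID.DEø".encode("latin-1")
signed_values = [as_signed_char(b) for b in name]
repaired = bytes(as_unsigned(v) for v in signed_values)
```

The last element of `signed_values` is negative, so code that uses it directly as a symbol or table index goes out of range, while the masked values reproduce the original bytes.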

  13. #72
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Another update: a faster and smaller model, some improvements, and a better archive structure. Note that you currently cannot specify multiple inputs on the command line (a current limitation); the input is either one file or one folder. I do not support unicode filenames right now, perhaps later (the problem from the previous post). You can specify the window size, table size (smaller if necessary) and hash bits to scale down for less memory.

    Benchmark settings:
    Encode:
    RH5_x64.exe -skip-crcs -force-overwrite -window:27 -hash:17 c6 out enwik8.txt

    Normal encode:
    RH5_x64 c out enwik8.txt

    Decode:
    RH5_x64 d out enwik8_reconstructed.txt

    List contents:
    RH5_x64 l out

    Test archive (decodes properly + crcs present):
    RH5_x64 t out

    Results (normal HDD, i7-2600):
    enwik8, fastest (1): 33353757 in 1.51 sec
    enwik8, normal (2): 31798141 in 2.17 sec
    enwik8, max (6): 29077273 in 17.14 sec
    enwik8, unpack: 0.61 sec

    enwik9, fastest: 292943341 in 14.3 sec
    enwik9, normal: 278822435 in 19.2 sec
    enwik9, max: 254219411 in 165 sec
    enwik9, unpack: 5.56 sec
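
For readers who want ratios and speeds rather than raw sizes, the figures above can be converted as in the sketch below. It assumes that enwik8.txt is the standard 100,000,000-byte enwik8 file; the helper names are illustrative only.

```python
ENWIK8_SIZE = 100_000_000  # assumption: standard enwik8 input

# (compressed size in bytes, seconds) for enwik8, copied from the results above.
ENWIK8_RESULTS = {
    "fastest": (33353757, 1.51),
    "normal": (31798141, 2.17),
    "max": (29077273, 17.14),
}


def ratio_percent(compressed, original):
    """Compressed size as a percentage of the original size."""
    return 100.0 * compressed / original


def speed_mb_per_s(original, seconds):
    """Throughput in MB/s, with 1 MB = 10**6 bytes."""
    return original / 1e6 / seconds


summary = {
    level: (ratio_percent(size, ENWIK8_SIZE), speed_mb_per_s(ENWIK8_SIZE, secs))
    for level, (size, secs) in ENWIK8_RESULTS.items()
}
```

The same two helpers work for the enwik9 rows with an original size of 1,000,000,000 bytes.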

  14. The Following 3 Users Say Thank You to cade For This Useful Post:

    Nania Francesco (13th November 2014),Stephan Busch (14th November 2014),surfersat (12th November 2014)

  15. #73
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I updated my benchmarks. Better compression on LTCB. http://mattmahoney.net/dc/text.html#2542

    The 10GB benchmark now restores empty directories, but 4 of them (all named .in-transit in different places) did not restore. I tested RH5_x64.exe under Wine 1.6 in Ubuntu. Also a minor bug: when the program prompts to overwrite a file, then after it exits, commands entered in the terminal window no longer echo to the screen.
    http://mattmahoney.net/dc/10gb.html (system 4).

  16. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    cade (14th November 2014)

  17. #74
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Thanks for testing. I tried to create a folder named ".in-transit" myself in Windows and got this error; I guess that the same failure happened in the decompressor:
    Code:
    [Window Title]
    Rename
    
    [Content]
    You must type a file name.
    
    [OK]
    My console in Windows still works fine after the case you described. I use _getch from conio now instead of what I did before (get stdin, then flush it to get rid of Enter). This might be strange behaviour in Wine; thanks for letting me know.

    Edit: The point of skipping checksums for benchmarking is to exclude the (small) time needed for crc32 from packing and unpacking.
    Last edited by cade; 14th November 2014 at 01:28.

  18. #75
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    873
    Thanks
    462
    Thanked 175 Times in 85 Posts
    Dear Nauful,

    I am getting an 'assert failed' message when trying to compress the Audio Testset of the SqueezeChart:
    Window size: 131072 KB
    Context table size: 12 bits
    Hash size: 17 bits
    crc32 [1/13] readme.txt: 38B928A6
    crc32 [2/13] [acapella] Shun Ward ò Stripper Luv.wav: BB4C852
    crc32 [3/13] [blues] Caalamus ò Blues Alley.wav: 16E64F84
    crc32 [4/13] [downtempo] Smooth Genestar ò The Source.wav: A1C8E43A
    crc32 [5/13] [folk] Paper Navy ò I Can't Read The Stars.wav: 231A0C8C
    crc32 [6/13] [hiphop] Jay Slim ò The High Life.wav: CA7B91B0
    crc32 [7/13] [house] Loxy ò Tiny Tune (Live).wav: DAEA73D4
    crc32 [8/13] [jazz] Steven O'Brien ò Slow Jazz Piece.wav: 3E75BD39
    crc32 [9/13] [orchestral] PHerbert ò It's All About The Journey.wav: 6F90458E
    crc32 [10/13] [pop] Crystal Newton ò Declaration Of Love.wav: 9EA359C4
    crc32 [11/13] [reggae] Messian Dread ò Roots And Culture.wav: 9F1A4CC4
    crc32 [12/13] [rock] ─ndy ò Have you seen the lizards.wav: 34FC1AAB
    crc32 [13/13] [trance] New Age Hippies ò Metamorphosia.wav: ABA7683E
    Duplicates: 0/13
    Assert failed
    c:\programming\c++\rh5\source\Entropy_Huffman.h:21 3

    Execution time: 3.935 s

  19. #76
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I saw that -skip-checksum speeds up compression and decompression a bit. But it also skips the file deduplication step, so it makes compression much worse on the 10GB benchmark. I guess that you compare checksums, and if they match you compare the files to see if one can be skipped.

  20. #77
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Quote: Originally Posted by Stephan Busch
    Dear Nauful,

    I am getting an 'assert failed' message when trying to compress the Audio Testset of the SqueezeChart:

    Sorry, I do not have support for unicode. I think I will add UTF-8 at some point. The assert fails because it tried to write a symbol (ò) outside the normal ASCII range.

    Quote: Originally Posted by Matt Mahoney
    I saw that -skip-checksum speeds up compression and decompression a bit. But it also skips the file deduplication step, so it makes compression much worse on the 10GB benchmark. I guess that you compare checksums, and if they match you compare the files to see if one can be skipped.

    Exactly correct.
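
The deduplication step confirmed here (compare checksums first, then compare contents on a match) could look roughly like this in Python. It is a sketch, not RH5's implementation: `find_duplicates` and the in-memory `files` mapping are hypothetical, and `zlib.crc32` stands in for whatever crc32 routine the program uses.

```python
import zlib


def find_duplicates(files):
    """Map each duplicate file name to the earlier file with identical bytes.

    files: dict of name -> bytes, in archive order.
    """
    names_by_crc = {}
    duplicates = {}
    for name, content in files.items():
        crc = zlib.crc32(content)
        for earlier in names_by_crc.get(crc, []):
            # A matching crc32 is only a hint; confirm with a full compare.
            if files[earlier] == content:
                duplicates[name] = earlier
                break
        else:
            names_by_crc.setdefault(crc, []).append(name)
    return duplicates
```

Because the checksum is what selects candidate pairs, skipping checksums also removes the cheap way to find duplicates, which matches the behaviour described in the quoted post.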

  21. #78
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    I have added unicode support. Old archives are still compatible because file names are stored as UTF-8. For some reason that I could not figure out, my Windows console prints characters outside the ASCII range as ???, but file names were stored and written correctly.
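
The compatibility claim follows from a property of UTF-8: ASCII-only names encode to exactly the same bytes as before. The Python sketch below shows this, assuming older archives only ever held ASCII names (non-ASCII names failed earlier in this thread); `stored_name` is a hypothetical stand-in for the archive's name field.

```python
def stored_name(name):
    """Encode a file name the way a UTF-8 name field would hold it."""
    return name.encode("utf-8")


ascii_name = "readme.txt"
accented_name = "FILE_ID.DEø"

# ASCII-only names give identical bytes in ASCII and in UTF-8.
unchanged = stored_name(ascii_name) == ascii_name.encode("ascii")

# A non-ASCII character becomes a multi-byte sequence of bytes >= 0x80.
tail = stored_name(accented_name)[-2:]
round_trip = stored_name(accented_name).decode("utf-8")
```

Since every byte of a multi-byte UTF-8 sequence has its high bit set, a reader can never confuse part of such a sequence with an ASCII character such as a path separator.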

    Quote: Originally Posted by Stephan Busch
    Dear Nauful,

    I am getting an 'assert failed' message when trying to compress the Audio Testset of the SqueezeChart:

    This has been fixed (file names), please let me know if this works for you.

  22. The Following User Says Thank You to cade For This Useful Post:

    surfersat (20th November 2014)

  23. #79
    Member
    Join Date
    Jun 2013
    Location
    Sweden
    Posts
    150
    Thanks
    9
    Thanked 25 Times in 23 Posts
    I am unable to run the x64 version due to a missing msvcp110.dll.
    I guess you didn't think anyone would run this on a file larger than 4 GB, since the counter keeps resetting; my test file is 39,699,439,698 bytes.

    E:\MRIMG>d:\Downloads\RH5x32 -window:64 -hash:32 -table:32 c1 TEST Toshiba_Satellite_Pro_C50D-A_PSCGXE_Win8.mrimg

    Window size: 131072 KB
    Context table size: 12 bits
    Hash size: 17 bits
    crc32 [1/1] Toshiba_Satellite_Pro_C50D-A_PSCGXE_Win8.mrimg: 95256534
    Duplicates: 0/1
    Encoder memory: 203344 KB
    Done compressing
    1044734034 -> 4294967295 (-0.00 MB, -0.000) in 1914.313 sec (19.78 MB/s)

    E:\MRIMG>d:\Downloads\RH5x32 d TEST Toshiba_Satellite_Pro_C50D-A_PSCGXE_Win8.mrimg.new

    Decoder 27:12:17 memory: 135760 KB
    Done decompressing
    4294967295 -> 1044734034 in 536.221 sec (70.61 MB/s)
    All crc32 match

    CRC32 and MD5 matched.
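
The input size printed in the log above is consistent with a byte counter that wraps at 32 bits. A short Python check of the arithmetic follows; `wrap_u32` is a hypothetical helper that models an unsigned 32-bit counter, and no claim is made about how the program actually stores its totals.

```python
def wrap_u32(value):
    """Keep only the low 32 bits, as an unsigned 32-bit counter would."""
    return value & 0xFFFFFFFF


TRUE_SIZE = 39_699_439_698      # test file size reported in the post
REPORTED_SIZE = 1_044_734_034   # input size printed in the log

wrapped = wrap_u32(TRUE_SIZE)
full_wraps = TRUE_SIZE >> 32    # how many times the counter rolled over
```

The other figure in the log, 4294967295, is the largest value an unsigned 32-bit integer can hold.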

  24. #80
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    The 32-bit version does not work on files larger than 4 GB; the 64-bit version specifically states this when you start it. I use the 64-bit version on large files (>40 GB).

    You need to install these runtimes:
    http://www.microsoft.com/en-us/downl....aspx?id=30679

    I could change the MSVC runtime to static linking so you will not need any additional dll in the next version.

  25. The Following User Says Thank You to cade For This Useful Post:

    surfersat (20th November 2014)

  26. #81
    Member
    Join Date
    Jun 2013
    Location
    Sweden
    Posts
    150
    Thanks
    9
    Thanked 25 Times in 23 Posts
    Quote: Originally Posted by cade
    The 32-bit version does not work on files larger than 4 GB; the 64-bit version specifically states this when you start it. I use the 64-bit version on large files (>40 GB).

    You need to install these runtimes:
    http://www.microsoft.com/en-us/downl....aspx?id=30679

    I could change the MSVC runtime to static linking so you will not need any additional dll in the next version.

    But x32 did work! The compressed file was ~20 GB and, as I wrote, MD5 and CRC32 matched after extraction.

    And since I could not start x64, I could not read any messages.

    Yes please, include static linking :-D

  27. The Following User Says Thank You to a902cd23 For This Useful Post:

    avitar (20th November 2014)

  28. #82
    Member
    Join Date
    Sep 2011
    Location
    uk
    Posts
    237
    Thanks
    187
    Thanked 16 Times in 11 Posts
    Can you please clarify #80 after what was said in #81?

    +1 re static linking.
    j

  29. #83
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    873
    Thanks
    462
    Thanked 175 Times in 85 Posts
    This version still crashes on the Audio testset:

    RH5
    64-bit version, supports archives >4 GB
    Written by Nauful
    Free for non-commercial use
    Nov 19, 2014

    Window size: 131072 KB
    Context table size: 12 bits
    Hash size: 17 bits

    Error: Output file au.rh5 already exists. Press Y to overwrite it, A to overwrit
    e all, or any other key to exit: y
    crc32 [1/13] readme.txt: 38B928A6
    crc32 [2/13] [acapella] Shun Ward ? Stripper Luv.wav: BB4C852
    crc32 [3/13] [blues] Caalamus ? Blues Alley.wav: 16E64F84
    crc32 [4/13] [downtempo] Smooth Genestar ? The Source.wav: A1C8E43A
    crc32 [5/13] [folk] Paper Navy ? I Can't Read The Stars.wav: 231A0C8C
    crc32 [6/13] [hiphop] Jay Slim ? The High Life.wav: CA7B91B0
    crc32 [7/13] [house] Loxy ? Tiny Tune (Live).wav: DAEA73D4
    crc32 [8/13] [jazz] Steven O'Brien ? Slow Jazz Piece.wav: 3E75BD39
    crc32 [9/13] [orchestral] PHerbert ? It's All About The Journey.wav: 8.0/21.9 MB
    crc32 [9/13] [orchestral] PHerbert ? It's All About The Journey.wav: 16.0/21.9 M
    crc32 [9/13] [orchestral] PHerbert ? It's All About The Journey.wav: 21.9/21.9 M
    crc32 [9/13] [orchestral] PHerbert ? It's All About The Journey.wav: 6F90458E

    RH5_x64 stopped working. Details:

    Problem Event Name: BEX64
    Application Name: RH5_x64.exe
    Application Version: 0.0.0.0
    Application Timestamp: 546d3566
    Fault Module Name: RH5_x64.exe
    Fault Module Version: 0.0.0.0
    Fault Module Timestamp: 546d3566
    Exception Offset: 0000000000009034
    Exception Code: c0000409
    Exception Data: 0000000000000002
    OS Version: 6.3.9600.2.0.0.256.48
    Locale ID: 1031
    Additional Information 1: f7d6
    Additional Information 2: f7d68098d483bed5db8fb8e9fe4bf559
    Additional Information 3: 2061
    Additional Information 4: 20618648d482938e557007eecc7e85f9

  30. #84
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Quote: Originally Posted by a902cd23
    But x32 did work! The compressed file was ~20 GB and, as I wrote, MD5 and CRC32 matched after extraction.

    And since I could not start x64, I could not read any messages.

    Yes please, include static linking :-D

    To clarify: the actual compressor ignores total sizes. Sizes are encoded as unsigned 64-bit integers and it only reads one window at a time, so x86 vs x64 does not matter for the compressor itself (edit: there is a performance difference between x86 and x64, because some things such as string matching and crc32 slice-by-8/16 are different; x64 is faster). The x86 front-end, however, will not handle such large numbers properly. There are ways to do it, but it is more trouble, so I will leave that for later. I have not tested the x86 version against large files, archives of large files, or large archives of files (small or large), where large means >= 4 GB, so I did not want to make any guarantees that everything is (probably) ok.

    Quote: Originally Posted by Stephan Busch
    This version still crashes on the Audio testset:

    Could you please upload the audio testset where I can download it? Seems to be a problem with filenames. It worked fine when I tested some incompressible files with unicode names, so I need the actual files to figure out this problem.

  31. #85
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    873
    Thanks
    462
    Thanked 175 Times in 85 Posts
    You can download the testset here: http://www.squeezechart.com/TEST_Audio.arc

  32. #86
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Sorry, I had a silly mistake in printing some unicode filenames. I have also statically linked the MSVC runtime now.

  33. The Following User Says Thank You to cade For This Useful Post:

    surfersat (21st November 2014)
