
Thread: LZHAM

  1. #91
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    468
    Thanks
    203
    Thanked 81 Times in 61 Posts
    Please enlighten me, oh wise friend of mine...

    All I did was download the .zip file as provided by GitHub at https://github.com/gameclosure/LZHAM, marked by the programmer as "Candidate alpha8 - still undergoing exhaustive testing".
    I can't see any "debug version".
    It compresses as expected and throws no debug info at all. But if you read carefully you'll see that the project itself is a compression library and the .exe files are examples of the capabilities of the algorithm, including proper compression like any other CLI packer.


    In any case, it is a mistake made by someone trying to help. And being treated as stupid is not helpful at all.


    Edit:
    Indeed it is a mistake. I am still a newbie. Next time correct me and I will thank you.
    Here is the right one.
    Attached Files
    Last edited by Gonzalo; 10th October 2014 at 05:36.

  2. #92
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    458
    Thanks
    143
    Thanked 158 Times in 106 Posts
    I'm not sure if it's any newer, but that looks to be a fork of the original LZHAM which can be found at http://code.google.com/p/lzham/

    Either way, alpha8 is still the latest, so likely they're the same (albeit confusing to have two copies): "alpha8 - Feb. 2, 2014 - On SVN only: Project now has proper Linux cmake files. Tested and fixed misc. compiler warnings with clang v3.4 and gcc v4.8, x86 and amd64, under Ubuntu 13.10. Added code to detect the # of processor cores on Linux, fixed a crash bug in lzhamtest when the source file was unreadable, lzhamtest now defaults to detecting the # of max helper threads to use."

  3. #93
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    681
    Thanks
    37
    Thanked 168 Times in 84 Posts
    Quote Originally Posted by Gonzalo View Post
    In any case, it is a mistake made by someone trying to help. And being treated as stupid is not helpful at all.

    Edit:
    Indeed it is a mistake. I am still a newbie. Next time correct me and I will thank you.
    Sorry for my harshness. Sometimes I'm too impulsive and my typing hands are faster than my brain.

  4. #94
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    468
    Thanks
    203
    Thanked 81 Times in 61 Posts
    Sorry for my harshness. Sometimes I'm too impulsive and my typing hands are faster than my brain.
    That's ok. Don't worry.

    I'm not sure if it's any newer, but that looks to be a fork of the original LZHAM which can be found at http://code.google.com/p/lzham/
    You are right. It is newer, since at this time there are a few new commits made in February 2014. But they are not related to the compression engine, so I think we can still use the binaries provided.

  5. #95
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    I tested the x64 version of lzham alpha7_r1 with enwik8 and enwik9 on my 4 GHz i7-4790K processor. Results are as follows:

    enwik8:
    Compression:
    lzhamtest_x64 c: 24,794,784 bytes in 22.9 seconds (135 seconds process time); 903 MB memory

    Decompression:
    lzhamtest_x64 d: 0.55 seconds - process time (0.60 seconds global time)

    enwik9:
    Compression:
    lzhamtest_x64 c: 205,091,362 bytes in 274 seconds (1536 seconds process time); 2392 MB memory

    Decompression:
    lzhamtest_x64 d: 4.9 seconds - process time (5.0 seconds global time)

    That's a better compression ratio than the versions shown on LTCB, with compression in less than half the time and decompression in just over half the time. Very impressive!

  6. #96
    Member
    Join Date
    Aug 2010
    Location
    Seattle, WA
    Posts
    79
    Thanks
    6
    Thanked 67 Times in 27 Posts
    I've finally released v1.0 on github:
    https://github.com/richgel999/lzham_codec

    I know it took me ~3 years to get a real release up. But I had a lot of things going on, like working on Portal 2, DoTA 2, then shipping all the Source engine games on Linux, so I had my hands really full.

    This version is not compatible with bitstreams generated with the alphas. I'm promising not to change the bitstream format for v1.x releases, except for critical bug fixes.

    I would like to thank everyone here: I read these forums as a lurker before working on LZHAM, and I studied every LZ related post I could get my hands on. Especially anything related to LZ optimal parsing, which still seems like a black art. LZHAM was my way of learning how to implement optimal parsing (and you can see this if you study the progress I made in the early alphas on Google Code).

    Notable changes from the prev. alphas:
    - Added full OSX and iOS support (tested on various iPhone 4, 5, and 6+ models). Working on Android support next, which I need for our products.
    I still need to merge over the XCode project, and enhance the cmake files to support platforms other than Linux.
    Now that I am using this on real products at work I'll be able to dedicate more time to the codec.
    - Reduced decompressor's memory consumption and init times by greatly slashing the total # of Huffman and arithmetic tables (from hundreds down to <10), which also increased the decompressor's throughput and speed stability on mostly uncompressible files.
    This allowed for a slight reduction in the decompressor's complexity, because it now doesn't need to track the previous 2 output bytes.
    - Further reduced the decompressor's up front initialization time cost, by precalculating several large encoding tables. It's still more expensive than I would like, I think due to the init memory allocs and Huffman table initializations.
    - Added tuning options to allow the user to control the Huffman table update frequency. It defaults to an update interval that is much less frequent vs. the alphas.
    - Ratio seems very slightly improved from the prev. alpha on the test files I've looked at (but this was not my primary intention). I've focused more on decompression perf., lowering the init times and the decompressor's memory footprint, and iOS/OSX support, not ratio. Ratio may be slightly lower on some files due to the v1.0 Huffman and modeling changes.
    - I've extensively profiled and documented up front when it's not worth using LZHAM vs. LZMA. On a Core i7 Windows x64, if the # of compressed bytes is < ~13,000, LZMA is typically faster to decode. On high end iOS devices, the compressed size threshold is around 1KB (and I'm not sure why there's such a large difference yet). I believe this has to do with LZHAM's more expensive init cost vs. LZMA, and maybe due to LZHAM's frequent Huffman table updating at the beginning of streams.
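    (Purely as an illustration of applying such a cutoff; the enum and function below are made-up names, not part of the LZHAM API, and the thresholds are the ones quoted above.)

    #include <cstddef>

    enum class codec { lzma, lzham };

    // Pick the decoder for a blob based on its compressed size; ~13,000 bytes is
    // the desktop x64 crossover quoted above, ~1024 bytes on high end iOS devices.
    static codec choose_decoder(size_t compressed_size, size_t cutoff_bytes) {
        return (compressed_size < cutoff_bytes) ? codec::lzma : codec::lzham;
    }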
    Last edited by rgeldreich; 26th January 2015 at 04:52.

  7. The Following User Says Thank You to rgeldreich For This Useful Post:

    Intrinsic (28th January 2015)

  8. #97
    Member
    Join Date
    Aug 2010
    Location
    Seattle, WA
    Posts
    79
    Thanks
    6
    Thanked 67 Times in 27 Posts
    Some enwik8/9 statistics with v1.0 (Windows, x64 executable, Core i7 Gulftown 3.3 GHz).

    -- enwik9 (512MB dictionary):
    Normal parsing: 204,325,043
    Compression Time: 339.7 secs, Decompression Time: 6.62 secs (151,045,672 bytes/sec)

    "Extreme" parsing (up to 4 LZ decisions per graph node, lzhamtest -x option): 202,237,199
    Compression Time: 1096.62 secs, Decompression Time: 6.59 secs (151,744,359 bytes/sec)

    -- enwik8 (128MB dictionary):
    Normal parsing: 25,091,033
    Compression Time: 27.95 secs, Decompression Time: .73 secs (136,920,702 bytes/sec)

    Extreme parsing: 24,990,739
    Compression Time: 72.21 secs, Decompression Time: .72 secs (137,963,425 bytes/sec)

  9. #98
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I updated LTCB but I had to guess at memory usage. A Gulftown is 6 cores, so I guessed 1.5x memory. http://mattmahoney.net/dc/text.html#2024

  10. #99
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    873
    Thanks
    462
    Thanked 175 Times in 85 Posts
    How did you compile that using MinGW?

  11. #100
    Member
    Join Date
    Aug 2010
    Location
    Seattle, WA
    Posts
    79
    Thanks
    6
    Thanked 67 Times in 27 Posts
    In the past I've compiled LZHAM with TDM-GCC x64 (using Codeblocks as an IDE), and it worked well. For the v1.0 release I've tested it with VS 2010 and 2013 so far.

  12. #101
    Member
    Join Date
    May 2008
    Location
    HK
    Posts
    160
    Thanks
    4
    Thanked 25 Times in 15 Posts
    FYI: news article in Japanese found: http://news.mynavi.jp/news/2015/01/27/076/

  13. #102
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    Is there a downloadable binary of this version 1.0 for Windows 32-bit or Windows 64-bit?

    Does someone have a working link?

  14. #103
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    I can't compile the x64 version. Can someone put it online for download?

  15. #104
    Member
    Join Date
    Sep 2010
    Location
    US
    Posts
    126
    Thanks
    4
    Thanked 69 Times in 29 Posts
    Hey Rich, very impressive!

    I'm curious what your simplified method for sending literals/delta-literals is?

    Are you using context bits for literals at all? Are you doing the funny LZMA rep-lit exclusion thing?
    Last edited by cbloom; 3rd August 2016 at 20:34.

  16. #105
    Member
    Join Date
    Sep 2010
    Location
    US
    Posts
    126
    Thanks
    4
    Thanked 69 Times in 29 Posts
    Also, if you have the permission to release any of your game test files publicly, I think that would help the community a lot.

    As you correctly noted, people spend too much time on text and not enough on generic binary data. Part of the reason is there aren't good test sets for the type of binary data that we see.

    On this type of binary data, LZMA usually beats PAQ (and NanoZip beats LZMA)

    I've got some private collections of test data but haven't got the permission from clients to release it publicly.

    I posted one here : (lzt24)

    https://drive.google.com/file/d/0B-y...lhS0hrdVE/edit

    but we need a lot more, and bigger.
    Last edited by cbloom; 3rd August 2016 at 20:34.

  17. #106
    Member
    Join Date
    May 2009
    Location
    France
    Posts
    95
    Thanks
    13
    Thanked 72 Times in 42 Posts
    Hello,

    lzhamtest_win32.7z
    lzhamtest_win64.7z
    Should work...

    Edit: win64 correction and win32 added.

    AiZ
    Last edited by AiZ; 28th January 2015 at 23:41.

  18. The Following 3 Users Say Thank You to AiZ For This Useful Post:

    Bulat Ziganshin (29th January 2015),Nania Francesco (29th January 2015),Stephan Busch (29th January 2015)

  19. #107
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    873
    Thanks
    462
    Thanked 175 Times in 85 Posts
    Dear AiZ, the lzhamtest_win64.7z contains the 32-bit version. We cannot choose a dictionary larger than -d26.

  20. #108
    Member
    Join Date
    May 2009
    Location
    France
    Posts
    95
    Thanks
    13
    Thanked 72 Times in 42 Posts
    Hi Stephan,

    I've downloaded lzhamtest_win64.7z from here and it's OK; please check your downloads.

    AiZ

  21. The Following User Says Thank You to AiZ For This Useful Post:

    Stephan Busch (29th January 2015)

  22. #109
    Member
    Join Date
    Aug 2010
    Location
    Seattle, WA
    Posts
    79
    Thanks
    6
    Thanked 67 Times in 27 Posts
    Quote Originally Posted by cbloom View Post
    Hey Rich, very impressive! I'm curious what your simplified method for sending literals/delta-literals is? Are you using context bits for literals at all? Are you doing the funny LZMA rep-lit exclusion thing?
    Thanks, I'm not doing anything fancy with literals/delta literals at all now. For literals/delta literals v1.0 just uses two plain Huffman tables with no context, because the cost (in memory, init time, and decompression throughput predictability) was more than I was comfortable with. (I bit off more than I could chew with all those tables.) So it now only uses 8 total Huffman tables, vs. the previous ~134 (!):

    quasi_adaptive_huffman_data_model m_lit_table; // was [64] in the alphas, 3 MSB's each from the prev. 2 chars
    quasi_adaptive_huffman_data_model m_delta_lit_table; // was [64] in the alphas, 3 MSB's each from the prev. 2 chars
    quasi_adaptive_huffman_data_model m_main_table;
    quasi_adaptive_huffman_data_model m_rep_len_table[2]; // index: cur_state >= CLZDecompBase::cNumLitStates
    quasi_adaptive_huffman_data_model m_large_len_table[2]; // index: cur_state >= CLZDecompBase::cNumLitStates
    quasi_adaptive_huffman_data_model m_dist_lsb_table;

    I also reduced the total # arithmetic tables. m_is_match_model's context no longer includes any prev. character context bits. It's now just the current LZMA state index:

    adaptive_bit_model m_is_match_model[CLZDecompBase::cNumStates];
    adaptive_bit_model m_is_rep_model[CLZDecompBase::cNumStates];
    adaptive_bit_model m_is_rep0_model[CLZDecompBase::cNumStates];
    adaptive_bit_model m_is_rep0_single_byte_model[CLZDecompBase::cNumStates];
    adaptive_bit_model m_is_rep1_model[CLZDecompBase::cNumStates];
    adaptive_bit_model m_is_rep2_model[CLZDecompBase::cNumStates];

    I did implement the approach of letting the user configure the total # of literals/delta_literal context bits, and to also allow the user to configure the bitmasks+shift offsets to use on the prev. X characters to compute the context. So the user could choose no context bits, like v1.0, or 3+3 bits like the alphas, or some combination of 1-8 bits from the prev. char, or a mix of the prev. 2 chars, etc. (The idea was this would allow the user to find the optimal settings to use for their data, just like LZMA lets you do with its lc/lp/pb settings which I've found to be very useful.) All this was starting to get too complex, so the KISS principle won out.
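    Purely as an illustration of that idea (the struct, field names, and function here are hypothetical, not LZHAM's actual API), a configurable literal context could look something like this:

    // Compute a literal-model context from the previous two bytes using
    // user-configurable mask+shift pairs, in the spirit of LZMA's lc/lp/pb
    // settings. The alphas effectively took the 3 MSB's of each of the previous
    // two bytes (64 contexts); v1.0 uses no context at all.
    struct lit_context_cfg {
        unsigned prev0_shift, prev0_mask; // bits taken from the previous byte
        unsigned prev1_shift, prev1_mask; // bits taken from the byte before that
        unsigned prev0_bits;              // how many bits prev0 contributes
    };

    static unsigned compute_lit_context(const lit_context_cfg &cfg,
                                        unsigned char prev0, unsigned char prev1) {
        unsigned c0 = (prev0 >> cfg.prev0_shift) & cfg.prev0_mask;
        unsigned c1 = (prev1 >> cfg.prev1_shift) & cfg.prev1_mask;
        return c0 | (c1 << cfg.prev0_bits); // index into an array of literal models
    }

    // Alpha-style 3+3 bit context: { 5, 0x07, 5, 0x07, 3 } -> 64 contexts.
    // v1.0-style "no context":     { 0, 0x00, 0, 0x00, 0 } -> always index 0.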

    Not sure if I understand what you mean by LZMA rep-lit exclusion (I'll reread your notes on LZMA again).

  23. #110
    Member
    Join Date
    Aug 2010
    Location
    Seattle, WA
    Posts
    79
    Thanks
    6
    Thanked 67 Times in 27 Posts
    Quote Originally Posted by cbloom View Post
    As you correctly noted, people spend too much time on text and not enough on generic binary data. Part of the reason is there aren't good test sets for the type of binary data that we see.
    On this type of binary data, LZMA usually beats PAQ (and NanoZip beats LZMA)
    I've got some private collections of test data but haven't got the permission from clients to release it publicly..
    I've encountered the same thing. I recently tried several PAQ based compressors on our Unity game data and LZMA was better. My current title is a ~166 MB mix of PVRTC or ETC textures, meshes, animations, MP3 or OGG music/sound effects, and tons of misc. binary serialized object data. The best open source codec I've found for our data is LZMA (counting only ratio).

    Rights are a tricky subject - I'll poke around and see what we could publically release.

  24. #111
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    873
    Thanks
    462
    Thanked 175 Times in 85 Posts
    When trying LZHAM_x64 on the App testset, the following error occurs:

    D:\TESTSETS>lzhamtest -m4 -e -t4 -d29 c D:\TESTSETS\TEST_App\ app.lzham

    Error: Too many filenames!

    I also tried to put the input directory in quotes. What am I doing wrong here?

  25. #112
    Member ivan2k2's Avatar
    Join Date
    Nov 2012
    Location
    Russia
    Posts
    35
    Thanks
    13
    Thanked 6 Times in 3 Posts
    Quote Originally Posted by Stephan Busch View Post
    When trying LZHAM_x64 on the App testset, the following error occurs:

    D:\TESTSETS>lzhamtest -m4 -e -t4 -d29 c D:\TESTSETS\TEST_App\ app.lzham

    Error: Too many filenames!

    I also tried to put the input directory in quotes. What am I doing wrong here?
    try to use "a" mode
    Last edited by ivan2k2; 29th January 2015 at 23:44.

  26. #113
    Member
    Join Date
    Sep 2010
    Location
    US
    Posts
    126
    Thanks
    4
    Thanked 69 Times in 29 Posts
    Quote Originally Posted by rgeldreich View Post
    Not sure if I understand what you mean by LZMA rep-lit exclusion (I'll reread your notes on LZMA again).
    Well, is your "delta_lit" table just encoding the xor of the predicted symbol with the actual symbol?


    Also, do you do independent updates of the 8 Huffman tables? I see you have those parameters for the huffman update interval, but can encoder detect that a table hasn't changed much and only update a partial subset of the tables? Have you done any work on optimizing where the update locations are?
    Last edited by cbloom; 3rd August 2016 at 20:34.

  27. #114
    Member
    Join Date
    May 2013
    Location
    ARGENTINA
    Posts
    54
    Thanks
    62
    Thanked 13 Times in 10 Posts
    Quote Originally Posted by Stephan Busch View Post
    When trying LZHAM_x64 on the App testset, the following error occurs:

    D:\TESTSETS>lzhamtest -m4 -e -t4 -d29 c D:\TESTSETS\TEST_App\ app.lzham

    Error: Too many filenames!

    I also tried to put the input directory in quotes. What am I doing wrong here?


    I also get the same "Error: Too many filenames!" but using the "a" command ("a" is for folders, "c" is for files):
    lzham -m4 -e -t8 -d29 a D:\TEST_64\* test_64.lzham


    If I don't set the output file, it generates "__comp_temp_2920560304__.tmp".

    So I tried adding it to FreeArc in arc.ini, and it works for compressing folders:
    [External compressor:lzham]
    packcmd = lzham -m4 -d29 -t8 c $$arcdatafile$$.tmp $$arcpackedfile$$.tmp
    unpackcmd = lzham d $$arcpackedfile$$.tmp $$arcdatafile$$.tmp
    But I don't know how to use the delta option in lzham. [-afilename (what file do I need?)]

  28. #115
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    873
    Thanks
    462
    Thanked 175 Times in 85 Posts
    this gives the same error

  29. #116
    Member
    Join Date
    Aug 2010
    Location
    Seattle, WA
    Posts
    79
    Thanks
    6
    Thanked 67 Times in 27 Posts
    Quote Originally Posted by cbloom View Post
    Well, is your "delta_lit" table just encoding the xor of the predicted symbol with the actual symbol?
    Also, do you do independent updates of the 8 Huffman tables? I see you have those parameters for the huffman update interval, but can encoder detect that a table hasn't changed much and only update a partial subset of the tables? Have you done any work on optimizing where the update locations are?
    Yes, for delta_lit's (literals immediately following a match) it encodes the xor of the predicted byte (what I call the "mismatch byte") with the actual byte (let's call this the delta byte).
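    To make that concrete, here's a tiny sketch of the xor coding under simplified assumptions (a flat dictionary buffer and made-up function names, not LZHAM's internals):

    #include <cstdint>
    #include <cstddef>

    // A delta literal is the xor of the byte predicted by the current match
    // distance (the "mismatch byte") with the actual byte, so a near-miss
    // produces a symbol with few set bits.
    static uint8_t make_delta_lit(const uint8_t *dict, size_t pos,
                                  size_t match_dist, uint8_t actual) {
        uint8_t mismatch = dict[pos - match_dist]; // byte the match distance predicts
        return uint8_t(actual ^ mismatch);         // coded with the delta-literal Huffman table
    }

    static uint8_t decode_delta_lit(const uint8_t *dict, size_t pos,
                                    size_t match_dist, uint8_t delta) {
        return uint8_t(delta ^ dict[pos - match_dist]); // reconstruct the actual byte
    }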

    The match finder tries to be smart about deciding which matches of each length to return to the parser. There are typically a large # of possible matches it could return of a given length, so this gives the finder some freedom to be picky: When it encounters matches of equal length, it'll choose the one with the lowest match bucket. If the 2 matches fall into the same bucket, it then favors the match which has the lowest # of set bits in the delta byte. (The actual logic is a little more complex, but that's the gist of it. These rules only apply to matches >= 3 bytes. len2 matches are treated specially and I think the finder isn't as picky about them right now.)

    Also, when choosing between two matches of equal length and match slot, the finder favors the match with the lowest value in the least significant 4 bits of the distance, because the 4 distance LSB's are separately coded into another Huffman table. I remember finding this to be a small win on some binary files, and it was cheap to add.
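    As a rough sketch of those tie-breaking rules (the struct and names below are illustrative only; the real finder logic is more involved):

    #include <cstdint>

    struct match_candidate {
        uint32_t len;            // match length
        uint32_t dist_bucket;    // "match slot" the distance falls into
        uint32_t delta_popcount; // set bits in (byte after match) ^ (predicted byte)
        uint32_t dist_lsb4;      // distance & 15, coded by its own Huffman table
    };

    // Prefer longer matches; among equal lengths, prefer the lower distance
    // bucket, then the fewest set bits in the delta byte, then the smallest
    // 4 distance LSB's.
    static bool prefer(const match_candidate &a, const match_candidate &b) {
        if (a.len != b.len)                       return a.len > b.len;
        if (a.dist_bucket != b.dist_bucket)       return a.dist_bucket < b.dist_bucket;
        if (a.delta_popcount != b.delta_popcount) return a.delta_popcount < b.delta_popcount;
        return a.dist_lsb4 < b.dist_lsb4;
    }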

    Yes, the Huff tables are all independently updated, but the update schedules are the same for all tables. The user can tweak the max # of symbols between updates, and the rate at which the update interval grows over time. The huff tables only use 16-bit sym frequency counts so that ultimately limits how long a table can go between updates. The tables are always entirely updated (big hammer approach - nothing fancy).
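    For illustration, an update schedule along those lines might look like this (all names and parameters below are made up, not LZHAM's):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct huff_update_state {
        uint32_t syms_until_update;          // symbols left before the next table rebuild
        uint32_t cur_interval;               // current update interval
        uint32_t max_interval;               // cap on how infrequent updates can get
        uint32_t growth_numer, growth_denom; // e.g. 3/2 grows the interval by 50% each time
    };

    static void on_symbol_coded(huff_update_state &s, std::vector<uint16_t> &freqs, unsigned sym) {
        if (++freqs[sym] == 0xFFFF) {                             // 16-bit counts limit how long a table can wait
            for (uint16_t &f : freqs) f = uint16_t((f + 1) >> 1); // rescale all counts
        }
        if (--s.syms_until_update == 0) {
            // Rebuild this table's Huffman code lengths here (entire table,
            // "big hammer" style), then stretch the interval for next time:
            s.cur_interval = std::min(s.max_interval,
                                      s.cur_interval * s.growth_numer / s.growth_denom);
            s.syms_until_update = s.cur_interval;
        }
    }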

  30. The Following User Says Thank You to rgeldreich For This Useful Post:

    cbloom (30th January 2015)

  31. #117
    Member
    Join Date
    Aug 2010
    Location
    Seattle, WA
    Posts
    79
    Thanks
    6
    Thanked 67 Times in 27 Posts
    Quote Originally Posted by GOZARCK View Post
    I also get the same "Error: Too many filenames!" but using the "a" command ("a" is for folders, "c" is for files):
    lzham -m4 -e -t8 -d29 a D:\TEST_64\* test_64.lzham

    If I don't set the output file, it generates "__comp_temp_2920560304__.tmp".

    So I tried adding it to FreeArc in arc.ini, and it works for compressing folders:
    [External compressor:lzham]
    packcmd = lzham -m4 -d29 -t8 c $$arcdatafile$$.tmp $$arcpackedfile$$.tmp
    unpackcmd = lzham d $$arcpackedfile$$.tmp $$arcdatafile$$.tmp
    But I don't know how to use the delta option in lzham. [-afilename (what file do I need?)]
    Sorry about that, lzhamtest is really just a simple low-level testbed. I've integrated LZHAM into 7zip's 7za command line tool and GUI for higher level testing, so I should probably just release that. Or I could integrate LZHAM into somebody else's open source compression tool - but which one?

    Anyhow, the "a" option just compresses a bunch of files from a directory (to *temporary* compressed files), and the "c" option just compresses a single input file to an output compressed file. The "d' option decompresses one file to another.

  32. The Following User Says Thank You to rgeldreich For This Useful Post:

    GOZARCK (30th January 2015)

  33. #118
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    873
    Thanks
    462
    Thanked 175 Times in 85 Posts
    I think a 7-Zip plugin would be great

  34. The Following 2 Users Say Thank You to Stephan Busch For This Useful Post:

    Bloax (30th January 2015),GOZARCK (30th January 2015)

  35. #119
    Member
    Join Date
    Sep 2010
    Location
    US
    Posts
    126
    Thanks
    4
    Thanked 69 Times in 29 Posts
    Quote Originally Posted by rgeldreich View Post
    If the 2 matches fall into the same bucket, it then favors the match which has the lowest # of set bits in the delta byte. (The actual logic is a little more complex, but that's the gist of it. These rules only apply to matches >= 3 bytes. len2 matches are treated specially and I think the finder isn't as picky about them right now.)

    Also, when choosing between two matches of equal length and match slot, the finder favors the match with the lowest value in the least significant 4 bits of the distance, because the 4 distance LSB's are separately coded into another Huffman table. I remember finding this to be a small win on some binary files, and it was cheap to add.
    Ah, yeah. Good ideas I hadn't thought of.

    There's a huge amount of offset redundancy sometimes, so the encoder can use its freedom to choose which offset.

    In theory you should exclude *every* literal that comes after that same match substring, not just the one that comes after your particular offset. That's too slow so instead you can exclude the *best* literal.

    In fact the encoder can see the actual literal that occurs after the match, and could choose the offset such that the literal-after-match xor with the actual literal has the fewest bits set, or is coded in the lowest cost.
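    A hedged sketch of that last idea, assuming a flat dictionary buffer and using popcount as a stand-in for coding cost (names are illustrative):

    #include <cstdint>
    #include <cstddef>
    #include <vector>

    static int popcount8(uint8_t v) {
        int n = 0;
        while (v) { n += v & 1; v >>= 1; }
        return n;
    }

    // Given several offsets that all yield the same match bytes, pick the one
    // whose byte-after-match, xored with the literal that actually follows,
    // has the fewest set bits (i.e. is cheapest to code as a delta literal).
    static size_t pick_offset(const std::vector<size_t> &offsets, const uint8_t *dict,
                              size_t match_end_pos, uint8_t next_literal) {
        size_t best = offsets.front();
        int best_cost = 9; // larger than any byte popcount
        for (size_t off : offsets) {
            uint8_t predicted = dict[match_end_pos - off];
            int cost = popcount8(uint8_t(next_literal ^ predicted));
            if (cost < best_cost) { best_cost = cost; best = off; }
        }
        return best;
    }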
    Last edited by cbloom; 3rd August 2016 at 20:33.

  36. #120
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    @rgeldreich

    * "I've integrated LZHAM into 7zip's 7za command line tool" * ???

    Can you please explain this?

    A 7za.exe which can produce a *.7z file using the LZHAM compression algorithm inside?

    best regards

