
Thread: Ultra-fast LZ

  1. #211
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    The brand new ULZ v0.06 is here!

    It's a byte-aligned LZ77 with a 16 MB window (equal to the block size). It now looks like LZ4 with a large window. For literal run length coding I use EncodeMod, as described by Charles Bloom. I'm not 100% sure about the Optimal Parsing here: currently ULZ uses an ad hoc solution for hard-to-compress files, and it looks like it works!
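For readers who have not seen EncodeMod, here is a minimal sketch of a mod-based byte code in the spirit of Charles Bloom's description. It is written from memory, not taken from the ULZ source: the function names, the `mod` parameter value and the use of `std::vector` are all assumptions for illustration. Bytes below `256 - mod` end a value, and bytes at or above it carry one base-`mod` digit and signal that more bytes follow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append `val` to `out` using a mod-based byte code (hypothetical helper).
// A byte < upper terminates the value; a byte >= upper is a continuation.
inline void EncodeMod(std::vector<uint8_t>& out, uint32_t val, uint32_t mod) {
    assert(mod >= 1 && mod <= 255);
    const uint32_t upper = 256 - mod;
    while (val >= upper) {
        val -= upper;
        out.push_back(static_cast<uint8_t>(upper + val % mod));
        val /= mod;
    }
    out.push_back(static_cast<uint8_t>(val));
}

// Decode one value starting at in[pos] and advance pos past it.
// Assumes trusted input produced by EncodeMod with the same `mod`.
inline uint32_t DecodeMod(const std::vector<uint8_t>& in, size_t& pos, uint32_t mod) {
    assert(mod >= 1 && mod <= 255);
    const uint32_t upper = 256 - mod;
    uint32_t val = 0;
    uint32_t scale = 1;
    for (;;) {
        assert(pos < in.size());
        const uint32_t b = in[pos++];
        val += b * scale;          // each byte contributes b * mod^i
        if (b < upper) return val; // terminal byte
        scale *= mod;
    }
}
```

With `mod = 128` this behaves much like a 7-bit variable-length integer; smaller `mod` values leave more single-byte codes for short runs.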

    Anyway, enjoy the new release!

    Some testing results (Intel Core i7-4790K @ 4.6GHz, 32GB @ 1866MHz DDR3 RAM, RAMDisk)
    Code:
    Z:\>ulz c9 enwik9
    Compressing enwik9:
    1000000000 -> 291028084 in 325.732 sec
    
    Z:\>ulz d enwik9.ulz e9
    Decompressing enwik9.ulz:
    291028084 -> 1000000000 in 1.110 sec
    
    Z:\>ulz c enwik9
    Compressing enwik9:
    1000000000 -> 365851618 in 30.323 sec
    
    Z:\>ulz d enwik9.ulz e9
    Decompressing enwik9.ulz:
    365851618 -> 1000000000 in 1.072 sec
    
    Z:\>ulz c1 enwik9
    Compressing enwik9:
    1000000000 -> 421011442 in 7.379 sec
    
    Z:\>ulz d enwik9.ulz e9
    Decompressing enwik9.ulz:
    421011442 -> 1000000000 in 0.950 sec

  2. The Following 7 Users Say Thank You to encode For This Useful Post:

    avitar (15th July 2017),Cyan (14th July 2017),hunman (21st July 2017),Matt Mahoney (14th July 2017),Mike (13th July 2017),RamiroCruzo (18th July 2017),xezz (15th July 2017)

  3. #212
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Congrats on the world's fastest decompressor. http://mattmahoney.net/dc/text.html

  4. The Following 2 Users Say Thank You to Matt Mahoney For This Useful Post:

    encode (14th July 2017),Mike (14th July 2017)

  5. #213
    Member
    Join Date
    May 2014
    Location
    Canada
    Posts
    136
    Thanks
    61
    Thanked 21 Times in 12 Posts
    Is this compressor open source, or binary only?

  6. #214
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    Quote Originally Posted by boxerab View Post
    Is this compressor open source, or binary only?
    At this moment, it is binary only!

  7. The Following User Says Thank You to encode For This Useful Post:

    boxerab (1st September 2017)

  8. #215
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    Well, I'm about to release ULZ as an open source (public domain) data compression library with a minimal, simple interface. The library consists of ULZ.HPP, the library itself, and ULZ.CPP, a simple demo compressor.
    Usage is as simple as this:
    Code:
    int comp_len=ulz->Compress(in, in_len, out, level);
    I'm undecided about the compressor configuration. Currently I have:
    16-bit window with rep offsets (a la LZ4 competitor)
    17-bit window
    18-bit window
    18-bit window with 4 rep offsets
    19-bit window
    21-bit window with rep offsets (actual window size is limited to 2 GB)
    22-bit window with rep offsets (same unlimited window as with 21-bit version)
    24-bit window (ULZ v0.06 modification)

    At this time I'm planning to keep the Greedy (unoptimized) parsing only, for simplicity.


  9. The Following 2 Users Say Thank You to encode For This Useful Post:

    Hakan Abbas (23rd March 2019),moisesmcardona (23rd March 2019)

  10. #216
    Member
    Join Date
    Apr 2015
    Location
    Greece
    Posts
    68
    Thanks
    32
    Thanked 22 Times in 15 Posts
    The decompressor should take out_len to guard against buffer overrun.

  11. #217
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    Storing the compressed and raw sizes is up to the user. Simply write raw_len into your stream
    and call:
    Code:
    int out_len=ulz->Decompress(in, comp_len, out);
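As an illustration of "write raw_len into your stream", here is a minimal framing sketch. It is not part of ULZ: the `BlockHeader` layout (two little-endian 32-bit fields) and the helper names `AppendBlock` and `ReadBlock` are assumptions made for this example. The payload would be whatever `Compress()` produced for the block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-block header: sizes are stored by the caller, not by the codec.
struct BlockHeader {
    uint32_t raw_len;   // size of the block before compression
    uint32_t comp_len;  // size of the compressed payload that follows
};

inline void PutU32LE(std::vector<uint8_t>& s, uint32_t v) {
    for (int i = 0; i < 4; ++i) s.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

inline bool GetU32LE(const std::vector<uint8_t>& s, size_t& pos, uint32_t& v) {
    if (pos > s.size() || s.size() - pos < 4) return false;
    v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<uint32_t>(s[pos + i]) << (8 * i);
    pos += 4;
    return true;
}

// Append header + compressed payload to the stream.
inline void AppendBlock(std::vector<uint8_t>& stream, uint32_t raw_len,
                        const std::vector<uint8_t>& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);
    PutU32LE(stream, raw_len);
    PutU32LE(stream, static_cast<uint32_t>(payload.size()));
    stream.insert(stream.end(), payload.begin(), payload.end());
}

// Read one block; returns false (and leaves pos unchanged) if the stream is truncated.
inline bool ReadBlock(const std::vector<uint8_t>& stream, size_t& pos,
                      BlockHeader& h, std::vector<uint8_t>& payload) {
    size_t p = pos;
    if (!GetU32LE(stream, p, h.raw_len) || !GetU32LE(stream, p, h.comp_len)) return false;
    if (stream.size() - p < h.comp_len) return false;
    payload.assign(stream.data() + p, stream.data() + p + h.comp_len);
    pos = p + h.comp_len;
    return true;
}
```

The reader can then size its output buffer from `raw_len` before decompressing, and compare the decompressor's return value against it.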

  12. #218
    Member
    Join Date
    Apr 2015
    Location
    Greece
    Posts
    68
    Thanks
    32
    Thanked 22 Times in 15 Posts
    out_len should be included to handle malicious files that, during decompression, produce more output than the format specifies and thus cause a buffer overrun.
    The signature should be:
    Code:
    size_t Decompress(in, comp_len, out, out_len) or
    int Decompress(in, comp_len, out, out_len)
    Also see LZ4, which has a similar API.
    Last edited by algorithm; 23rd March 2019 at 14:07.
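To make the point about out_len concrete, here is a minimal sketch of a bounds-checked decoder. The format is a toy invented for this example (one token byte with a 4-bit literal run and a 4-bit match length, then a 2-byte offset); it is not the ULZ or LZ4 format, and `DecompressSafe` is a hypothetical name. What matters is that every read is checked against in_len and every write against out_len.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy format: token = (run << 4) | len, then `run` literal bytes,
// then, if len > 0, a 2-byte little-endian match distance.
// Returns the number of bytes written, or -1 on malformed input
// or if the output would not fit in out_len.
inline int DecompressSafe(const uint8_t* in, int in_len, uint8_t* out, int out_len) {
    assert(in_len >= 0 && out_len >= 0);
    int ip = 0;
    int op = 0;
    while (ip < in_len) {
        const int token = in[ip++];
        const int run = token >> 4;
        const int len = token & 15;
        // Literal run must fit in both the remaining input and output.
        if (run > in_len - ip || run > out_len - op) return -1;
        std::memcpy(out + op, in + ip, static_cast<size_t>(run));
        ip += run;
        op += run;
        if (len == 0) continue;
        if (in_len - ip < 2) return -1;
        const int dist = in[ip] | (in[ip + 1] << 8);
        ip += 2;
        // The match must point inside already decoded data and fit the output.
        if (dist == 0 || dist > op) return -1;
        if (len > out_len - op) return -1;
        for (int i = 0; i < len; ++i) {
            out[op] = out[op - dist];  // byte-wise copy handles overlapping matches
            ++op;
        }
    }
    return op;
}
```

A caller that stored raw_len alongside the compressed data would pass it as out_len and treat any return value other than raw_len as corruption.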

  13. The Following 2 Users Say Thank You to algorithm For This Useful Post:

    Bulat Ziganshin (23rd March 2019),encode (23rd March 2019)

  14. #219
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    Well, or at least:
    Code:
    ulz->Decompress(in, comp_len, out, BLOCK_SIZE);
    
    or
    
    ulz->Decompress<BLOCK_SIZE>(in, comp_len, out);

  15. #220
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    Added buffer overrun protection!

  16. #221
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts

  17. The Following 9 Users Say Thank You to encode For This Useful Post:

    algorithm (24th March 2019),Bulat Ziganshin (24th March 2019),Cyan (25th March 2019),introspec (29th March 2019),jibz (24th March 2019),Mike (24th March 2019),moisesmcardona (24th March 2019),spark (26th March 2019),svpv (25th March 2019)

  18. #222
    Member
    Join Date
    Apr 2015
    Location
    Greece
    Posts
    68
    Thanks
    32
    Thanked 22 Times in 15 Posts
    So ULZ is similar to LZ4, with a 17-bit window, a 3-bit run length, a 4-bit match length, and larger run and match lengths encoded as variable-length integers in multiples of 7 bits.
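A rough sketch of that description, for illustration only. The exact bit positions inside the token, the "field saturates, remainder follows as a varint" convention, and all names here (`PackToken`, `UnpackToken`, `PutVarint`, `GetVarint`) are assumptions, not taken from the ULZ source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed token layout: LLL O MMMM
//   LLL  = literal run length, 7 means "more follows as a varint"
//   O    = bit 16 of the match offset (17-bit window)
//   MMMM = match length code, 15 means "more follows as a varint"
struct Token {
    uint32_t run;
    uint32_t offset_hi;
    uint32_t match;
};

inline uint8_t PackToken(uint32_t run, uint32_t match_code, uint32_t offset) {
    assert(offset < (1u << 17));
    const uint32_t l = run < 7 ? run : 7;
    const uint32_t m = match_code < 15 ? match_code : 15;
    const uint32_t o = (offset >> 16) & 1;
    return static_cast<uint8_t>((l << 5) | (o << 4) | m);
}

inline Token UnpackToken(uint8_t t) {
    Token tok;
    tok.run = t >> 5;
    tok.offset_hi = (t >> 4) & 1;
    tok.match = t & 15;
    return tok;
}

// Variable-length integer: 7 payload bits per byte, high bit set on all but the last.
inline void PutVarint(std::vector<uint8_t>& out, uint32_t v) {
    while (v >= 128) {
        out.push_back(static_cast<uint8_t>(v | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<uint8_t>(v));
}

inline uint32_t GetVarint(const std::vector<uint8_t>& in, size_t& pos) {
    uint32_t v = 0;
    for (int shift = 0; shift < 35 && pos < in.size(); shift += 7) {
        const uint32_t b = in[pos++];
        v |= (b & 127) << shift;
        if (b < 128) break;
    }
    return v;
}
```

Under these assumptions, a run of 100 literals would be stored as LLL = 7 followed by the varint for 93, and the low 16 offset bits would follow the token as two plain bytes.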

    I think for g++ you must use ftello with _FILE_OFFSET_BITS defined as 64.
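A minimal sketch of the ftello suggestion on a POSIX system with g++. The helper name `FileSize64` is made up for this example, and this is not how ULZ.CPP necessarily does it. The macro has to be defined before the first system header is included.

```cpp
// Must come before any system header so that off_t is 64 bits wide.
#define _FILE_OFFSET_BITS 64

#include <cassert>
#include <stdio.h>
#include <sys/types.h>

// Returns the size of an open file in bytes, or -1 on error.
// The current file position is restored before returning.
inline long long FileSize64(FILE* f) {
    assert(f != nullptr);
    const off_t here = ftello(f);
    if (here < 0 || fseeko(f, 0, SEEK_END) != 0) return -1;
    const off_t size = ftello(f);
    if (fseeko(f, here, SEEK_SET) != 0) return -1;
    return static_cast<long long>(size);
}
```

With plain ftell, the return type is long, which is 32 bits on some targets and cannot represent sizes of 2 GB and above.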

  19. The Following User Says Thank You to algorithm For This Useful Post:

    encode (24th March 2019)

  20. #223
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    Thanks! Added g++ support.
    And yep, you are correct about the ULZ algorithm description.

  21. #224
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    181
    Thanks
    67
    Thanked 15 Times in 11 Posts
    A recommendation: for binary files, it's best to use the Releases section.

  22. The Following User Says Thank You to moisesmcardona For This Useful Post:

    encode (24th March 2019)

  23. #225
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,219
    Thanks
    188
    Thanked 962 Times in 496 Posts
    7820X @ 4.5GHz, ramdisk
    Code:
    5.188s  1.000s:  ulz_ic19_SSE2-O1-ipo.exe c1 enwik9 x & ulz_ic19_SSE2-O1-ipo.exe d x y        
    5.219s  1.016s:  ulz_ic19_SSE2-O3-ipo-PGO.exe c1 enwik9 x & ulz_ic19_SSE2-O3-ipo-PGO.exe d x y    
    5.313s  1.015s:  ulz_VS2017.exe c1 enwik9 x & ulz_VS2017.exe d x y                  
    5.328s  1.015s:  ulz_clang9_k8.exe c1 enwik9 x & ulz_clang9_k8.exe d x y               
    5.375s  1.015s:  ulz_ic19_SSE2-O3-ipo.exe c1 enwik9 x & ulz_ic19_SSE2-O3-ipo.exe d x y        
    5.391s  1.015s:  ulz_ic19_SSE2-O3.exe c1 enwik9 x & ulz_ic19_SSE2-O3.exe d x y            
    5.391s  1.032s:  ulz_gcc82_k8.exe c1 enwik9 x & ulz_gcc82_k8.exe d x y                
    5.406s  1.015s:  ulz_ic19_SSE2-O2-ipo.exe c1 enwik9 x & ulz_ic19_SSE2-O2-ipo.exe d x y        
    5.453s  1.032s:  ulz_gcc82_k8-lto.exe c1 enwik9 x & ulz_gcc82_k8-lto.exe d x y

  24. The Following 4 Users Say Thank You to Shelwien For This Useful Post:

    encode (25th March 2019),Mike (25th March 2019),moisesmcardona (25th March 2019),spark (26th March 2019)

  25. #226

  26. The Following User Says Thank You to dnd For This Useful Post:

    encode (29th March 2019)

  27. #227
    Member FatBit's Avatar
    Join Date
    Jan 2012
    Location
    Prague, CZ
    Posts
    189
    Thanks
    0
    Thanked 36 Times in 27 Posts

    ULZ test

    Dear colleagues,

    I would like to share my results for tar file compression (tarred files from my benchmark for FileOptimizer) on an "average" PC.

    Best regards,

    FatBit
    Attached thumbnail: ULZ.png (20.6 KB)

  28. The Following User Says Thank You to FatBit For This Useful Post:

    encode (3rd April 2019)

  29. #228
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    Some testing results of the ULZ with Optimal Parsing + modifications:

    Control byte  Window size  ENWIK9         QuakeChampions.exe  blood_covenant.pak  PariahInterface.utx
    LLLLMMMM      64 KB        371,956,901    19,318,835          304,251,363         9,073,713
    LLLOMMMM      128 KB       348,317,647    18,907,428          302,908,440         9,016,237
    LLOOMMMM      256 KB       329,221,883    18,610,410          302,166,355         9,288,982
    LLLOOMMM      256 KB       337,882,024    18,601,022          300,491,509         9,283,202
    LLOOOMMM      512 KB       321,308,639    18,400,194          299,229,969         9,470,482
    Original                   1,000,000,000  43,127,296          564,260,864         24,375,895
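The window sizes in the table follow from the control byte layout if one assumes 16 offset bits stored outside the control byte (consistent with LLLLMMMM giving 64 KB) plus one extra bit per O. A small sketch of that arithmetic; the helper names `CountField` and `WindowBytes` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Count how many bits of the 8-bit control byte are assigned to a field,
// e.g. CountField("LLLOMMMM", 'L') is 3.
inline int CountField(const std::string& layout, char field) {
    assert(layout.size() == 8);
    int bits = 0;
    for (char c : layout) {
        if (c == field) ++bits;
    }
    return bits;
}

// Window size in bytes, assuming 16 offset bits live outside the control byte
// and each 'O' adds one high-order offset bit.
inline uint32_t WindowBytes(const std::string& layout) {
    return 1u << (16 + CountField(layout, 'O'));
}
```

For example, LLOOOMMM has three O bits, so the window is 2^19 bytes = 512 KB, at the cost of leaving only two bits for the literal run and three for the match length in the control byte.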

  30. #229
    Member jibz's Avatar
    Join Date
    Jan 2015
    Location
    Denmark
    Posts
    116
    Thanks
    94
    Thanked 69 Times in 49 Posts
    Quote Originally Posted by encode View Post
    Added buffer overrun protection!
    Btw, I found a couple of places where bad input could still result in a buffer overrun. I opened a pull request with some fixes on the GitHub page.

  31. The Following User Says Thank You to jibz For This Useful Post:

    encode (15th April 2019)

  32. #230
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    Thanks a bunch! I've already added these fixes to the upcoming release!
    In addition, I've added a Fast compression mode (CompressFast()).
    I've also completed CompressOptimal(), but it will not be included in the upcoming release.
    My main concern for now is the Normal mode.
    Hash chains are good, but the 2-byte lookahead in v1.00 is too slow.
    Double hash chains / my MMC implementation failed here: too much time is spent on pointer setup.
    A binary tree with partial updates is my main experiment for now...

  33. The Following 2 Users Say Thank You to encode For This Useful Post:

    avitar (15th April 2019),jibz (15th April 2019)

  34. #231
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    ULZ v1.01 BETA has been released:
    https://github.com/encode84/ulz

    + New fast compression mode (c1, default)
    + Safer decompressor
    + Reconfigured compression levels:

    1 - CompressFast(), HashTable-based match finder
    2..4 - Compress(), Hash chains, Greedy parsing
    5..9 - Compress(), Hash chains, Non-greedy 1-byte lookahead
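To show the difference between greedy parsing and non-greedy parsing with 1-byte lookahead, here is a toy sketch. It uses a brute-force match finder instead of hash chains to stay short, and the names (`FindLongest`, `Parse`, `Unparse`, `LongestMatch`) and the minimum match length of 3 are assumptions for this example, not ULZ internals.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr size_t kMinMatch = 3;  // assumed minimum useful match length

struct Match { size_t len = 0; size_t dist = 0; };
struct Op { bool is_match; size_t len; size_t dist; char literal; };

// Brute-force longest match at `pos` (a real compressor would walk hash chains).
inline Match FindLongest(const std::string& s, size_t pos) {
    Match best;
    for (size_t cand = 0; cand < pos; ++cand) {
        size_t len = 0;
        while (pos + len < s.size() && s[cand + len] == s[pos + len]) ++len;
        if (len >= kMinMatch && len >= best.len) { best.len = len; best.dist = pos - cand; }
    }
    return best;
}

// Greedy: take the longest match immediately.
// Lazy: if the next position has a strictly longer match, emit a literal instead.
inline std::vector<Op> Parse(const std::string& s, bool lazy) {
    std::vector<Op> ops;
    size_t pos = 0;
    while (pos < s.size()) {
        Match m = FindLongest(s, pos);
        if (lazy && m.len > 0 && FindLongest(s, pos + 1).len > m.len) m.len = 0;
        if (m.len == 0) {
            ops.push_back({false, 1, 0, s[pos]});
            ++pos;
        } else {
            ops.push_back({true, m.len, m.dist, '\0'});
            pos += m.len;
        }
    }
    return ops;
}

// Rebuild the input from the parse, to check that it is lossless.
inline std::string Unparse(const std::vector<Op>& ops) {
    std::string out;
    for (const Op& op : ops) {
        if (!op.is_match) { out.push_back(op.literal); continue; }
        assert(op.dist >= 1 && op.dist <= out.size());
        for (size_t i = 0; i < op.len; ++i) out.push_back(out[out.size() - op.dist]);
    }
    return out;
}

inline size_t LongestMatch(const std::vector<Op>& ops) {
    size_t best = 0;
    for (const Op& op : ops) {
        if (op.is_match && op.len > best) best = op.len;
    }
    return best;
}
```

On the input "abcx bcdefg abcdefg", the greedy parse takes the 3-byte match "abc" and then a 4-byte match "defg", while the 1-byte lookahead skips one literal and finds the 6-byte match "bcdefg".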


  35. The Following 5 Users Say Thank You to encode For This Useful Post:

    avitar (23rd April 2019),Bulat Ziganshin (22nd April 2019),comp1 (19th April 2019),jibz (20th April 2019),Mike (19th April 2019)

  36. #232
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    80
    Thanks
    30
    Thanked 8 Times in 8 Posts
    What is the O bit?

  37. #233
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    Quote Originally Posted by smjohn1 View Post
    What is the O bit?
    Higher bits of an Offset (Match Distance)

  38. #234
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    80
    Thanks
    30
    Thanked 8 Times in 8 Posts
    OK, thanks.
    Quote Originally Posted by encode View Post
    Higher bits of an Offset (Match Distance)

  39. #235
    Member FatBit's Avatar
    Join Date
    Jan 2012
    Location
    Prague, CZ
    Posts
    189
    Thanks
    0
    Thanked 36 Times in 27 Posts

    ULZ timing update

    Best regards,

    FatBit
    Attached thumbnail: ULZ.png (25.0 KB)

  40. The Following User Says Thank You to FatBit For This Useful Post:

    encode (28th April 2019)

  41. #236
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    778
    Thanks
    63
    Thanked 273 Times in 191 Posts
    ULZ v1.01 BETA:

    Input:
    enwik9

    Output:
    460,463,054 bytes, 3.158 sec. - 0.742 sec., c1
    409,136,504 bytes, 5.848 sec. - 0.757 sec., c2
    393,640,476 bytes, 7.711 sec. - 0.721 sec., c3
    383,318,792 bytes, 10.716 sec. - 0.719 sec., c4
    363,151,876 bytes, 23.298 sec. - 0.768 sec., c5
    359,939,352 bytes, 31.166 sec. - 0.766 sec., c6
    358,155,488 bytes, 39.326 sec. - 0.786 sec., c7
    357,230,196 bytes, 47.285 sec. - 0.763 sec., c8
    356,494,682 bytes, 65.747 sec. - 0.763 sec., c9

  42. The Following User Says Thank You to Sportman For This Useful Post:

    encode (28th April 2019)

  43. #237
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    @Sportman
    System specs?

  44. #238
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    778
    Thanks
    63
    Thanked 273 Times in 191 Posts
    Quote Originally Posted by encode View Post
    System specs?
    Z370
    i9-9900K at 5.3GHz (16MB cache)
    32GB DDR4 4266MHz

  45. The Following User Says Thank You to Sportman For This Useful Post:

    encode (29th April 2019)

  46. #239
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    Thanks! As has been the tradition over the years, you have very nice PC specs!
