Thread: Leviathan + Updated Kraken Decoder

  1. #1
    Member
    Join Date
    Aug 2016
    Location
    Europe
    Posts
    19
    Thanks
    0
    Thanked 25 Times in 6 Posts

    Leviathan + Updated Kraken Decoder

    https://github.com/powzix/ooz

    I've updated my Oodle decoder with Leviathan support, plus the latest versions of Kraken/Mermaid/Selkie. It now has a proper command line interface and can also invoke the Oodle DLL to compress/decompress.

    The biggest change is that there's now support for tANS, and Leviathan uses 6x2 different LZ77 decoder loops, one per combination of the 6 literal modes and 2 command modes described below. Like the other Kraken-based formats, it internally uses a bunch of byte streams that are coded using a heavily optimized byte-based entropy coder.

    The entropy coder for a byte stream supports Huffman, tANS, RLE, and combinations of those. There's also a special multiarray mode that is used to code the 4/8/16 streams mentioned below; it can also be used when symbol distributions differ significantly between regions of a stream, coding parts of the stream with different modes/tables. The entropy decoder is the same across Kraken/Mermaid/Selkie/Leviathan, but the encoder picks only the modes that are fast enough for the profile.
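    To illustrate the layering, here's a sketch of "one decoder, many modes" dispatch. The mode tags and function names are invented for the sketch, not taken from ooz's actual bitstream layout:
    Code:
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical mode tags only; not ooz's real headers. */
    typedef enum {
        MODE_RAW,         /* stored bytes, no entropy coding */
        MODE_HUFF,        /* huffman coded */
        MODE_TANS,        /* tANS coded */
        MODE_RLE,         /* run-length coded */
        MODE_RLE_HUFF,    /* combination: RLE whose payload is huffman coded */
        MODE_MULTIARRAY   /* per-region modes, e.g. for the 4/8/16 streams */
    } StreamMode;

    /* per-mode decoders assumed to exist elsewhere; all return bytes written */
    int decode_raw (const uint8_t *src, size_t n, uint8_t *dst, size_t cap);
    int decode_huff(const uint8_t *src, size_t n, uint8_t *dst, size_t cap);
    int decode_tans(const uint8_t *src, size_t n, uint8_t *dst, size_t cap);

    int decode_stream(StreamMode mode,
                      const uint8_t *src, size_t n, uint8_t *dst, size_t cap)
    {
        switch (mode) {
        case MODE_RAW:  return decode_raw (src, n, dst, cap);
        case MODE_HUFF: return decode_huff(src, n, dst, cap);
        case MODE_TANS: return decode_tans(src, n, dst, cap);
        default:        return -1;  /* RLE/combination/multiarray omitted */
        }
    }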

    The tANS decoder decodes 5 symbols in parallel: 3 in the forward direction and 2 in the backward direction (reading bits from the end of the chunk), using a straightforward table-driven approach.
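    A minimal sketch of one such interleaved round, assuming an invented table layout (TansEntry) and bit readers; the real ooz tables and refill logic differ:
    Code:
    #include <stdint.h>

    typedef struct {
        uint8_t  symbol;     /* decoded byte */
        uint8_t  nbits;      /* bits needed to refill the state */
        uint16_t next_base;  /* base of the next state before refill */
    } TansEntry;

    typedef struct {
        const uint8_t *p;    /* current byte position */
        uint32_t bits;       /* bit buffer */
        int      count;      /* bits available in the buffer */
    } BitReader;

    /* forward reader: consumes bytes from the start of the chunk */
    static uint32_t read_fwd(BitReader *r, int n) {
        while (r->count < n) { r->bits = (r->bits << 8) | *r->p++; r->count += 8; }
        r->count -= n;
        return (r->bits >> r->count) & ((1u << n) - 1);
    }

    /* backward reader: p starts at the end of the chunk and walks back */
    static uint32_t read_bwd(BitReader *r, int n) {
        while (r->count < n) { r->bits = (r->bits << 8) | *--r->p; r->count += 8; }
        r->count -= n;
        return (r->bits >> r->count) & ((1u << n) - 1);
    }

    /* one round: each of the 5 states emits a symbol, then refills;
       states 0..2 read forward, states 3..4 read from the chunk's end */
    static void tans_decode_round(const TansEntry *table, uint32_t state[5],
                                  BitReader *fwd, BitReader *bwd, uint8_t *out) {
        for (int i = 0; i < 5; i++) {
            const TansEntry *e = &table[state[i]];
            out[i] = e->symbol;
            uint32_t refill = (i < 3) ? read_fwd(fwd, e->nbits)
                                      : read_bwd(bwd, e->nbits);
            state[i] = e->next_base + refill;
        }
    }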

    The biggest difference between Leviathan/Kraken/Mermaid is how the command byte is structured (i.e. the core LZ77 literal/match decoder loop).

    Each byte from the command stream in Leviathan is structured as follows:
    OOOLLMMM

    OOO = Recent offset index, or 7 to use an explicit offset from the offset stream. Leviathan keeps 7 recent offsets.
    LL = Literal length: 0...2 stored directly, or 3 to read a length from the start of the length stream.
    MMM = Match length minus 2: match lengths 2...8 stored directly, or the escape value (nominal length 9) to read a length from the end of the length stream.

    The length byte stream is encoded using the same entropy coder. If a length byte is 255, the value is taken from a 32-bit length stream instead.
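    A hedged sketch of parsing one command byte under these rules. Only the OOOLLMMM bit layout and the escape rules come from the description above; the names, cursor handling, and any bias on escaped lengths are guesses:
    Code:
    #include <stdint.h>

    typedef struct {
        const uint8_t  *front;  /* literal-length escapes read from the start */
        const uint8_t  *back;   /* match-length escapes read from the end */
        const uint32_t *len32;  /* 32-bit lengths, used when a byte is 255 */
    } LenStreams;

    typedef struct {
        int      explicit_offset;  /* OOO == 7: read from the offset stream */
        int      recent_index;     /* otherwise: one of the 7 recent offsets */
        uint32_t litlen, matchlen;
    } Command;

    static uint32_t len_escape(uint32_t v, LenStreams *ls) {
        return (v == 255) ? *ls->len32++ : v;  /* 255 -> 32-bit length stream */
    }

    static Command parse_command(uint8_t cmd, LenStreams *ls) {
        uint32_t ooo = cmd >> 5;        /* OOO: recent offset selector */
        uint32_t ll  = (cmd >> 3) & 3;  /* LL : literal length, 3 = escape */
        uint32_t mmm = cmd & 7;         /* MMM: match length - 2, 7 = escape */
        Command c;
        c.explicit_offset = (ooo == 7);
        c.recent_index    = (int)ooo;
        c.litlen   = (ll  == 3) ? len_escape(*ls->front++, ls) : ll;
        c.matchlen = (mmm == 7) ? len_escape(*--ls->back, ls)  : mmm + 2;
        return c;
    }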

    There are two different ways to encode the Huffman code lengths, and two different ways to encode tANS weights (one sparse and one regular). The 32-bit offsets can also be encoded in two ways with slightly different characteristics. The offsets can optionally be coded in a scaled mode: OFFSET32[i] = OFFSET32[i] * SCALE + LOW8[i], where LOW8[] is a separate byte stream. This is useful when offsets are usually multiples of a given factor.
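    The scaled mode is just that formula applied per offset; a direct transcription (array names from the formula above, loop framing mine):
    Code:
    #include <stddef.h>
    #include <stdint.h>

    /* Reconstruct offsets coded in scaled mode:
       OFFSET32[i] = OFFSET32[i] * SCALE + LOW8[i] */
    static void unscale_offsets(uint32_t *offset32, const uint8_t *low8,
                                size_t n, uint32_t scale) {
        for (size_t i = 0; i < n; i++)
            offset32[i] = offset32[i] * scale + low8[i];
    }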

    First, all the byte streams for all the parts (literals, offsets, lengths, commands, etc.) are decoded into a scratch memory space; then the 32-bit offsets/lengths are read using a variable-length coding; and then the LZ77 loop runs on this, producing 128kB of decompressed data. Then the next set of streams is decoded and the process repeats.
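    A sketch of that per-chunk replay step: all streams are assumed already entropy-decoded into scratch, so the loop only consumes them. It reuses Command/LenStreams/parse_command from the sketch above; the recent-offset update policy and the RAW-only literal handling are guesses, and there's no bounds checking (the real tool warns it isn't fuzz safe either):
    Code:
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    static size_t lz77_replay(const uint8_t *cmds, size_t ncmds, LenStreams *ls,
                              const uint8_t *lit, const uint32_t *off32,
                              uint8_t *dst) {
        uint32_t recent[7] = {1, 1, 1, 1, 1, 1, 1};
        size_t pos = 0;
        for (size_t n = 0; n < ncmds; n++) {
            Command c = parse_command(cmds[n], ls);

            /* literals (RAW mode shown; other modes transform these bytes) */
            for (uint32_t i = 0; i < c.litlen; i++)
                dst[pos++] = *lit++;

            uint32_t offset;
            if (c.explicit_offset) {
                offset = *off32++;
                /* guess: explicit offsets are inserted as most recent */
                memmove(&recent[1], &recent[0], 6 * sizeof(recent[0]));
                recent[0] = offset;
            } else {
                offset = recent[c.recent_index];
            }

            /* byte-by-byte copy so overlapping matches (offset < len) work */
            for (uint32_t i = 0; i < c.matchlen; i++, pos++)
                dst[pos] = dst[pos - offset];
        }
        return pos;  /* up to 128kB per chunk */
    }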

    Literal modes in Leviathan:
    RAW: Regular bytes
    SUB: Difference between byte and byte from the most recent offset
    LAMSUB: Same as SUB, but the first byte of each run is coded in its own entropy stream.
    SUBAND3: Same as SUB, but there are four literal streams, separately entropy coded. DST_BYTE_OFFSET%4 controls which literal stream to read from.
    O1: Order-1. Uses 16 literal streams where LAST_BYTE>>4 controls which literal stream to read from.
    SUBANDF: Same as SUBAND3 but 16 streams.

    Additionally, orthogonally to the literal mode, the command stream can be coded in two ways:
    SIMPLE: A single command stream.
    MULTI: 8 separate command streams. DST_BYTE_OFFSET%8 controls which command stream to read from.

    Kraken/Mermaid/Selkie only support RAW and SUB.
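    A sketch of the stream-selection rules from the two lists above. Only the selection rules themselves (the SUB delta, DST_BYTE_OFFSET%4, LAST_BYTE>>4, DST_BYTE_OFFSET%8) come from the description; the stream/cursor plumbing is invented:
    Code:
    #include <stddef.h>
    #include <stdint.h>

    /* SUB: literal is a delta against the byte at the most recent offset */
    static uint8_t decode_sub_literal(uint8_t coded, const uint8_t *dst,
                                      size_t pos, size_t recent_offset) {
        return (uint8_t)(coded + dst[pos - recent_offset]);
    }

    /* SUBAND3: four separately coded literal streams, chosen by pos % 4 */
    static uint8_t next_suband3_literal(const uint8_t *streams[4],
                                        size_t cursors[4], size_t pos) {
        size_t s = pos % 4;
        return streams[s][cursors[s]++];
    }

    /* O1: sixteen streams, chosen by the high nibble of the last output byte */
    static uint8_t next_o1_literal(const uint8_t *streams[16],
                                   size_t cursors[16], uint8_t last_byte) {
        size_t s = last_byte >> 4;
        return streams[s][cursors[s]++];
    }

    /* MULTI command mode: eight command streams, chosen by pos % 8 */
    static uint8_t next_multi_command(const uint8_t *streams[8],
                                      size_t cursors[8], size_t pos) {
        size_t s = pos % 8;
        return streams[s][cursors[s]++];
    }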

    Writing an optimal encoder for this format is a crazy task. There are so many choices and a lot of flexibility. Encoding the entropy streams seems straightforward enough, but how can the LZ77 match coder accurately predict the costs? The Oodle guys are smart.

    I renamed the tool to ooz.

    ooz v7.0

    Usage: ooz [options] input [output]
    -c --stdout write to stdout
    -d --decompress decompress (default)
    -z --compress compress (requires oo2core_7_win64.dll)
    -b just benchmark, don't overwrite anything
    -f force overwrite existing file
    --dll decompress with the dll
    --verify decompress and verify that it matches output
    --verify=<folder> verify with files in this folder
    -<1-9> --level=<-4..10> compression level
    -m<k> [k|m|s|l|h] compressor selection
    --kraken --mermaid --selkie --leviathan --hydra compressor selection

    (Warning! not fuzz safe, so please trust the input)


    Cheers
    Powzix
    Last edited by powzix; 14th February 2019 at 13:47.

  2. Thanks (7):

    algorithm (15th February 2019),encode (14th February 2019),GOZARCK (14th February 2019),introspec (15th February 2019),Mike (14th February 2019),Simorq (15th February 2019),svpv (14th February 2019)

  3. #2
    Member
    Join Date
    Jul 2014
    Location
    Mars
    Posts
    199
    Thanks
    136
    Thanked 13 Times in 12 Posts
    Can anyone list practical usage scenarios for this?

  4. #3
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,304 Times in 740 Posts
    1) Learn from it and write a better LZ codec of your own.
    2) Port a frontend from some other LZ to match the Oodle token format and make an encoder.
    You can't sell it, but you can use it for game repacks, like a certain already existing codec.

  5. Thanks:

    introspec (27th June 2019)

  6. #4
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    876
    Thanks
    242
    Thanked 325 Times in 198 Posts
    One difference between zlib/brotli and many new compressors is that zlib and brotli interleave data and commands in the most fine-grained way, while others group data into byte blocks of auxiliary data. Fine-grained interleaving allows more uncompressed data to be recovered for further processing (including displaying it to the user or issuing new fetches early) while data is still streaming. This is important in many uses, but no compression benchmark attempts to measure completeness of partial decoding: for example, how many bytes can be decoded when the client has received the first 2000 bytes.
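    A sketch of what such a measurement could look like. The streaming-decoder API here (decoder_new/decoder_push/decoder_free) is completely invented for illustration, not any real library's interface:
    Code:
    #include <stddef.h>
    #include <stdint.h>

    /* assumed streaming decoder interface (invented for this sketch);
       decoder_push consumes input bytes and returns output bytes produced */
    typedef struct Decoder Decoder;
    Decoder *decoder_new(void);
    size_t   decoder_push(Decoder *d, const uint8_t *in, size_t in_len,
                          uint8_t *out, size_t out_cap);
    void     decoder_free(Decoder *d);

    /* how many bytes are recoverable from the first `prefix` compressed
       bytes, e.g. prefix = 2000 */
    static size_t partial_decode_bytes(const uint8_t *comp, size_t prefix,
                                       uint8_t *out, size_t out_cap) {
        Decoder *d = decoder_new();
        size_t produced = decoder_push(d, comp, prefix, out, out_cap);
        decoder_free(d);
        return produced;
    }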

  7. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,304 Times in 740 Posts
    > how many bytes can be decoded when the client has received first 2000 bytes.

    Some version of paq (lpaq?) would win at that, since it doesn't have block headers,
    and its decoding speed is faster than network speed anyway, while compression is better :)

  8. #6
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    95
    Thanks
    27
    Thanked 17 Times in 15 Posts
    Quote Originally Posted by necros View Post
    Can anyone list practical usage scenarios for this?
    Getting your data back, for example.

    Look, time goes by, companies appear and disappear; OSes, libs, CPUs and configurations change. So that cool binary blob you had might eventually stop doing its job for you. And then what? You'll be unable to fix it; the authors could be long gone. And so you can't get your data back. Funny, isn't it? When it happens to others, it's a very good lesson to learn. Available source means one can at least reasonably try to get the data back even quite some time later, under different assumptions and configurations, etc. Of course you can learn it the hard way, like I eventually did with the Bink video thing, but that proved to be an unpleasant and frustrating experience for me. Hopefully things like this will save someone from frustration and from wasting time on REing and reinventing the wheel for the i++'th time. Not to mention the option to learn how things work without resorting to a disassembler.

  9. #7
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    876
    Thanks
    242
    Thanked 325 Times in 198 Posts
    Quote Originally Posted by Shelwien View Post
    > how many bytes can be decoded when the client has received first 2000 bytes.

    Some version of paq (lpaq?) would win at that, since it doesn't have block headers,
    and its decoding speed is faster than network speed anyway, while compression is better
    I don't know lpaq performance, but with zpaq it wouldn't work:

    Zpaq:4 and zpaq:5 in the squash benchmark decode 70-750 kB/s, giving an additional 11-114 ms latency for decoding the first 2000 bytes when compression density is 1:4 (2000 compressed bytes expand to ~8000 bytes; 8000 B / 70 kB/s ≈ 114 ms, 8000 B / 750 kB/s ≈ 11 ms). For brotli the respective latency increase is only 38 µs. (Numbers are extrapolated from squash-benchmark, xargs.1 file, on an Intel i5-2400.)

    While 114 ms doesn't sound like a lot, it actually changes the user experience in a measurable way. Even sub-millisecond latency changes can have a significant impact on user behavior.

    I'm looking for a well-rounded algorithm that allows streaming to work efficiently without hampering the decoding or encoding speeds. I consider it a surprise for the system architect, and a major system-level inflexibility, if you cannot decode anything when you have already received an MTU's worth of data bytes. Still, most modern formats seem to have that property.

    Also, according to https://quixdb.github.io/squash-benchmark/ brotli:10 / brotli:11 win in density, too, for small data (i.e., the beginning of the file).

  10. #8
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,304 Times in 740 Posts
    > I don't know lpaq performance,

    It's about 1 MB/s, maybe 1.5 MB/s at low levels.
    ppmd is faster (~6 MB/s maybe; ~12 MB/s for ppms).
    Unlike lpaq, ppmd is kinda bad with binary data,
    but that's not really a problem for web use.

    > but with zpaq it wouldn't work:

    1) I/O in Matt's coders is usually implemented via getc/putc.
    zpaq has an equivalent get() method, which is likely even worse.

    2) zpaq specifically is not a good fit for a low-delay benchmark,
    since it has headers and complex configs.

    > While 114 ms doesn't sound a lot, it actually changes the user experience
    > in a measurable way. Even sub-millisecond latency changes can have
    > significant impact on user behavior.

    Actually, CM/PPM (and lzma) _codecs_ (rather than archivers) have much lower
    delay than most LZ, since they don't have headers and thus can start emitting
    data bytes with as few as 4+ bytes of input.
    ppmd/ppms/lzma should be especially good, because there's no need to initialize large memory blocks.

    > I'm looking for a well-rounded algorithm, that allows streaming to work
    > efficiently without hampering the decoding or encoding speeds.

    I agree. I even specifically designed reflate for streaming and low delay.
    But as for speed, it's currently limited by the network, so even slower codecs
    won't have a problem with realtime decoding on the user side.

    Actually, resource usage on the server side should be more of a problem
    (since the server is supposed to encode the data for many users at once).
    But brotli looks incompatible with that?
    I guess serving statically encoded files is enough?

    > Still, most modern formats seem to have that property.

    Sometimes it's not the formats, but rather the implementations.
    It's usually possible to significantly reduce the decoding delay
    by tweaking the codec i/o, etc.

    > brotli:11 win in density, too, for small data (i.e., beginning of the file).

    Sure, since brotli has a built-in dictionary and uses it by default.

    But it's actually simply an unfair comparison, rather than brotli's greatness,
    because some other codecs also have explicit support for a dictionary mode,
    some can be modified to support it easily enough,
    and it can actually be implemented universally for most codecs:
    via diff(compress(dict), compress(dict+data)).
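    A crude sketch of that trick: both sides can compute compress(dict) locally, so only a diff against compress(dict+data) needs to be sent. A real implementation would use a proper binary diff; here the "diff" is just a common prefix, and compress_buf() is an assumed black-box compressor (the receiver decompresses the patched output and drops the leading dict-length bytes):
    Code:
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    typedef struct { uint8_t *data; size_t len; } Buf;

    /* assumed deterministic compressor, identical on both sides */
    Buf compress_buf(const uint8_t *in, size_t len);

    typedef struct { size_t common_prefix; Buf tail; } PrefixDiff;

    /* sender: diff compress(dict+data) against the shared compress(dict) */
    static PrefixDiff make_diff(Buf base, Buf target) {
        size_t i = 0, n = base.len < target.len ? base.len : target.len;
        while (i < n && base.data[i] == target.data[i]) i++;
        PrefixDiff d = { i, { target.data + i, target.len - i } };
        return d;  /* only d needs to be transmitted */
    }

    /* receiver: rebuild compress(dict+data) from compress(dict) + diff */
    static size_t apply_diff(Buf base, PrefixDiff d, uint8_t *out) {
        memcpy(out, base.data, d.common_prefix);
        memcpy(out + d.common_prefix, d.tail.data, d.tail.len);
        return d.common_prefix + d.tail.len;
    }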

  11. Thanks:

    Mike (18th February 2019)

  12. #9
    Member
    Join Date
    Jun 2019
    Location
    Germany
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hey, I've been using powzix's decompressor for a while now and it works fine in most cases.
    But I just found a bug/issue while decompressing some chunks from the game Fortnite.
    I attached one of these chunks (https://www.dropbox.com/s/5k6w0g4spw...dleTest.d?dl=0).
    Your decompressor fails on this chunk while the DLL (oo2core_7_win64.dll) succeeds.
    It would be great if you could have a look and maybe fix this problem.

    UPDATE: I see that it uses decoder 0x02. I don't know which compressor this stands for, but it would be awesome if you or someone else could add support for it, or at least tell me which compressor was used.

    Thanks in advance
    FunGames

  13. #10
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,304 Times in 740 Posts
    Did you try this one? https://github.com/rarten/ooz/

  14. #11
    Member
    Join Date
    Jun 2019
    Location
    Germany
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    Did you try this one? https://github.com/rarten/ooz/
    Yes, it also doesn't work.
    I'm pretty sure it was compressed with a different compression method, but I don't know which one.
    It has compressor tag 0x02, though.

  15. #12
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,304 Times in 740 Posts
    Code:
    8C 07 -  0:LZH
    8C 00 -  1:LZHLW
    8C 01 -  2:LZNIB
    CC 07 -  3:None
    8C 02 -  4:LZB16
    8C 03 -  5:LZBLW
    8C 04 -  6:LZA
    8C 05 -  7:LZNA
    8C 06 -  8:Kraken
    8C 0A -  9:Mermaid
    8C 0B - 10:BitKnit
    8C 0A - 11:Selkie
    8C 0A - 12:Hydra
    8C 0C - 13:Leviathan

  16. #13
    Member
    Join Date
    Jun 2019
    Location
    Germany
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    So is it LZB16 or LZNIB? I would guess LZB16.

  17. #14
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    876
    Thanks
    242
    Thanked 325 Times in 198 Posts
    Quote Originally Posted by Shelwien View Post
    Sure, since brotli has a built-in dictionary and uses it by default.

    But its actually simply unfair comparison, rather than brotli's greatness,...
    That is only ⅓ of the story. Brotli has several related good properties. First, the entropy codes are coded more densely than in other codecs. Second, there is not much overhead. Third, commands, literals and copies are interleaved in the stream, allowing data (such as URLs to be fetched) to be extracted earlier than in a blocked format that separates commands, literals and copies.

    Several people have benchmarked brotli with and without the static dictionary. In all tests it compresses short data better than lzma and zstd, with or without the dictionary.

  18. #15
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,304 Times in 740 Posts
    Not sure how I'm supposed to disable the brotli dictionary (compiling with -DBROTLI_EXTERNAL_DICTIONARY_DATA just makes it crash),
    so I tried zeroing it out for testing.
    Code:
    BOOK1   BOOK1_16k BOOK1_4k 
    768,771  16,384   4,096 
    261,071   7,393   2,188  // lzma default
    260,885   7,460   2,228  // lzma -lc5 -lp0 -pb0 -fb273 
    261,953   7,324   2,127  // lzma -lc0 -lp0 -pb0 -fb273 
    256,359   6,374   1,724  // brotli 1.07 --best
    261,309   7,193   2,041  // brotli 1.07 --best (dictionary is zeroed out)
    264,731   7,350   2,107  // zstd -22
    262,314   6,855   1,882  // zstd -22 -D common\dictionary.bin

  19. Thanks:

    Jyrki Alakuijala (28th June 2019)

