
Thread: How do I dump the strings LZ77 matches?

  #1 SolidComp (Member)
    Join Date: Jun 2015 · Location: USA · Posts: 238

    Say I'm running zlib or libdeflate to gzip a text file. How can I extract a list of the strings that LZ77 has decided match earlier strings?

    I'm particularly interested in how it handles matchable substrings of matched superstrings, like:

    Code:
    
    type="text/javascript"
    
    type="text"
    
    type="text

    So how do I extract the matches?

  #2 Bulat Ziganshin (Programmer)
    Join Date: Mar 2007 · Location: Uzbekistan · Posts: 4,507
    Find a routine like encode_match (or similar) and add a printf statement.
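As a rough illustration of that suggestion, here is a minimal C sketch. It is a toy greedy LZ77 parser, not zlib or libdeflate code: it uses brute-force search with deflate's limits (matches of 3 to 258 bytes, 32 KiB window), and the hypothetical `encode_match` hook prints the matched bytes instead of only the (distance, length) pair. All names here (`encode_match`, `encode_literal`, `lz77_parse`, `lz77_parse_str`) are illustrative. In zlib's deflate.c, matches appear to be recorded via the `_tr_tally_dist` macro, so that is one plausible spot for the printf, but treat it as a hint rather than a verified recipe.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MIN_MATCH 3      /* deflate's shortest match */
#define MAX_MATCH 258    /* deflate's longest match */
#define WINDOW    32768  /* deflate's largest distance */

/* Hypothetical hook in the spirit of "encode_match or so":
   print the matched bytes themselves (assumes text without NULs). */
static void encode_match(const unsigned char *buf, size_t pos,
                         size_t dist, size_t len)
{
    printf("match: \"%.*s\" (dist=%zu, len=%zu)\n",
           (int)len, (const char *)(buf + pos), dist, len);
}

static void encode_literal(unsigned char c)
{
    printf("lit: \"%c\"\n", c);
}

/* Toy greedy LZ77 parse with brute-force search (NOT zlib's hash chains).
   Returns the number of matches emitted. */
static size_t lz77_parse(const unsigned char *buf, size_t n)
{
    size_t pos = 0, matches = 0;
    assert(buf != NULL || n == 0);
    while (pos < n) {
        size_t best_len = 0, best_dist = 0;
        size_t start = pos > WINDOW ? pos - WINDOW : 0;
        /* Scan candidates nearest-first; on ties keep the smaller distance. */
        for (size_t cand = pos; cand-- > start; ) {
            size_t len = 0;
            /* cand < pos, so the source may overlap the bytes being matched. */
            while (len < MAX_MATCH && pos + len < n &&
                   buf[cand + len] == buf[pos + len])
                len++;
            if (len > best_len) {
                best_len = len;
                best_dist = pos - cand;
            }
        }
        if (best_len >= MIN_MATCH) {
            encode_match(buf, pos, best_dist, best_len);
            pos += best_len;
            matches++;
        } else {
            encode_literal(buf[pos]);
            pos++;
        }
    }
    return matches;
}

/* Convenience wrapper for NUL-terminated text. */
static size_t lz77_parse_str(const char *s)
{
    return lz77_parse((const unsigned char *)s, strlen(s));
}
```

Called on `type="text/javascript" type="text" type="text`, this sketch should emit 23 literals, then `type="text` (dist 23, len 10), then `" type="text` (dist 12, len 12). So in this toy parser the last `type="text` is not matched on its own; it is swallowed by a longer match that starts at the preceding quote and space. zlib at levels 4 to 9 uses lazy matching (it may emit a literal and take a longer match one byte later), and libdeflate uses its own parsers, so real output can differ.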

    Thanks: SolidComp (27th August 2016)

  #3 Shelwien (Administrator)
    Join Date: May 2008 · Location: Kharkov, Ukraine · Posts: 3,373
    http://nishi.dreamhosters.com/u/defl_matches_v0.rar
    Code:
    C:\!arc\defl_matches_v0>test.bat
    Extract raw deflate stream from .zip -> 00000000.raw
    beg=00000023 last=0 type=2 size=2977 unplen=7705
    end=000492F6 bufbeg=00000007 bufend=00000000
    Extract LZ token stream from .raw -> 00000000.dec
    Unpack the data from .dec -> 00000000.unp
    0a0fdbaf0589c9713bde9120cbb20199 *00000000.unp
    lit: "\x0A"
    match: "acco"
    lit: "r"
    lit: "d"
    match: "ingly.'\x0A"
    match: "THE "
    lit: "E"
    lit: "N"
    lit: "D"
    lit: "\x0A"
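The lit/match listing above is what you get by replaying the LZ token stream. A minimal C sketch of such a replay follows; it is an assumption about how a dec2unp-style step could work, not its actual source, and `struct token` is a hypothetical layout rather than the real .dec format. It rebuilds the text and prints each token in the same style, escaping non-printable bytes as `\xNN`. The byte-by-byte copy matters because deflate lets a match overlap the bytes it is producing (distance smaller than length).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define REPLAY_ERROR ((size_t)-1)

/* Hypothetical token layout (not the actual .dec format):
   len == 0 means a literal byte, otherwise a back-reference
   copying `len` bytes from `dist` bytes back. */
struct token {
    unsigned char lit;
    size_t len;
    size_t dist;
};

/* Print bytes as a quoted string, escaping non-printables as \xNN,
   like the "\x0A" in the sample output. */
static void print_quoted(const char *tag, const unsigned char *p, size_t n)
{
    printf("%s: \"", tag);
    for (size_t i = 0; i < n; i++) {
        if (p[i] >= 0x20 && p[i] < 0x7F && p[i] != '"' && p[i] != '\\')
            putchar(p[i]);
        else
            printf("\\x%02X", (unsigned)p[i]);
    }
    printf("\"\n");
}

/* Rebuild the text from a token stream, logging each token.
   Returns the number of bytes written, or REPLAY_ERROR if a token
   points before the start of the output or overflows `out`. */
static size_t replay(const struct token *tok, size_t ntok,
                     unsigned char *out, size_t cap)
{
    size_t n = 0;
    assert(tok != NULL || ntok == 0);
    for (size_t i = 0; i < ntok; i++) {
        if (tok[i].len == 0) {                       /* literal */
            if (n >= cap)
                return REPLAY_ERROR;
            out[n] = tok[i].lit;
            print_quoted("lit", out + n, 1);
            n++;
        } else {                                     /* match */
            size_t start = n;
            if (tok[i].dist == 0 || tok[i].dist > n || tok[i].len > cap - n)
                return REPLAY_ERROR;
            /* Copy byte by byte so overlapping matches (dist < len) work:
               later bytes may come from ones written earlier in this copy. */
            for (size_t k = 0; k < tok[i].len; k++, n++)
                out[n] = out[n - tok[i].dist];
            print_quoted("match", out + start, tok[i].len);
        }
    }
    return n;
}
```

For instance, literals `a`, `b`, `c` followed by a match with length 6 and distance 3 rebuild `abcabcabc` and print `match: "abcabc"`.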

    Thanks (2): RamiroCruzo (27th August 2016), SolidComp (27th August 2016)

  #4 SolidComp (Member)
    Quote Originally Posted by Shelwien
    http://nishi.dreamhosters.com/u/defl_matches_v0.rar
    [test.bat output from post #3]
    Thanks @Shelwien. I'm not too familiar with .bat files. If I understand the snippet above, I run test.bat at the command line, and the remainder of that snippet is the output of the command? How does it know which .zip file to process? I don't see any naming of a target .zip file anywhere. (I assume this will process .gz files too?)

    p.s. Your site is still being blocked by BitDefender and other tools.

  #5 Shelwien (Administrator)
    Here's the bat file:
    Code:
    echo Extract raw deflate stream from .zip -^> 00000000.raw
    rawdet.exe book1.zip nul nul 
    
    echo Extract LZ token stream from .raw -^> 00000000.dec
    raw2dec.exe 00000000.raw 00000000.dec 
    
    echo Unpack the data from .dec -^> 00000000.unp
    dec2unp.exe 00000000.dec 00000000.unp > 00000000.txt
    
    md5sum 00000000.unp
    
    tail 00000000.txt
    Do you see where to put the .zip now?
    Though actually, it would extract deflate streams from any file that contains them, including .pdf, .png, etc.
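On the .gz question: for a known container you do not strictly need to search for the stream, since the start of the raw deflate data follows from the header. The C sketch below (function names are hypothetical, not part of rawdet) computes that offset for a gzip member following RFC 1952 and for a zip local file header following PKWARE's APPNOTE; it ignores encryption, ZIP64 and multi-member files. Incidentally, `beg=00000023` in the sample output (35 bytes) would be consistent with a 30-byte local header followed by a 5-byte name such as book1 and no extra field, though that is only an inference.

```c
#include <assert.h>
#include <stddef.h>

/* Little-endian 16-bit read, as used by both gzip and zip headers. */
static size_t le16(const unsigned char *p)
{
    return (size_t)p[0] | ((size_t)p[1] << 8);
}

/* Hypothetical helper: offset of the raw deflate data in a gzip
   member (RFC 1952), or -1 if the header is invalid or truncated. */
static long gzip_deflate_offset(const unsigned char *p, size_t n)
{
    size_t off = 10;                     /* ID1 ID2 CM FLG MTIME(4) XFL OS */
    unsigned flg;
    assert(p != NULL || n == 0);
    if (n < 10 || p[0] != 0x1f || p[1] != 0x8b || p[2] != 8) /* CM 8 = deflate */
        return -1;
    flg = p[3];
    if (flg & 0x04) {                    /* FEXTRA: XLEN (2 bytes) + data */
        if (off + 2 > n)
            return -1;
        off += 2 + le16(p + off);
    }
    if (flg & 0x08) {                    /* FNAME: zero-terminated */
        while (off < n && p[off] != 0)
            off++;
        off++;
    }
    if (flg & 0x10) {                    /* FCOMMENT: zero-terminated */
        while (off < n && p[off] != 0)
            off++;
        off++;
    }
    if (flg & 0x02)                      /* FHCRC: 2-byte header CRC */
        off += 2;
    return off < n ? (long)off : -1;
}

/* Hypothetical helper: offset of the deflate data after a zip
   local file header, or -1 if it is not a deflated entry. */
static long zip_deflate_offset(const unsigned char *p, size_t n)
{
    assert(p != NULL || n == 0);
    if (n < 30 || p[0] != 'P' || p[1] != 'K' || p[2] != 3 || p[3] != 4)
        return -1;
    if (le16(p + 8) != 8)                /* compression method 8 = deflate */
        return -1;
    /* Fixed 30-byte header, then file name (length at 26), extra (at 28). */
    size_t off = 30 + le16(p + 26) + le16(p + 28);
    return off < n ? (long)off : -1;
}
```

With the offset known, zlib's `inflateInit2` called with a negative `windowBits` (such as -15) decodes raw deflate data with no header, which could serve as a cross-check on what rawdet extracts.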

    But the main point is the deflate-parsing utilities: the source is included, so you can tweak them as needed.
    For example, I used raw2dec and a similar LZMA utility to compare deflate and LZMA entropy coding on the same
    sequence of matches (see threads/1288-LZMA-markup-tool).

    > I don't see any naming of a target .zip file anywhere.

    You can replace book1.zip above with %1 (the batch file's first argument); then "test.bat somefile" would process whichever file you specify.
    Or you can just run the three steps manually.

    > p.s. Your site is still being blocked by BitDefender and other tools.

    Well, AV vendors are known for blocking anything that looks like a hacking tool, so it may be "correct" about that.

    Thanks: RamiroCruzo (27th August 2016)

Similar Threads

  1. dump uncompressed PNG residuals?
    By Paul W. in forum Data Compression
    Replies: 23
    Last Post: 10th July 2016, 03:42
  2. Replies: 3
    Last Post: 16th May 2016, 01:02
  3. Text strings coding chemical structures
    By FatBit in forum Data Compression
    Replies: 21
    Last Post: 19th February 2016, 21:16
  4. Concatenating strings
    By andromeda in forum Data Compression
    Replies: 29
    Last Post: 15th September 2014, 08:51
  5. Most efficient/practical compression method for short strings?
    By never frog in forum Data Compression
    Replies: 6
    Last Post: 1st September 2009, 05:05
