
Thread: Delta transformation

  1. #1
    encode (The Founder)
    I have an idea to add a simple delta/table transformer with auto-detection to LZPM.

    There are many possibilities. For example, here is the approach used in SQX v1.0:

    1. For each small block (256-4K bytes) we count the number of distinct byte values seen with and without the delta transformation. Originally SQX keeps track of:
    0 - raw data
    1 thru 4 - delta transform (for 8-, 16-, 24-, and 32-bit analog data, respectively)
    I have an idea to restrict this to, say, 0 and 4 (untouched data, and delta for 32-bit tables/data).
    2. We choose the transform that yields the smallest number of distinct symbols. For example, if the raw data contains 200 different symbols and the delta-processed data contains just 50, we enable the transform. Thus, for the next block we switch our current mode to, say, delta4.
    3. And so on: we process each small block to decide which mode to use for the next one.
    4. The decoder does the reverse: it processes each block (decoding if needed) and makes the same decision about whether to preprocess the next block.

    The idea is simple and relatively fast. The question is: does anybody know other possibilities/methods for FAST delta encoding with auto-detection?
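
    A minimal sketch of the per-block decision described above, assuming blocks of at most 4 KB and only two modes (raw and delta4); the names are illustrative, not taken from SQX or LZPM:

    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    // Count how many distinct byte values occur in a block.
    static int distinct_symbols(const uint8_t *p, size_t n) {
        bool seen[256] = {false};
        int count = 0;
        for (size_t i = 0; i < n; i++)
            if (!seen[p[i]]) { seen[p[i]] = true; count++; }
        return count;
    }

    enum Mode { RAW = 0, DELTA4 = 4 };

    // Decide the mode for the NEXT block by comparing the symbol counts of
    // the current block in raw form vs. 32-bit (stride-4) delta form.
    static Mode pick_mode(const uint8_t *block, size_t n) {
        uint8_t tmp[4096];                  // assumes n <= 4096
        std::memcpy(tmp, block, n);
        for (size_t i = n; i-- > 4; )       // byte-wise delta at distance 4
            tmp[i] = (uint8_t)(tmp[i] - tmp[i - 4]);
        return distinct_symbols(tmp, n) < distinct_symbols(block, n) ? DELTA4 : RAW;
    }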


  2. #2
    Bulat Ziganshin (Programmer)
    1. There are two rather different things: MM data and various binary tables.

    2. For the latter I know of only 3 implementations: arjz, rar, freearc. You may look into the sources of my delta filter.

    3. In short, my filter consists of 3 parts: approximately detect a place where such a table exists, find its exact bounds, and check which bytes in a row should be subtracted.

    4. I also have an MM detector in the MM package, but it checks the whole file.

    So, the first question: do you need to process tables, MM data, or both?

  3. #3
    encode (The Founder)
    I briefly looked at your sources; I think I need something simpler and smaller.

    A few statements about what I need:
    1. The filter should NOT change the data size (i.e., input size == output size in all cases).
    2. In the worst case the filter should not ruin compression, i.e., it should work very accurately.

    I think it must be universal, i.e., process both MM data and tables. A 32-bit delta should cover both 32-bit tables and some MM data such as 16-bit stereo WAV files. I briefly tested the simple scheme I described: it works, and in some cases it provides a great compression gain; however, it can also slightly inflate the file.

    That said, I tested various modes (16/24/32-bit deltas); with EXEs, 32-bit works best, and it also helps with WAV files.

    In conclusion, I need something simple to get extra compression on EXEs and some multimedia files like 16-bit stereo WAVs.
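
    For reference, a size-preserving stride-4 byte delta with an exact inverse, which is one hypothetical way to satisfy requirement 1 (the names are mine, not from any existing filter):

    #include <cstddef>
    #include <cstdint>

    // Byte-wise delta at distance 4: output size == input size, and the
    // transform is exactly invertible, so a wrong decision never loses data.
    // For 16-bit stereo WAV, distance 4 aligns each byte with the same byte
    // of the same channel in the previous sample frame.
    static void delta4_encode(uint8_t *p, size_t n) {
        for (size_t i = n; i-- > 4; )
            p[i] = (uint8_t)(p[i] - p[i - 4]);
    }

    static void delta4_decode(uint8_t *p, size_t n) {
        for (size_t i = 4; i < n; i++)
            p[i] = (uint8_t)(p[i] + p[i - 4]);
    }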


  4. #4
    Bulat Ziganshin (Programmer)
    A small description of my delta filter implementation. BTW, according to my tests, it slightly outperforms RAR's on real data; i.e., where RAR improves compression by 2% on average, my filter improves the ratio for the same compressor by 2.2%.

    What distinguishes subtractable data? Its compression improves after subtraction because in its original form the data consists of monotonic (increasing or decreasing) regions. For MM data, these regions are very short (5-10 items is typical), so we get only a small compression improvement per byte, but such data is typically large enough that the overall improvement is good.

    For binary tables the situation is the opposite: tables may be small (I got compression improvements even when processing 20-element tables), but the whole table may be monotonic, or at least its monotonic regions are large (tens of elements, at least).

    This means the test parameters differ somewhat between MM data and tables, although the overall strategy may be the same. My current implementation is tuned only for tables, though. I believe it finds most of the real tables in the data, although definitely not 100%.

    So, we need to detect monotonic sequences with a constant distance between their elements. Moreover, both RAR and FreeArc are able to detect "mixed tables" that include both subtractable and non-subtractable data: RAR with rows up to 16 bytes, FreeArc up to 30 bytes.

    1. The first stage is *fast* detection of *possible* tables. It is implemented as follows:

    a) For each input byte we find the last position where the same or an "almost the same" byte was encountered, and increment a counter for the distance between those two bytes.

    b) These "distance counters" are accumulated over each 32-byte block.

    c) At the end of the block we check which distances were encountered at least 5 times; these are the candidate table row sizes.

    d) Each such candidate is tested more thoroughly: we check that there is a monotonically increasing or decreasing sequence of bytes at that distance. Because tables smaller than 32 bytes are almost useless, we can be sure that each "good" table crosses a 32-byte block boundary, so we limit the check to sequences at the beginning of the block.

    But because a potential table may have only one subtractable byte per row, we need to check every possible offset of the table relative to the block start. E.g., if we assume there is a table with 4-byte rows, we check the following sequences for monotonicity:

    table[0], table[4], table[8], table[12]
    table[1], table[5], table[9], table[13]
    table[2], table[6], table[10], table[14]
    table[3], table[7], table[11], table[15]
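
    A rough sketch of the distance counting in steps a)-c), under my own simplifying assumptions (the real code lives in delta.cpp; the constants and names here are illustrative, and the real filter also matches "almost the same" bytes):

    #include <cstdint>

    const int MAX_DIST  = 32;  // candidate row sizes 1..31 (assumed limit)
    const int BLOCKSIZE = 32;  // distances are tallied per 32-byte block
    const int THRESHOLD = 5;   // a distance seen >= 5 times is a candidate

    // For each byte, find the distance to the previous occurrence of the same
    // value and tally it; at block end, frequent distances become row-size
    // candidates for the thorough monotonicity check in step d).
    // last_seen[256] must be pre-filled with a large negative value (e.g.
    // INT_MIN/2) before the first block.
    static uint32_t scan_block(const uint8_t *buf, int pos, int last_seen[256]) {
        int counters[MAX_DIST] = {0};
        for (int i = pos; i < pos + BLOCKSIZE; i++) {
            int d = i - last_seen[buf[i]];
            if (d > 0 && d < MAX_DIST) counters[d]++;
            last_seen[buf[i]] = i;
        }
        uint32_t candidates = 0;        // bit d set => row size d is a candidate
        for (int d = 1; d < MAX_DIST; d++)
            if (counters[d] >= THRESHOLD) candidates |= 1u << d;
        return candidates;
    }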


    2. The second stage is to find the table bounds. On entry to this stage we know that there is a monotonic sequence starting at position p with step N. We scan this sequence forward and backward, checking the lengths of its monotonic fragments. In short, we place the table ends where the length of the monotonic segments drops below 4.

    3. Now we have the table bounds and its row size. We should check which table columns should be subtracted. My current algorithm is very weak here, though: I subtract everything except (almost) constant fields.

    So, while my algorithm is far from perfect, I believe it provides 80-90% of the maximum profit a delta filter could gain. You can see the tables it detects by compiling delta.cpp with -DSTAT and using the "-v -v" option.
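
    The forward half of stage 2 might look like this sketch (simplified and hypothetical; the backward scan is symmetric):

    #include <cstddef>
    #include <cstdint>

    // Starting at position p, known to begin a monotonic column with step N,
    // advance run by run and stop once a monotonic run gets shorter than
    // MIN_RUN elements; that is where the table is considered to end.
    static size_t find_table_end(const uint8_t *buf, size_t size,
                                 size_t p, size_t N) {
        const size_t MIN_RUN = 4;
        size_t end = p + 1;
        while (p + N < size) {
            size_t run = 1;
            int dir = 0;                          // +1 rising, -1 falling
            while (p + N < size) {                // measure one monotonic run
                int d = (int)buf[p + N] - (int)buf[p];
                int s = (d > 0) - (d < 0);
                if (dir == 0) dir = s;
                if (s != 0 && s != dir) break;    // direction changed
                run++;
                p += N;
            }
            if (run < MIN_RUN) break;             // runs too short: stop here
            end = p + 1;                          // table covers up to here
            p += N;                               // start next run past the bend
        }
        return end;                               // one past the last table byte
    }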

  5. #5
    encode (The Founder)
    OK, some results with a draft version of my filter (32-bit delta with auto-detection):

    SFC
    LZPM 0.12: 12,662,139 bytes
    LZPM 0.12+TABFLT: 12,581,386 bytes

    Some info: on already compressed data and text files, LZPM with TABFLT shows the same results, since the filter is fairly smart and enables the transform only when it is sure the data is MM/TAB. Anyway, results for some single files:

    acrord32.exe
    LZPM 0.12: 1,476,859 bytes
    LZPM 0.12+TABFLT: 1,417,709 bytes

    mso97.dll
    LZPM 0.12: 1,882,707 bytes
    LZPM 0.12+TABFLT: 1,880,932 bytes
    And if we use a 16-bit version of the filter on this file, we get:
    LZPM 0.12+TABFLT(16-bit): 1,873,113 bytes

    ohs.doc
    LZPM 0.12: 828,504 bytes
    LZPM 0.12+TABFLT: 808,662 bytes


  6. #6
    encode (The Founder)
    Some results with WAV files:

    track5.WAV
    LZPM 0.12: 25,043,201 bytes
    LZPM 0.12+TABFLT: 21,103,953 bytes

    Barcelona.wav
    LZPM 0.12: 50,863,863 bytes
    LZPM 0.12+TABFLT: 44,330,513 bytes

    The filter could perform better on these files with a less aggressive filter shutdown; however, that may hurt on other files, so the results are given "as is".


  7. #7
    encode (The Founder)
    OK, and here are the results for an improved version of the filter:

    acrord32.exe: 1,410,061 bytes
    mso97.dll: 1,878,884 bytes
    ohs.doc: 808,787 bytes

    track5.WAV: 19,665,318 bytes
    Barcelona.wav: 41,654,342 bytes

    In other words, it's only the beginning...

  8. #8
    encode (The Founder)
    SFC
    LZPM 0.12+TABFLT(new): 12,571,840 bytes


  9. #9
    Nania Francesco (Tester)
    My BMP, TIFF, TGA, and WAV filters use delta coding!

  10. #10
    Bulat Ziganshin (Programmer)
    Quote Originally Posted by Nania Francesco Antonio
    My BMP, TIFF, TGA, and WAV filters use delta coding!
    You just detect these files by their headers and then apply delta to the whole file, while encode tries to dynamically enable and disable the filter depending on the file contents.

  11. #11
    Nania Francesco (Tester)
    Bulat, I hadn't really thought about that; nevertheless, I use a different number of delta channels depending on the file type. For example, for BMP files I use only one channel!

  12. #12
    Bulat Ziganshin (Programmer)
    Your English is getting better and better!

  13. #13
    Nania Francesco (Tester)
    Thanks Bulat! My English is very BAD!

  14. #14
    Bulat Ziganshin (Programmer)
    not "is", but "was"

  15. #15
    Member (Kraków, Poland)
    Why not use precomputed filter parameters? I.e., the encoder finds tables in regions 4-500, 600-800, and 12000-399999, then builds a header containing that information. You could then process even very small blocks (say, down to 100 elements) and use a super-strong, exotic compression method for the headers.

    Overall this wouldn't hurt decompression speed, because the time saved by obtaining longer matches would outweigh the time lost decompressing the headers.

    And you could use many, many types of filters without hurting decompression speed (as long as those filters are fast), because the decoder is told which filters were used and doesn't need to guess.

    Furthermore, you can improve the MM detector later without breaking backward compatibility. You can add specific filters for specific formats, i.e., recognize them by their headers and then process them accurately. You can add support for more and more formats in successive versions of your program.
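
    A hypothetical layout for such a header (all names and fields invented for illustration):

    #include <cstdint>
    #include <vector>

    // The encoder records every region where a filter was applied, so the
    // decoder replays the filters verbatim and never has to detect anything.
    struct FilterRegion {
        uint64_t start;      // offset of the region in the uncompressed stream
        uint64_t length;     // region length in bytes
        uint8_t  filter_id;  // e.g. 0 = none, 1 = delta, 2 = a BMP-specific filter
        uint8_t  params[3];  // filter-specific parameters, e.g. table row size
    };

    struct FilterHeader {
        std::vector<FilterRegion> regions;  // itself compressed with a strong method
    };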

  16. #16
    encode (The Founder)
    Agreed. However, I'm currently working more on LZPM's main engine; preprocessing is just an addition, since lzpm.exe should represent the main idea of my compression in its purest form. E8 preprocessing is VERY small and fast, so it's worth using.

  17. #17
    chornobyl (Member)
    Quote Originally Posted by encode
    We choose the transform that yields the smallest number of distinct symbols. For example, if the raw data contains 200 different symbols and the delta-processed data contains just 50...
    The number of symbols is not the main parameter. Sometimes before delta you have 60 values and after it the whole 256, yet the data still compresses better.
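
    This observation suggests comparing order-0 entropy instead of distinct-symbol counts; a sketch of such a (slower but more reliable) selector, with names of my own choosing:

    #include <cmath>
    #include <cstddef>
    #include <cstdint>

    // Order-0 entropy in bits per byte. A skewed distribution over all 256
    // symbols can beat a flat distribution over 60, which is exactly why a
    // bare distinct-symbol count can pick the wrong mode.
    static double entropy(const uint8_t *p, size_t n) {
        size_t freq[256] = {0};
        for (size_t i = 0; i < n; i++) freq[p[i]]++;
        double h = 0.0;
        for (int s = 0; s < 256; s++)
            if (freq[s]) {
                double pr = (double)freq[s] / (double)n;
                h -= pr * std::log2(pr);
            }
        return h;   // enable delta when entropy(delta) < entropy(raw)
    }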

