
Thread: TarsaLZP

  1. #1
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    i think this is a good place to publish my work. maybe someone will use some ideas from my program to improve their own program (especially francesco).

    i've uploaded version 16 june 2007 to: http://asembler.republika.pl/bin/TarsaLZP.zip

    it's probably slower due to new unoptimized code (i.e. binary indexed trees).
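    for reference - a binary indexed tree keeps cumulative frequency counts with O(log n) point updates and prefix-sum queries, which is what an entropy coder typically needs them for. a generic sketch of the structure (my own illustration, not the code from the archive):

    /* fenwick (binary indexed) tree over 256 symbol frequencies:
       point update and prefix sum, both O(log n) */
    static int tree[257];                 /* 1-based internal array */

    void add(int sym, int delta) {        /* freq[sym] += delta */
        for (int i = sym + 1; i <= 256; i += i & -i)
            tree[i] += delta;
    }

    int cumfreq(int sym) {                /* sum of freq[0 .. sym-1] */
        int s = 0;
        for (int i = sym; i > 0; i -= i & -i)
            s += tree[i];
        return s;
    }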

  2. #2
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Measured feature / newer version / older version (may 12):
    Compression speed / ~2500 kBps / 4169 kBps
    Decompression speed / ~2300 kBps / 3610 kBps
    Compressed size / 16 560 327 B / 16 596 577 B


  3. #3
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    thanks bf!

    unfortunately i failed to optimize the new code. mmx linear search is still a lot faster, so i will use it again.

    now i'm working on the lzp layer, i.e. improving compression ratio. expect a new version tomorrow

    some more advanced ppm is on my to-do list. currently i'm using a fixed order-2 coder. it's extremely inefficient on small files, because it doesn't use orders lower than 2. i'm thinking about an order-3-2-1 coder without escaping (because i don't like escaping) and with some sort of simplified information inheritance (as in ppmii by dmitry shkarin). by simplified information inheritance i mean creating new contexts (to be strict: statistics for a context) using statistics from the lower order context - done only once, when the new context is created.
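    a rough sketch of that inheritance step, with made-up structures (not actual tarsalzp code; the scaling factor is a guess):

    /* per-context statistics: one frequency per symbol */
    typedef struct { unsigned short freq[256]; } Stats;

    /* simplified information inheritance: when an order-3 context is seen
       for the first time, seed its statistics once from its order-2 parent,
       scaled down; afterwards it adapts on its own - no escapes needed */
    void init_context(Stats *o3, const Stats *o2parent) {
        for (int s = 0; s < 256; s++)
            o3->freq[s] = o2parent->freq[s] / 4;  /* scale factor: a guess */
    }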

    if anyone has another idea for a fast ppm coder without escaping, please tell me

  4. #4
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    okay, i've finished the version that i had planned for tomorrow

    version 17 june 2007: http://asembler.republika.pl/bin/TarsaLZP.zip

    i'm waiting for benchmarks

  5. #5
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts

  6. #6
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Measured feature / version june 17 / version may 12:
    Compression speed / 4169 kBps / 4169 kBps
    Decompression speed / 3654 kBps / 3610 kBps
    Compressed size / 16 229 600 B / 16 596 577 B

    Very nice compression improvement, Piotr!

  7. #7
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    thank you, matt & bf!

  8. #8
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    Very Good Work!

  9. #9
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    version 23 june 2007: http://asembler.republika.pl/bin/TarsaLZPNeedsTuning.zip (maybe the name looks funny)

    i've reduced the precision of the entropy coder for lzp flags to one byte. this allows using lookup tables (i call them transition tables) for a more flexible update of probabilities (in fpaq0p the probability was always updated by 1/32 of the error - with luts you can update each state with a different factor).
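    roughly, the idea looks like this (a sketch - the table-filling rule below is a placeholder, not my tuned values):

    /* probability of an lzp flag kept as one byte (state 0..255);
       two 256-entry tables give the next state after a 0 or a 1 bit,
       so every state can be updated with its own factor */
    static unsigned char next_on0[256], next_on1[256];

    void init_tables(void) {
        for (int p = 0; p < 256; p++) {
            int d0 = p - p / 8;               /* placeholder update rule */
            int d1 = p + (256 - p) / 8;       /* placeholder update rule */
            next_on0[p] = (unsigned char)(d0 < 1 ? 1 : d0);
            next_on1[p] = (unsigned char)(d1 > 254 ? 254 : d1);
        }
    }

    /* in the coder: prob = state / 256.0;
       state = bit ? next_on1[state] : next_on0[state]; */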

    i've tried to tune the tables but i didn't come up with anything impressive. if anyone has experience in this field, please help.

  10. #10
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    version 18 july 2007: http://asembler.republika.pl/bin/TarsaLZPPrivate.zip

    i haven't updated the documentation and haven't added some features yet (like detection of incompressible streams - this version does really badly when such streams are present in a file), so i haven't published the sources and documentation for it.

    i've abandoned the transition tables (because they are inefficient) and used something very similar to fpaq0p - the only difference is that my implementation is less aggressive when probabilities are very skewed, like 0.01 or 0.99.
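    for illustration, one way such a damped update could look (12-bit probability as in fpaq0p; the cap value is my guess, not the actual tarsalzp constant):

    /* fpaq0p-style adaptive bit probability, 12-bit fixed point */
    void update(unsigned *p, int bit) {
        int step = bit ? (int)(4096 - *p) >> 5 : -((int)*p >> 5);
        /* near 0.0 / 1.0 the plain shift update takes huge steps;
           capping the step makes the coder less aggressive when skewed */
        if (step >  64) step =  64;   /* hypothetical cap */
        if (step < -64) step = -64;
        *p += step;
    }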

    also i have changed the ppm coder - now it's more universal, i.e. it adapts better so manual tuning isn't needed as much, but it seriously expands incompressible (or random) data.

    i've discovered that sometimes my program does better when the file is preprocessed by an external lzp - get http://www.compression.ru/ds/lzp.rar and use it with parameter -l7.

    from my tests, external lzp helps with two files (from the maximumcompression.com mfc testset):

    hlp: from 922 545 to 760 495 (-l10)
    txt: from 907 085 to 701 918 (-l7)

    it's a huge increase in compression ratio - it suggests that i should make two lzp models (one with a short and one with a long context) and switch adaptively between them.

    anyway, there is a hell of a lot of room for improvement, in both speed and compression ratio. and remember that tarsalzp doesn't use any filters.

    ps:
    ilia's method of updating probabilities is mathematically optimal when the probability is in the range [1/32 ... 31/32] and too aggressive otherwise.

  11. #11
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks donkey7!

  12. #12
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    version 30 july 2007: http://asembler.republika.pl/bin/TarsaLZPPrivate.zip

    documentation still isn't updated.

    i've made two lzp models - one with an 8 byte context and one with a 4 byte context (both hashed - previously there was only a single unhashed 3 byte context lzp model). the program simply selects the model that gives the higher probability for a match.
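    in outline, the selection looks something like this (hashes and table sizes are simplified placeholders, not the real tarsalzp ones):

    #include <stdint.h>

    /* each slot: predicted byte + adaptive probability that it matches */
    typedef struct { uint8_t byte; uint16_t p; } Slot;
    static Slot lzp4[1 << 20], lzp8[1 << 22];

    static uint32_t hash4(uint32_t c) { return (c * 2654435761u) >> 12; }
    static uint32_t hash8(uint64_t c) {
        return (uint32_t)((c * 0x9E3779B97F4A7C15ull) >> 42);
    }

    /* pick whichever model predicts a match more strongly */
    Slot *select_model(uint32_t ctx4, uint64_t ctx8) {
        Slot *a = &lzp4[hash4(ctx4)];
        Slot *b = &lzp8[hash8(ctx8)];
        return (b->p > a->p) ? b : a;
    }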

    on my computer (celeron 300a mendocino, 96 mb ram) it's about 50% slower, but it compresses better, especially on text files.

    now on the to-do list are:
    - optimize the new code to reduce the newly introduced speed hit,
    - tune the lzp models somewhat (e.g. change lengths and hash functions),
    - rewrite the ppm model from scratch (and i think that an order-2 model is enough).

    matt:
    actually there are only two versions of tarsalzp available: 17 june 2007 from maximumcompression.com and the current one (tarsalzpprivate.zip). you should remove the broken links.

    ps:
    with successive versions i was adding small, extensively used tables. now there are a lot of luts (look-up tables) that take a few thousand kilobytes in total, so i think that tunings on my machine (a celeron with only 128 kb of l2 cache) won't give reasonable results on machines with larger caches. so i'm waiting for benchmarks

  13. #13
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks donkey7!

  14. #14
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    I updated TarsaLZP and removed broken links.
    http://cs.fit.edu/~mmahoney/compression/text.html#2336

    Nice improvement in compression, with even less memory. There is nothing faster that compresses better.

    One problem is that encode crashed when the input file was in another directory, as in:

    timer encode \res\enwik8 x8

    but it worked OK from the same directory, as in:

    timer \tmp\encode enwik8 x8

  15. #15
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by donkey7
    on my computer (celeron 300a mendocino, 96 mb ram)
    Which version of MS Windows do you have installed?

  16. #16
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Thanks donkey7! Made a quick test:
    version                         size (B)      in / out speed (kB/s)
    TarsaLZP (2007-06-17)           16 229 600    4 169 / 3 654
    TarsaLZP Private (2007-07-18)   16 627 650    4 625 / 3 844
    TarsaLZP Private (2007-07-30)   14 576 992    2 487 / 2 193

  17. #17
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Quote Originally Posted by Matt Mahoney
    Nice improvement in compression, with even less memory. There is nothing faster that compresses better.
    but anyway i'll try to optimize it. the speed hit is just too high.

    Quote Originally Posted by Matt Mahoney
    One problem is that encode crashed when the input file was in another directory, as in:

    timer encode \res\enwik8 x8
    ok, thanks for reporting - i'll look at that.

    Quote Originally Posted by LovePimple
    Which version of MS Windows do you have installed?
    win 98 se polish. i use that old computer only for developing programs and reading pdfs. most of the time i spend on the second computer, which is connected to the internet (athlon 2200+, 512 mb ram, win xp sp2 polish).

    btw:
    does anyone have cachegrind compiled?
    http://valgrind.org/info/tools.html#cachegrind
    it could be useful.

    ps:
    thanks matt & bf for benching

  18. #18
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by donkey7
    win 98 se polish. i use that old computer only for developing programs and reading pdfs. most of the time i spend on the second computer, which is connected to the internet (athlon 2200+, 512 mb ram, win xp sp2 polish).
    Thanks for the info!

  19. #19
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Quote Originally Posted by Matt Mahoney
    One problem is that encode crashed when the input file was in another directory, as in:

    timer encode \res\enwik8 x8

    but it worked OK from the same directory, as in:

    timer \tmp\encode enwik8 x8
    i've looked at this. on my system it works ok in both cases (windows xp). do other people have the same problem as matt?

    version 1 august 2007: http://asembler.republika.pl/bin/TarsaLZPPrivate.zip

    it's a heavy version - it uses over 300 mb of ram (i.e. i developed a version that used less memory, then moved to another computer, changed the sources a bit and ... you know).

    unfortunately i've failed at optimizing the code, because the slowness is caused by cache misses - and they occur all the time, since using hashing means reading from random addresses.

    in this version i've abandoned the current match histories (in contexts) and used the state tables from lpaq1 instead. it's similar, but usually (mostly on text files) it's better. thanks matt

    looks like i really should make an order-3 ppm. optimizing for speed is no longer possible (unless we want much weaker compression), and adding third order ppm shouldn't cause a big speed decrease, so it's a good candidate.

    currently tarsalzp still has an order-2 core.

  20. #20
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks donkey7!

    Quote Originally Posted by donkey7
    do other people have the same problem as matt?
    Yes!

  21. #21
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    donkey7
    nowadays, optimizing compression algorithms in most cases means minimizing cache misses. with each miss you lose 50-100 ticks, so the time required for executing the instructions themselves no longer matters much. for example, quad 1.12 was 2x faster than 1.10 just because it substantially decreased the number of misses

    you can try the same technique - storing a few bits of the addressed data in the hash elements themselves; this avoids one memory access most of the time. look for example at the "quick lz search" topic by Lasse Reinhold
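    a hedged sketch of what i mean (layout and names are mine, not from quad or tarsalzp): keep a few check bits of the context next to the stored value, so a mismatch is rejected with the one access you already paid for:

    #include <stdint.h>

    /* the position and a context tag share one 8-byte entry, so one
       cache-line hit confirms or rejects the slot - no second access */
    typedef struct {
        uint32_t pos;     /* the data we actually want */
        uint32_t check;   /* low bits of the full context, used as a tag */
    } Entry;
    static Entry table[1 << 20];

    int lookup(uint64_t context, uint32_t *pos_out) {
        uint32_t h = (uint32_t)((context * 0x9E3779B97F4A7C15ull) >> 44);
        Entry *e = &table[h];
        if (e->check != (uint32_t)context)   /* tag mismatch: bail out early */
            return 0;
        *pos_out = e->pos;
        return 1;
    }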

    another possible solution, used in lzma, is running 2 or more threads - in this case memory accesses from one thread partially overlap with computations in another (of course, only on HT/multicore processors)

  22. #22
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Quote Originally Posted by LovePimple
    Yes!
    strange. i'll investigate it further. currently i'm gathering filenames from the parameters using __GetMainArgs from crtdll.dll and passing them directly to CreateFileA from kernel32.dll.

    Bulat Ziganshin
    yes, i know that cache misses have a huge influence on fast compressors, but there is little i can do. i can't combine e.g. lzp-4 with lzp-8, because they have different hashes and so read from different memory positions - performance would be even lower if they were interleaved.

    i've tried resizing the lzp models so that they fit in the l2 cache of an athlon (512 kb), but compression was then very low (like early tarsalzp versions). maybe with bigger caches (like the 2 mib or 4 mib in core 2 duos) it will be possible to keep the entire lzp models in l2 cache and achieve a good compression ratio (though much lower than recent tarsalzp versions) - but since i have an athlon, not a core 2, and benchmarks are usually done on athlons, i'm not really interested in it.

    but, thanks to you, one idea came to my mind - lzp and ppm can be done in separate passes. for each input byte the lzp (combined or not) would output two bytes - the predicted byte and its probability. so lzp can process a one megabyte chunk, then the output is processed by ppm, then lzp processes the next megabyte chunk, and so on. maybe this will give some speed improvement (because then the cache only has to keep one kind of information at a time, so it should work as if the cache were two times bigger).

    ... hmm ... even three passes would be possible. the lzp models could also run separately
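    a minimal sketch of that chunked pipeline (lzp_pass and ppm_pass are hypothetical names standing in for the two stages):

    #include <stdio.h>

    #define CHUNK (1 << 20)   /* 1 MiB chunks */

    void lzp_pass(const unsigned char *in, unsigned char *pred,
                  unsigned char *prob, size_t n);
    void ppm_pass(const unsigned char *in, const unsigned char *pred,
                  const unsigned char *prob, size_t n);

    void compress(FILE *in) {
        static unsigned char buf[CHUNK], pred[CHUNK], prob[CHUNK];
        size_t n;
        while ((n = fread(buf, 1, CHUNK, in)) > 0) {
            /* pass 1: only lzp tables are hot in cache; emit, per byte,
               the predicted byte and the probability of a match */
            lzp_pass(buf, pred, prob, n);
            /* pass 2: only ppm tables are hot; encode using predictions */
            ppm_pass(buf, pred, prob, n);
        }
    }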

    Quote Originally Posted by Bulat Ziganshin
    another possible solution used in lzma is using 2 or more threads - in this case memory access from one thread partially overlap with computations in another (of course, only for HT/multicore processors)
    tarsalzp does very little computation (it's a very simple and short algorithm), so that wouldn't help. lzma does heavy computation during optimal parsing, so one thread can find matches while a second does the optimal parsing.

    some time ago i read agner's manuals (agner.org), so i know roughly how the cache works and what influence it has on performance. i just thought there would be far fewer cache misses

  23. #23
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Since you're going beyond order-2 - e.g. to order-3 - I guess you can freely use order-8 LZP.

    During decompression you can use a trick:
    update the LZP tables but do not perform the context confirmation.

    I.e. separate the table access into:

    Pos() - gives the buffer position based on the longest confirmed context.

    Update() - updates the table, i.e. calculates hashes for the current contexts and writes the current position into the tables. As you can see, here we can abandon the buffer access used for context confirmation. And since LZP outputs a large number of literals, most of the time we will just update our tables - resulting in faster decompression.
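    A rough sketch of that split as I read it (order-8 only; the hash and constants are illustrative, and the caller must guarantee cur >= 8):

    #include <stdint.h>
    #include <string.h>

    static uint32_t tab8[1 << 20];   /* order-8 context hash -> position */

    static uint32_t Hash8(const uint8_t *p) {   /* hash of the previous 8 bytes */
        uint64_t c; memcpy(&c, p - 8, 8);
        return (uint32_t)((c * 0x9E3779B97F4A7C15ull) >> 44);
    }

    /* Pos(): read the stored position and confirm it against the buffer */
    int Pos(const uint8_t *buf, uint32_t cur, uint32_t *match) {
        uint32_t p = tab8[Hash8(buf + cur)];
        if (p >= 8 && !memcmp(buf + p - 8, buf + cur - 8, 8)) {
            *match = p;
            return 1;
        }
        return 0;
    }

    /* Update(): decoder fast path after a literal - just overwrite the slot,
       no buffer read, no confirmation */
    void Update(const uint8_t *buf, uint32_t cur) {
        tab8[Hash8(buf + cur)] = cur;
    }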


  24. #24
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Quote Originally Posted by encode
    Since you're going beyond order-2 - e.g. to order-3 - I guess you can freely use order-8 LZP.
    lzp uses a different kind of information than ppm, so even order-2 lzp can help an order-2 ppm on files like english.dic or similar ones.

    ppm is like mp3 - it encodes in the frequency domain. it assumes that symbols are uniformly distributed (within contexts, of course).
    lzp (my implementation) is like the schindler transform from szip (i.e. its output is like st output, but you know which symbols go to a particular context - it encodes a flag indicating whether the current byte equals the byte last seen in that context) - it encodes in the time domain.

    i'm trying to find the combination of context lengths that gives the most accurate prediction. the lzp order isn't dependent on the ppm order. i guess that order-8 lzp can help even an order-50 ppm (of course only on some types of files; on others it would hurt - but overall, if the lzp order isn't higher than the ppm order, it should help).

    Quote Originally Posted by encode
    During decompression you can use a trick:
    update the LZP tables but do not perform the context confirmation.

    I.e. separate the table access into:

    Pos() - gives the buffer position based on the longest confirmed context.

    Update() - updates the table, i.e. calculates hashes for the current contexts and writes the current position into the tables. As you can see, here we can abandon the buffer access used for context confirmation. And since LZP outputs a large number of literals, most of the time we will just update our tables - resulting in faster decompression.
    i don't fully understand what you've written.

    but you misunderstood how my lzp works. it saves (in each context) the last seen byte, not a position in a buffer (so there are only matches of length one). the buffer may not even exist! a simple getchar would be equally good - the buffer is there only to reduce i/o overhead.

    my lzp isn't a normal lzp. if there are five matching bytes, my lzp model doesn't write 5 to the stream but 1,1,1,1,1.

    the lzp code in the encoder and decoder is almost exactly the same. the only difference is that the encoder checks whether the predicted byte equals the current byte and encodes a flag, while the decoder just decodes the flag.
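    a minimal sketch of that flag-per-byte scheme (encode_flag and encode_via_ppm are stand-ins for the real coder calls):

    #include <stdint.h>

    void encode_flag(int bit);        /* entropy-code one lzp flag */
    void encode_via_ppm(int byte);    /* fall back to the ppm layer */

    static uint8_t last_byte[1 << 20];   /* per-context: last byte seen there */

    /* encoder: one flag per input byte - a run of five matches costs five
       '1' flags, never a match length. the decoder runs the same code,
       except it decodes the flag instead of comparing and encoding. */
    void encode_byte(uint64_t context, int cur) {
        uint32_t h = (uint32_t)((context * 0x9E3779B97F4A7C15ull) >> 44);
        if (cur == last_byte[h]) {
            encode_flag(1);              /* match - nothing else to send */
        } else {
            encode_flag(0);              /* miss - code the byte itself */
            encode_via_ppm(cur);
        }
        last_byte[h] = (uint8_t)cur;     /* update the prediction */
    }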

  25. #25
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    I just forgot about your LZP details.

    I thought you were complaining about the order-8 speed impact...

  26. #26
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Quote Originally Posted by encode
    I just forgot about your LZP details.
    don't worry



    i've created some statistics about the performance of the recent tarsalzp version on ltcb: http://phpfi.com/253917 (i've pasted them there because this forum doesn't support monospaced fonts).

    i wonder how much order-3 can help in terms of compression ratio. if it can reduce the final compressed file size by 20% then tarsalzp will beat ccmx on ltcb (it would be about 174 000 000 bytes).

    but i think the improvement will be more like 15%, so the file size would be about 184 000 000 (still good).

  27. #27
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Do you use, or have you tried, flag exclusion? For example, if a higher order context (order-4 and/or order-8) is not confirmed, encode just the byte without any flags.

  28. #28
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Quote Originally Posted by encode
    not confirmed
    did you mean not seen before?

    no, i haven't thought about that. i think it can give an improvement of a fraction of a percent (more on smaller files) - but it's always worth trying.

    thank you for your idea

  29. #29
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by donkey7
    did you mean not seen before?
    Not only for first timers. A small example:

    Say you use a hash table to keep track of the last seen byte within each context.

    Based on the current context you'll compute a hash, then access the hash table with that hash value. (Each hash element must also store the context itself, for context confirmation.) If the context is not confirmed you may skip to a lower order. For example, if the order-8 context is not confirmed you can skip to order-4; if order-4 is confirmed you check the current byte, and if the byte is not the same you code the flag 0. If the order-4 context is also not confirmed, you encode just the byte.

    Pseudo code (order-4 only):

    int currentByte = GetByte();

    int hash = gethash(context4);

    int byte = hashtab[hash][0]; // hashtab[h][0] - last byte, hashtab[h][1] - stored context

    // no valid entry, or a hash collision with another context:
    // flag exclusion - encode the byte directly, no flag at all
    if ((byte == -1) || (hashtab[hash][1] != context4)) {
        encodeByteViaPPM(currentByte);
    } else { // context confirmed
        if (byte == currentByte) {
            encodeFlag(1); // prediction hit
        } else {
            encodeFlag(0); // prediction miss
            encodeByteViaPPM(currentByte);
        }
    }
    // (the table update after coding is omitted here for brevity)


  30. #30
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    >but you misunderstood how my lzp works. it saves (in each context) the last seen byte, not a position in a buffer (so there are only matches of length one).

    Ilya mixed LZP with his own ROLZ algorithm.


