
Thread: I have no luck with additional compression of TS40.txt...

  1. #31
    Member lz77
    Join Date
    Jan 2016
    Location
    Russia
    Posts
    123
    Thanks
    36
    Thanked 13 Times in 9 Posts
    Hm, I tried to compress offsets.txt from my archive above with lzpm and lzturbo -32 (-39). Both refused to compress it and just added their own headers to the file...

    lzpm: 10000 bytes -> 10636 bytes
    lzturbo: 10000 bytes -> 10033 bytes.

    But where is ARI/tANS?

  2. #32
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,962
    Thanks
    294
    Thanked 1,293 Times in 734 Posts
    Well, the low bytes of these offsets (if that's what you mean, since the offsets don't fit in a byte) do seem pretty random.
    CM (paq8px, nzcc) does compress them to 9950 bytes or so, but plain "ARI/tANS" certainly won't.
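    A quick way to see why a plain order-0 coder (ARI or tANS) gains nothing on such data is to measure the file's empirical order-0 entropy. A minimal Python sketch; the function names are hypothetical, and the size bound ignores the cost of sending (or adaptively learning) the frequency table:

```python
import math
from collections import Counter


def order0_entropy_bits(data: bytes) -> float:
    """Empirical order-0 entropy of a byte string, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def order0_size_bytes(data: bytes) -> float:
    """Lower bound on the output of a static order-0 entropy coder,
    excluding the frequency table / model-learning overhead."""
    return order0_entropy_bits(data) * len(data) / 8
```

    E.g. order0_size_bytes(open("offsets.txt", "rb").read()). On 10000 uniformly random bytes the estimate lands slightly below 8 bits/byte only because of the finite sample (around two dozen bytes of apparent saving), which is less than a 256-entry frequency table costs, so the coded output ends up larger than the input.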

    The usual trick is parsing optimization though: there are usually multiple candidates for a match
    (multiple instances of a word etc.), so you can choose the one that is most compressible in the context of the others.
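    Picking the most compressible candidate can be sketched as a greedy choice under an adaptive cost model. This is a simplified illustration, not a full optimal parse: it assumes the candidates for each match already have equal length, and the add-one frequency model and all names below are hypothetical:

```python
import math
from collections import Counter


class AdaptiveOffsetModel:
    """Hypothetical adaptive frequency model over offset values, used only
    to estimate how many bits an offset would cost given earlier choices."""

    def __init__(self, alphabet_size: int):
        self.alphabet_size = alphabet_size
        self.counts = Counter()
        self.total = 0

    def cost_bits(self, offset: int) -> float:
        # Add-one (Laplace) estimate so unseen offsets still get a finite cost
        p = (self.counts[offset] + 1) / (self.total + self.alphabet_size)
        return -math.log2(p)

    def update(self, offset: int) -> None:
        self.counts[offset] += 1
        self.total += 1


def choose_offsets(candidates_per_match, alphabet_size=512):
    """For each match, pick the equal-length candidate offset that is cheapest
    under the model built from previously chosen offsets (greedy, one pass)."""
    model = AdaptiveOffsetModel(alphabet_size)
    chosen = []
    for candidates in candidates_per_match:
        best = min(candidates, key=model.cost_bits)  # ties keep the first candidate
        model.update(best)
        chosen.append(best)
    return chosen
```

    E.g. choose_offsets([[5], [5, 300], [7, 5]]) keeps reusing offset 5, because after the first match the model already rates it as cheaper than an unseen value. A real parser would also weigh match lengths and literal costs.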

  3. #33
    Member lz77
    Join Date
    Jan 2016
    Location
    Russia
    Posts
    123
    Thanks
    36
    Thanked 13 Times in 9 Posts
    These aren't low bytes; they're 10000 9-bit offsets (from offset slot № 1).
    For Rapid compression (40 sec. to compress/decompress 1Gb, 1 sec. == 1 Mb), parsing is not suitable; I'm using 5 hash tables instead...
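    The post doesn't say how the 5 hash tables are organized. One common arrangement (zstd's "double fast" strategy, for instance, keeps a long-hash and a short-hash table) is one table per prefix length, so a long-prefix table can still point at an older, longer match after the short-prefix table has been overwritten by a newer, shorter one. Hypothetical Python sketch; dicts keyed by the byte string stand in for fixed-size hash arrays, and the lengths 3, 4, 5, 6, 8 are arbitrary:

```python
class MultiHashMatchFinder:
    """Hypothetical match finder with one table per prefix length k.
    Each table maps the k bytes starting at a position to the most recent
    position where they occurred (newest entry overwrites older ones)."""

    def __init__(self, data: bytes, lengths=(3, 4, 5, 6, 8)):
        self.data = data
        self.lengths = lengths
        self.tables = {k: {} for k in lengths}

    def insert(self, pos: int) -> None:
        for k in self.lengths:
            if pos + k <= len(self.data):
                self.tables[k][self.data[pos:pos + k]] = pos

    def _match_len(self, cand: int, pos: int) -> int:
        n, length = len(self.data), 0
        while pos + length < n and self.data[cand + length] == self.data[pos + length]:
            length += 1
        return length

    def find(self, pos: int):
        """Return (length, distance) of the longest candidate over all
        tables (shorter distance wins ties), or (0, 0) if none match."""
        best_len, best_dist = 0, 0
        for k in self.lengths:
            cand = self.tables[k].get(self.data[pos:pos + k])
            if cand is None:
                continue
            length = self._match_len(cand, pos)
            if length > best_len or (length == best_len and pos - cand < best_dist):
                best_len, best_dist = length, pos - cand
        return best_len, best_dist
```

    In b"abcdefgh" + b"abcX" + b"abcdefgh" the 3-byte table only remembers the newer "abcX", but the 4-byte table still points at position 0, so find(12) returns the full 8-byte match.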

  4. #34
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,962
    Thanks
    294
    Thanked 1,293 Times in 734 Posts
    > parsing is not suitable,

    Maybe offline optimization of a heuristic parser?
    I mean, look at what a slow parsing optimizer outputs, and work out how to predict its decisions from hash matches and input data.
    E.g. LZMA a0 is an example of that, and even a1 is not 100% brute force.
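    As a generic example of a heuristic parser whose knobs could be tuned offline against a slow optimizer's output, here is deflate-style one-step lazy matching. This is not LZMA's actual a0 logic; min_len and lazy_gain are hypothetical tunable parameters, and the brute-force finder exists only to make the sketch testable:

```python
def make_naive_finder(data: bytes, window: int = 4096):
    """Brute-force match finder for testing: longest earlier match at pos."""
    def find(pos: int):
        best = (0, 0)
        for cand in range(max(0, pos - window), pos):
            length = 0
            while pos + length < len(data) and data[cand + length] == data[pos + length]:
                length += 1
            if length > best[0]:
                best = (length, pos - cand)
        return best
    return find


def lazy_parse(data: bytes, find_match, min_len: int = 3, lazy_gain: int = 1):
    """Greedy parse with one-step lazy evaluation: if the match starting at
    pos + 1 is at least lazy_gain bytes longer, emit a literal and defer."""
    tokens, pos, n = [], 0, len(data)
    while pos < n:
        length, dist = find_match(pos)
        if length >= min_len and pos + 1 < n:
            next_len, _ = find_match(pos + 1)
            if next_len >= length + lazy_gain:
                tokens.append(("lit", data[pos]))
                pos += 1
                continue
        if length >= min_len:
            tokens.append(("match", length, dist))
            pos += length
        else:
            tokens.append(("lit", data[pos]))
            pos += 1
    return tokens


def decode(tokens) -> bytes:
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, length, dist = tok
            for _ in range(length):  # byte-by-byte copy handles overlapping matches
                out.append(out[-dist])
    return bytes(out)
```

    Offline tuning would then mean running the slow optimal parser on sample data and adjusting min_len, lazy_gain (or richer rules) until the heuristic's decisions agree with it as often as possible.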

    > I'm using 5 hash tables instead...

    If you need fast encoding anyway, maybe try ROLZ?

  5. #35
    Member lz77
    Join Date
    Jan 2016
    Location
    Russia
    Posts
    123
    Thanks
    36
    Thanked 13 Times in 9 Posts
    Regarding ROLZ, I've only seen the idea described at ru.wikipedia.org/wiki/ROLZ.

  6. #36
    Member
    Join Date
    May 2020
    Location
    Berlin
    Posts
    64
    Thanks
    10
    Thanked 17 Times in 12 Posts
    Quote (originally posted by lz77):
    Regarding ROLZ I saw only idea on ru.wikipedia.org/wiki/ROLZ.
    English Wikipedia removed the article because it didn't cite well-known sources. But that was years ago, and it might be put back up, as there are more references now.

  7. #37
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,962
    Thanks
    294
    Thanked 1,293 Times in 734 Posts
    http://mattmahoney.net/dc/dce.html#Section_525

    Basically, you encode the offset within a hashtable cell instead of a global distance.
    The decoder has to maintain the same hashtables, so decoding is slower.
    But compression is better.
    It's something like an intermediate step from LZ to PPM.
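    A minimal Python sketch of that idea, assuming an order-1 context (the previous byte) and a small per-context table of recent positions. TABLE_SIZE, MIN_MATCH and the token format are hypothetical choices, and the entropy coding stage is left out. A match is sent as (slot index, length) instead of (distance, length), and the decoder rebuilds exactly the same tables:

```python
from collections import defaultdict, deque

TABLE_SIZE = 16  # hypothetical: recent positions remembered per context
MIN_MATCH = 3    # hypothetical: shortest match worth coding


def _update(tables, buf, start, end):
    # Register positions start..end-1 under their order-1 context (previous byte).
    for p in range(max(start, 1), end):
        tables[buf[p - 1]].appendleft(p)  # newest first; maxlen drops the oldest


def rolz_encode(data: bytes):
    tables = defaultdict(lambda: deque(maxlen=TABLE_SIZE))
    tokens, pos, n = [], 0, len(data)
    while pos < n:
        best_len, best_idx = 0, -1
        if pos > 0:
            # Only positions that followed the same byte are candidates
            for idx, cand in enumerate(tables[data[pos - 1]]):
                length = 0
                while pos + length < n and data[cand + length] == data[pos + length]:
                    length += 1
                if length > best_len:
                    best_len, best_idx = length, idx
        if best_len >= MIN_MATCH:
            tokens.append(("match", best_idx, best_len))
            step = best_len
        else:
            tokens.append(("lit", data[pos]))
            step = 1
        _update(tables, data, pos, pos + step)
        pos += step
    return tokens


def rolz_decode(tokens) -> bytes:
    tables = defaultdict(lambda: deque(maxlen=TABLE_SIZE))
    out = bytearray()
    for tok in tokens:
        pos = len(out)
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, idx, length = tok
            cand = tables[out[-1]][idx]  # same table state the encoder saw
            for i in range(length):
                out.append(out[cand + i])  # byte-by-byte copy allows overlap
        _update(tables, out, pos, len(out))
    return bytes(out)
```

    With TABLE_SIZE = 16 a slot index fits in 4 bits no matter how far back the match is; the price is that rolz_decode has to update the tables for every decoded position, which is where the slower decoding comes from.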

    Christian's RZ is ROLZ.
