Results 1 to 16 of 16

Thread: Is any compressor able?

  1. #1
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    237
    Thanks
    39
    Thanked 92 Times in 48 Posts

    Question Is any compressor able?

    I have a special file - http://www.imagecompression.info/gralic/demodata.zip - million bytes.

    RAR, 7-zip and BZIP2 fail to compress it (and only RAR is smart enough not to increase the size by more than 100 bytes).

    PAQ8px_v69 -8 compresses it to 986554 bytes (thus, PAQ* could potentially analyse which models are useless and switch them off).

    After I modify one bit in paq8px_v69.cpp and compile it, this modified version compresses demodata.bin to less than 667000 bytes (thus, the most appropriate model is pretty simple). The bit is not in recordModel().

    My question:
    is any compressor able to pack this file to less than 900000 bytes?
    Last edited by Alexander Rhatushnyak; 23rd July 2010 at 22:14.

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,368
    Thanks
    213
    Thanked 1,020 Times in 541 Posts
    It could be more interesting if you could find a real file like that

  3. #3
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    237
    Thanks
    39
    Thanked 92 Times in 48 Posts
    Quote Originally Posted by Shelwien View Post
    if you could find a real file like that
    Almost every real file has this property: some models are very useful, and some models are totally useless.
    Problem: when you add more and more models to a model-mixing compression program, compression quality gets a little bit better on average, but speed suffers so much that at some point you decide to stop adding models.
    Solution: detect which models (or which classes of models) are more useful than others, and apply only a dozen of the most appropriate models, switch off all other models.

  4. #4
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    That's a common - but in my eyes really wrong approach. People should better do it the engineering way. Given some typical data as a design base:

    0. Goal definition,
    1. data analysis,
    2. define a model structure,
    3. fit model parameters and
    4. analyse, not just comparing the bpc, the result. Restart at 1, if the results are too bad.

    Cheers

  5. #5
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    The answer is yes. Your paq8px_v69 with one modified bit can compress it.

    But it's well known that such puzzles are easy to make but hard to solve. Attached is a 1 MB file which ppmonstr and paq8px cannot compress at all. I attached it as a 103 byte zpaq archive (inside a 348 byte zip archive) that you can extract with unzip and unzpaq. Is there any other compressor that can compress it?

    (Hint: I could have made the problem a lot harder than I did).
    Attached Files Attached Files
    • File Type: zip x.zip (348 Bytes, 152 views)

  6. #6
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    237
    Thanks
    39
    Thanked 92 Times in 48 Posts
    Quote Originally Posted by toffer View Post
    That's a common - but in my eyes really wrong approach
    Sorry, do you mean "wrong solution", about what I have suggested, to detect best models for each file, and to apply only them?

    Quote Originally Posted by Shelwien View Post
    It could be more interesting if you could find a real file like that
    Here's one example of a file on which you perform badly if you don't have an appropriate model, especially if compression time is also considered:
    http://www.squeezechart.com/bitmap.html -> 16 BIT (16 bits on greyscale channel)

    78.106.003 - original (uncompressed) image size
    ...
    66.959.861 bytes, 18 seconds -- 7-zip
    ...
    59.710.567 bytes, 3950 sec. -- PAQ8px
    ...
    52.323.004 bytes, 9 seconds -- libPGF, tightest compression, probably the world record on this file.

    I guess a few years from now the best model-mixing compressor will be able to detect 16-bit grayscale images even without standard headers (PGM, TIFF, etc) i.e. no a priori knowledge of image width.
    Last edited by Alexander Rhatushnyak; 24th July 2010 at 00:52.

  7. #7
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,368
    Thanks
    213
    Thanked 1,020 Times in 541 Posts
    @Alexander:
    Considering submodel disabling, my usual suggestion is to store submodel predictions for a data block without mixing them,
    and discard the submodels, by evaluating their "perfect mixing entropy" results - first compute entropy with w_i=1 for the
    most fitting submodel for each bit, discard the submodels with low counts, then discard submodels which don't give much
    improvement.
    Or, alternatively, its possible to run a usual parameter optimizer to tune the mixing parameters for a block, using
    the recorded submodel traces for speed, but imho that won't make much sense for datatype-specific submodels.

  8. #8
    Member
    Join Date
    May 2009
    Location
    China
    Posts
    36
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Tonight, use my compressor to try.

  9. #9
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    237
    Thanks
    39
    Thanked 92 Times in 48 Posts
    Quote Originally Posted by Shelwien View Post
    Considering submodel disabling, my usual suggestion is
    Where was this discussed? Are there any implementations or publications?
    I mean enabling/disabling based on data analysis rather than standard headers analysis. For example, why not to detect if data are 8-bit, 16-bit or 32-bit, first of all? Before detecting if there are more than one channel as in multimedia files or a big enough record length as in database files.

    Quote Originally Posted by ddfox View Post
    Tonight, use my compressor to try.
    Can you please provide a link?
    Last edited by Alexander Rhatushnyak; 24th July 2010 at 19:58.

  10. #10
    Member
    Join Date
    May 2009
    Location
    China
    Posts
    36
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Using my compressor can't compress Alexander's demodata.bin and Matt's x .
    But x.zpaq can be compressed to 101 bytes.

  11. #11
    Member
    Join Date
    May 2009
    Location
    China
    Posts
    36
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Alexander Rhatushnyak View Post
    Can you please provide a link?
    Sorry, My compressor is in demo with Matlab, can't using it without Matlab. So I am shamed to show it on Matt's website.
    Try to transfer it to VC++.
    Last edited by ddfox; 25th July 2010 at 07:27.

  12. #12
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Here is a model that compresses x to 72 bytes using "zpaq niscrc4_no_key.cfg x.zpaq x". The options n, i, s tell zpaq not to store the filename, comment, or checksum. c says compress. The config file rc4_no_key.cfg is RC4 with no key schedule used as a PRNG to generate 1 MB of pseudo-random data.

    Code:
    (compresses 1 MB generated by RC4 with no key)
    comp 0 0 0 8 1 (pm=8: M is a 256 byte buffer S with wraparound)
      0 icm 5 (indirect order 0 model)
    hcomp
      halt
    pcomp clear.bat ;
        (rc4 b=i c=j r0=count)
        a= 100 a*= 100 a*= 100 r=a 0 (r0=1000000)
        do (empty key schedule M = S[i] = i, i = 0..255)
          *b=b b++
        a=b a> 255 until
        b=0
        do (1000000 times)
          b++ a=c a+=*b c=a (j += S[++i])
          a=*b *c<>a *b=a (swap S[i], S[j])
          a=b a+=c d=b b=a a=*b out b=d (output S[S[i]+S[j]])
        a=r 0 a-- a> 0 r=a 0 while (count down r0)
      halt
    end
    The preprocessor clear.bat converts any input to an empty file. The above config file will only work on this one particular file and no other.

    Code:
    copy nul: %2
    Compressing a file like this would be a much harder problem if I only gave you the file and I had implemented the key schedule and used a secret key. You could only compress it if you guessed the key or were able to crack RC4. Otherwise it looks completely random.

    As it was, I gave you the file already compressed. You could extract the RC4 code (and the key if I used one) from the archive with the command "zpaq tx x.zpaq". The t debugging option says to not postprocess. You would get a warning about a bad checksum (if one were stored) and the ZPAQL code for the PCOMP section would be appended to the beginning of the output. The COMP and HCOMP sections are stored uncompressed in the header of x.zpaq.

    I originally created the file x with the command "zpaq prrc4_no_key.cfg nul: x". "pr" means to run the PCOMP section as a program with input nul: and output x. The code is called once for each byte of input with the input byte in A, and once at EOF with 0xffffffff in A.

  13. #13
    Member
    Join Date
    May 2009
    Location
    China
    Posts
    36
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Here is a model that compresses x to 72 bytes using "zpaq niscrc4_no_key.cfg x.zpaq x". The options n, i, s tell zpaq not to store the filename, comment, or checksum. c says compress. The config file rc4_no_key.cfg is RC4 with no key schedule used as a PRNG to generate 1 MB of pseudo-random data.

    Code:
    (compresses 1 MB generated by RC4 with no key)
    comp 0 0 0 8 1 (pm=8: M is a 256 byte buffer S with wraparound)
      0 icm 5 (indirect order 0 model)
    hcomp
      halt
    pcomp clear.bat ;
        (rc4 b=i c=j r0=count)
        a= 100 a*= 100 a*= 100 r=a 0 (r0=1000000)
        do (empty key schedule M = S[i] = i, i = 0..255)
          *b=b b++
        a=b a> 255 until
        b=0
        do (1000000 times)
          b++ a=c a+=*b c=a (j += S[++i])
          a=*b *c<>a *b=a (swap S[i], S[j])
          a=b a+=c d=b b=a a=*b out b=d (output S[S[i]+S[j]])
        a=r 0 a-- a> 0 r=a 0 while (count down r0)
      halt
    end
    The preprocessor clear.bat converts any input to an empty file. The above config file will only work on this one particular file and no other.

    Code:
    copy nul: %2
    Compressing a file like this would be a much harder problem if I only gave you the file and I had implemented the key schedule and used a secret key. You could only compress it if you guessed the key or were able to crack RC4. Otherwise it looks completely random.

    As it was, I gave you the file already compressed. You could extract the RC4 code (and the key if I used one) from the archive with the command "zpaq tx x.zpaq". The t debugging option says to not postprocess. You would get a warning about a bad checksum (if one were stored) and the ZPAQL code for the PCOMP section would be appended to the beginning of the output. The COMP and HCOMP sections are stored uncompressed in the header of x.zpaq.

    I originally created the file x with the command "zpaq prrc4_no_key.cfg nul: x". "pr" means to run the PCOMP section as a program with input nul: and output x. The code is called once for each byte of input with the input byte in A, and once at EOF with 0xffffffff in A.
    Matt, can you attach 72 bytes x.zpaq file?
    Last edited by ddfox; 26th July 2010 at 11:35.

  14. #14
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    72 byte x.zpaq attached.
    Attached Files Attached Files
    • File Type: zip x.zip (318 Bytes, 138 views)

  15. #15
    Member
    Join Date
    May 2009
    Location
    China
    Posts
    36
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Matt Mahoney View Post
    72 byte x.zpaq attached.
    Thanks Matt.
    It can be compressed 2 bytes too. 72 bytes -->70 bytes

  16. #16
    Member
    Join Date
    May 2009
    Location
    China
    Posts
    36
    Thanks
    0
    Thanked 0 Times in 0 Posts
    O(∩_∩)O哈哈~
    Last edited by ddfox; 5th August 2010 at 17:11.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •