
Thread: New LTCB record

  1. #1
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 795 Times in 488 Posts

    New LTCB record

    Dmitry Shkarin broke his own record on LTCB, improving compression from .1280 to .1277 with a new dictionary for durilca'kingsize. It is also faster than the next 13 compressors. I did not test it myself because it requires 64-bit Windows and 13 GB of memory.

    http://mattmahoney.net/dc/text.html#1277

  2. #2
    Member
    Join Date
    May 2009
    Location
    China
    Posts
    36
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Congratulations, Dmitry!
    Nice work!

  3. #3
    Member
    Join Date
    Jun 2008
    Location
    USA
    Posts
    111
    Thanks
    0
    Thanked 0 Times in 0 Posts
    It is designed to work only on this benchmark and not in general.
    No kidding??!?!?!?!?!?!


  4. #4
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 795 Times in 488 Posts
    The top 3 on LTCB are all tuned to the benchmark. That's going to happen with any public benchmark and I expected it. There are 3 ways (that I know of) to benchmark, but none of them really answers the question "what is the best compressor?" Otherwise zip would be top ranked because that's what everyone uses.

    1. Use private test data.
    2. Use public data and add the decompresser size.
    3. Use a cryptographic random data generator (like generic benchmark).

    I don't like 1 because nobody else can submit or check results. It is more work for the evaluator, but maybe you can automate it like sportsman's metacompressor site. Even so, when the evaluator gets tired of it, the benchmark dies.

    2 has problems like tuning, but it does eliminate tricks like BARF that compress the Calgary corpus to 1 byte, or less blatant ones like a dictionary built from world95.txt, a special filter for rafale.bmp, etc. You can't compress smaller than the (unknown) Kolmogorov complexity, and the method is well tested (Calgary challenge).
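
    A minimal sketch of how method 2 scores an entry, assuming the score is simply the compressed size plus the decompresser size. The entry names and byte counts are made up for illustration and are not real benchmark results:

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    compressed_size: int    # bytes of compressed output
    decompresser_size: int  # bytes of the decompresser program

    @property
    def total(self) -> int:
        # Method 2: the decompresser counts toward the score.
        return self.compressed_size + self.decompresser_size


def rank(entries):
    """Return entries sorted by total size, smallest first."""
    return sorted(entries, key=lambda e: e.total)


# Hypothetical numbers, for illustration only.
entries = [
    Entry("honest", compressed_size=250_000, decompresser_size=50_000),
    # A trick entry that hides the test data inside the decompresser.
    Entry("lookup_trick", compressed_size=1, decompresser_size=3_000_000),
]

for e in rank(entries):
    print(e.name, e.total)
```

    With these numbers the trick entry produces a 1 byte output but still ranks last, because the data it hides in the decompresser is counted.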

    3 solves the problems of 1 without needing the decompresser size, but it only gives an approximate measurement, and creating truly generic data is a hard theoretical problem. The distribution is very sensitive to the choice of programming language for the random programs that generate the test data.

  5. #5
    Member
    Join Date
    Jun 2008
    Location
    USA
    Posts
    111
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Originally posted by Matt Mahoney:
    The top 3 on LTCB are all tuned to the benchmark. That's going to happen with any public benchmark and I expected it.
    Sorry, I think you missed the joke. I thought it was extremely obvious and redundant that a 64-bit 13 GB compressor wasn't for general use.

  6. #6
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 795 Times in 488 Posts
    Yeah I saw the smilies. I'm waiting for someone to run a 10,000 model version of PAQ on a Jaguar XT5 with 224,000 cores and 299 TB of memory. http://www.nccs.gov/jaguar/

  7. #7
    Member
    Join Date
    Jun 2008
    Location
    USA
    Posts
    111
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Originally posted by Rugxulo:
    Sorry, I think you missed the joke. I thought it was extremely obvious and redundant that a 64-bit 13 GB compressor wasn't for general use.
    Actually, I joke, but now I've seen the Black Friday ads with several computers sporting anywhere from 1 GB RAM (netbooks) to 3 GB laptops to one desktop with 8 GB, and actually most of them run 64-bit Win7, even with only 3 GB of RAM. So I guess it's not that far-fetched (if still extremely silly to my mind).
