
Thread: Competition on compression ($)

  1. #1
    Member
    Join Date
    Jul 2013
    Location
    Abu
    Posts
    5
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Competition on compression ($)


  2. #2
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    Calgary challenge and Hutter prize are still active as far as I know. It has been a long time since any improvements were posted, I guess because it would be a lot of work for a small prize. I am on the Hutter prize committee so I am not competing in it. After the last award, the minimum requirement is a 3% improvement but I doubt it can be met, especially with the 1 GB memory limit.

    I considered working on submitting a zpaq based entry to the Calgary challenge, but more pressing work always came up.

    Sequencesqueeze deadline has passed. I participated (didn't win) and thought it was well organized.

    Ocarina prize - we dropped the ball. Sorry about that.

    Didn't know about the Russian contests.

    There are still plenty of benchmarks, but no prize other than fame.

  3. #3
    Member
    Join Date
    Jul 2013
    Location
    Abu
    Posts
    5
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Thanks, Matt.
    What other compression competitions are there?
    I heard about a competition on compressing random data, but I couldn't find it.

  4. #4
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    http://marknelson.us/2012/10/09/the-...nge-turns-ten/

    I actually found some redundancy in this file. In its original form of 1 million decimal digits in a table with 20,000 rows and 50 columns, if you add up the columns, each of the 50 sums is an even number. But your program would have to be smaller than 50 bits to win.
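    A minimal sketch of how one could check that parity property (not part of the original post). It assumes a hypothetical local file "digits.txt" that has already been reduced to 20,000 lines of 50 decimal digits each; the published RAND table also carries line numbers and spacing, which would have to be stripped or ignored first:

    # Sketch: verify that each of the 50 column sums is even.
    # Assumes a hypothetical file "digits.txt", 20,000 lines of 50 digits each.
    def column_parities(path="digits.txt"):
        sums = [0] * 50
        with open(path) as f:
            for line in f:
                digits = [int(c) for c in line if c.isdigit()]
                if len(digits) != 50:
                    continue  # skip blank or malformed lines
                for i, d in enumerate(digits):
                    sums[i] += d
        return [s % 2 for s in sums]

    if __name__ == "__main__":
        print(column_parities())  # fifty zeros if every column sum is even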

  5. #5
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    I am not sure the computer is specified, but you could define a computer and a program for it that would win if the data behaves like you say. I think Mark would at least give you some recognition for your contribution.

    Just read in the file and handle each of the 20,000 rows separately: there are 19,999 ordinary rows and then the special last row of 50 digits. Normally there are ten possible symbols per digit, but because each column sum is even, only 5 digits are possible in the last row of each column: 0, 2, 4, 6 or 8 if the column is already even before the last row is added, and 1, 3, 5, 7 or 9 if it is odd. So if the column is even before the last digit is added, map 0, 1, 2, 3, 4 to 0, 2, 4, 6, 8 respectively, and if it is odd, map 0, 1, 2, 3, 4 to 1, 3, 5, 7, 9 respectively. The last 50 digits are then really a 50-digit base-5 number instead of base 10, and you could rewrite the file as two binary numbers. Or you can save a byte and combine the two integers bijectively and write a single number.

    However, you would need a custom-made computer that does this on its own, so that virtually no program space is used.
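    A rough sketch of that last-row remapping (reconstructed from the description above, so treat it as illustrative). It assumes the table has already been read into rows, a hypothetical in-memory list of 20,000 lists of 50 integer digits:

    # Possible last-row digits, depending on the column's parity so far.
    EVEN = [0, 2, 4, 6, 8]
    ODD = [1, 3, 5, 7, 9]

    def recode_last_row(rows):
        # Parity of each column over the first 19,999 rows.
        parities = [sum(rows[r][c] for r in range(19999)) % 2 for c in range(50)]
        base5 = []
        for c, d in enumerate(rows[19999]):
            table = EVEN if parities[c] == 0 else ODD
            base5.append(table.index(d))  # each value is in 0..4
        return base5  # 50 base-5 digits instead of 50 base-10 digits

    def decode_last_row(base5, first_rows):
        # Inverse mapping, given the 19,999 ordinary rows.
        parities = [sum(first_rows[r][c] for r in range(19999)) % 2 for c in range(50)]
        return [(EVEN if parities[c] == 0 else ODD)[v] for c, v in enumerate(base5)]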

  6. #6
    Member
    Join Date
    Jun 2010
    Location
    CN
    Posts
    5
    Thanks
    1
    Thanked 1 Time in 1 Post

  7. #7
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by Matt Mahoney View Post
    http://marknelson.us/2012/10/09/the-...nge-turns-ten/

    I actually found some redundancy in this file. In its original form of 1 million decimal digits in a table with 20,000 rows and 50 columns, if you add up the columns, each of the 50 sums is an even number. But your program would have to be smaller than 50 bits to win.
    Thinking about things like the birthday paradox, it seems plausible that there are compressible patterns in sets of random numbers with high probability. Consider it in the negative: What is the probability that no compressible pattern occurs in that data? Maybe the challenge is just finding them.

  8. #8
    Member
    Join Date
    Jul 2013
    Location
    Abu
    Posts
    5
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Matt Mahoney View Post
    http://marknelson.us/2012/10/09/the-...nge-turns-ten/

    I actually found some redundancy in this file. In its original form of 1 million decimal digits in a table with 20,000 rows and 50 columns, if you add up the columns, each of the 50 sums is an even number. But your program would have to be smaller than 50 bits to win.
    Thanks, Matt. Interesting competition; perhaps we will be able to win.

    Quote Originally Posted by est View Post
    This benchmark, without a money prize =)

  9. #9
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    Sure there are compressible patterns in random data, but specifying their locations will cost the space you saved.

    There are potentially other biases in the million random digits file that I discussed years ago in comp.compression. The data was originally generated by sampling a 5 bit counter driven by a noisy oscillator to produce a set of 20,000 punched cards with 50 digits each. But there was some correlation between consecutive digits, so what they did was add adjacent pairs of cards modulo 10 to produce a new set of cards which was published. That is why the sums of the columns are even. Each of the original cards is counted twice.

    If the new cards were kept in order, then it would be possible to recover the original cards and use the bias to compress the data. To test this, I tried alternately adding and subtracting cards to see if the columns added to 0, but they did not. That means the new set of cards was mixed up or shuffled. I did find very weak correlations between adjacent cards and pairs separated by 1 card in between (but not 2 or more). This suggests the cards were shuffled, although this fact is not mentioned in the RAND document. If you could unshuffle them, then I think it might be possible to compress the data, provided you could express the permutation in less space than the bias. As it is, the very weak correlation would only save a few bits over the whole set.
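    A sketch of that alternating add/subtract test, reconstructed from the description (so illustrative only). It assumes the same hypothetical rows layout as above, 20,000 lists of 50 integer digits:

    # If published card i were simply (original card i) + (original card i+1)
    # mod 10 with the order preserved, alternately adding and subtracting the
    # published rows mod 10 would largely cancel the intermediate originals.
    # Matt reports the columns do not come out to zero, which suggests the
    # cards were reordered before publication.
    def alternating_column_sums(rows):
        cols = [0] * 50
        sign = 1
        for row in rows:
            for c, d in enumerate(row):
                cols[c] = (cols[c] + sign * d) % 10
            sign = -sign
        return cols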

