Results 1 to 9 of 9

Thread: Compress small integers

  1. #1
    Member CompressMaster's Avatar
    Join Date
    Jun 2018
    Location
    Lovinobana, Slovakia
    Posts
    198
    Thanks
    58
    Thanked 15 Times in 15 Posts

    Compress small integers

    What will be the best algorithm for compressing these integers? It would be possible to go below 46 bytes?

    Code:
    1111115151213111112411116211112152211111212131112251111412141623111143121222112212217131311113315111113221313141332111122141112123232224222321321511111323324121111624322212211421231212121211321121445121431235222
    Code:
    1
    1
    1
    1
    1
    1
    5
    1
    5
    1
    2
    1
    3
    1
    1
    1
    1
    1
    2
    4
    1
    1
    11
    6
    2
    1
    1
    1
    1
    2
    1
    5
    2
    2
    1
    1
    1
    1
    1
    2
    1
    2
    1
    3
    1
    1
    1
    2
    2
    5
    1
    1
    1
    1
    4
    1
    2
    1
    4
    1
    6
    2
    3
    1
    1
    1
    1
    4
    3
    1
    2
    1
    2
    2
    2
    1
    1
    2
    2
    1
    2
    2
    1
    7
    1
    3
    1
    3
    1
    1
    1
    1
    3
    3
    1
    5
    1
    1
    1
    1
    1
    3
    2
    2
    1
    3
    1
    3
    1
    4
    1
    3
    3
    2
    1
    1
    1
    1
    2
    2
    1
    4
    1
    1
    1
    2
    1
    2
    3
    2
    3
    2
    2
    2
    4
    2
    2
    2
    3
    2
    1
    3
    2
    1
    5
    1
    1
    1
    1
    1
    3
    2
    3
    3
    2
    4
    1
    2
    1
    1
    1
    16
    2
    4
    3
    2
    2
    2
    1
    2
    2
    1
    1
    4
    2
    1
    2
    3
    1
    2
    1
    2
    1
    2
    1
    2
    1
    1
    3
    2
    1
    1
    2
    1
    4
    4
    5
    1
    2
    1
    4
    3
    1
    2
    3
    5
    2
    2
    2
    Please hit the "THANKS" button under my post if its useful for you.

  2. #2
    Member
    Join Date
    May 2020
    Location
    Berlin
    Posts
    64
    Thanks
    10
    Thanked 20 Times in 15 Posts
    Entropy is 1.932334 bits per digit. 42.5B or 43B rounded up.

    Edit: I miscounted the digits on the mobile phone. Correction below.
    Last edited by Dresdenboy; 31st August 2020 at 04:46.

  3. #3
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    554
    Thanks
    356
    Thanked 356 Times in 193 Posts
    What is your alphabet?
    Your range of numbers in your sample seems to be 1..16. If you would give us different samples would that be always the case? Or would we face larger numbers sometimes? That's the alphabet of your input. And that's very important to know. (You sent us the numbers twice: I believe that the second one is correct and the first one is not, right? The second one contains the numbers 11 and 16 mixed-in with (inseparable from ) the other numbers)


    For compression you'll need a model.

    A model's role is to do any necessary rearranging and/or disassembly/assembly and/or prediction.
    A model may contain a dictionary of chunks that regularly appear in most inputs, a model may contain a way of representing repeating elements.
    A model may be an order-n string model. Or a model may simply contain an algorithm to use a static frequency table and just encode/decode input elements based on those frequencies.
    A complex model may be a combination of simple models. There may be for example different parts of your input that may be represented by different models or you may use context mixing to have a more fine-tuned probability estimation.

    In order to find out how much your numbers can be compressed, we'll need to find the answer to another question: which is the best model in your case?
    However you need to be careful: the model you find may do a good job in the above case of 209 numbers, but may fail miserably when giving it the next bunch of numbers. That's overfitting the model for a specific input data.
    So 209 numbers without specifying what these numbers represent is not enough. So from where did you get them?

    For creating a good model you need to tell us what these numbers represent.
    Those numbers you posted may give us some clue what they would mean, but that would be speculation from our side.
    Your numbers could represent residuals from another model (highly likely), or they may be part of an image or just random numbers (very-very unlikely but still a possibility).
    For different cases there may be different models that would work best. And different models may give different answers to your original question.

    It looks like that the probability of the above numbers are very close to: 50% 1s, 25% 2s, 12.5% 3s, ... 1/(2^n). Is this a true probability distribution? Without you telling us how you come up with those numbers, it's just speculation.
    If the above probability distribution is true and there is no other useful information (no structure for example) for the numbers, then the entropy (using an order0 model with these static probabilities) is 53.625 bytes.

    What model did you use to get 46 bytes?
    @Dresdenboy, what model did you use to get 42.5 bytes?

  4. #4
    Member
    Join Date
    May 2020
    Location
    Berlin
    Posts
    64
    Thanks
    10
    Thanked 20 Times in 15 Posts
    Under the assumption, that this is a homework, I just threw the string into an entropy function. But I didn't see the two 2-digit symbols at first. Are they real or typos?

  5. #5
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    554
    Thanks
    356
    Thanked 356 Times in 193 Posts
    >>But I didn't see the two 2-digit symbols at first. Are they real or typos?
    I believe the integers of 11 and 16 are mistakenly mixed in with the single-digit integers when removing whitespace.

    >>I just threw the string into an entropy function
    Aha, I see.
    This is an order0 model then, with the actual probabilities. For the mixed-in sample with 211 digits the entropy in bits is correct (1.932334) but in bytes it is 1.932334 * 211 / 8 = 50.96531 bytes.

    >>Under the assumption, that this is a homework
    CompressMaster has a long history of asking similar questions:

    https://encode.su/threads/3024-Best-...ental-integers
    https://encode.su/threads/2990-Files...nt-compression
    https://encode.su/threads/2988-Integer-compression
    https://encode.su/threads/2984-Compr...us-data-stream
    https://encode.su/threads/2965-I-am-...ithm-available

    I hope he learns from them.

  6. #6
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts
    Quote Originally Posted by CompressMaster View Post
    It would be possible to go below 46 bytes?
    Probably not, test with 209 values:
    61 bytes, test.bin.arb (arb255 32-bit binary http://bijective.dogma.net/compres11.htm)
    78 bytes, test.bin.paq8px193
    84 bytes, test.txt.paq8px193
    104 bytes, test.ge.paq8px193
    112 bytes, test.ge (GearEnc https://encode.su/threads/545-GearEn...-gear-encoding)
    115 bytes, test.txt.arb (arb255 32-bit binary http://bijective.dogma.net/compres11.htm)
    130.625 bytes, binary 5 bits
    138 bytes, test.bin.paq8pxd89
    159 bytes, test.ge.paq8pxd89
    167 bytes, test.txt.paq8pxd89
    209 bytes, test.bin (binary 8 bits)
    419 bytes, test.txt (text x,x,x)
    Last edited by Sportman; 21st August 2020 at 23:22.

  7. #7
    Member
    Join Date
    May 2020
    Location
    Berlin
    Posts
    64
    Thanks
    10
    Thanked 20 Times in 15 Posts
    Is this without file headers? Otherwise I'd guess the models don"t adapt to the reduced alphabet that quickly.

    Oh and I counted 4*37+28 digits in the webform I used (176 digits), hence the lower value. I'll test that on a real PC later.
    Update: Indeed, this are 211 digits. My bad. So Gotty is right with ~51 Bytes with an order-0 model.

    But the visible repetitions might help getting this lower thanks to some redundancy.

  8. #8
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts
    Quote Originally Posted by Dresdenboy View Post
    Is this without file headers?
    No.

    Quote Originally Posted by Dresdenboy View Post
    Indeed, this are 211 digits.
    Same test with 211 values (11 and 16 split):
    59 bytes, test.bin.arb (arb255 32-bit binary http://bijective.dogma.net/compres11.htm)
    75 bytes, test.bin.paq8px193
    80 bytes, test.txt.paq8px193
    91 bytes, test.ge.paq8px193
    96 bytes, test.ge (GearEnc https://encode.su/threads/545-GearEn...-gear-encoding)
    105.5 bytes, binary 4 bits
    116 bytes, test.txt.arb (arb255 32-bit binary http://bijective.dogma.net/compres11.htm)
    136 bytes, test.bin.paq8pxd89
    148 bytes, test.ge.paq8pxd89
    162 bytes, test.txt.paq8pxd89
    211 bytes, test.bin (binary 8 bits)
    421 bytes, test.txt (text x,x,x)
    Last edited by Sportman; 21st August 2020 at 23:27.

  9. #9
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    164
    Thanks
    31
    Thanked 64 Times in 40 Posts
    gamma coding gives 59bytes(477bits).
    Code:
    !function(A){ for(var a=0,b=A.length,c,d,e=0;a<b;){
      for(d=A[a++],c=0;d>>++c;);
      e+=c*2-1
     }document.write(e," ",e>>3)
    }([1,1,1,1,1,1,5,1,5,1,2,1,3,1,1,1,1,1,2,4,1,1,1,1,6,2,1,1,1,1,2,1,5,2,2,1,1,1,1,1,2,1,2,1,3,1,1,1,2,2,5,1,1,1,1,4,1,2,1,4,1,6,2,3,1,1,1,1,4,3,1,2,1,2,2,2,1,1,2,2,1,2,2,1,7,1,3,1,3,1,1,1,1,3,3,1,5,1,1,1,1,1,3,2,2,1,3,1,3,1,4,1,3,3,2,1,1,1,1,2,2,1,4,1,1,1,2,1,2,3,2,3,2,2,2,4,2,2,2,3,2,1,3,2,1,5,1,1,1,1,1,3,2,3,3,2,4,1,2,1,1,1,1,6,2,4,3,2,2,2,1,2,2,1,1,4,2,1,2,3,1,2,1,2,1,2,1,2,1,1,3,2,1,1,2,1,4,4,5,1,2,1,4,3,1,2,3,5,2,2,2])

Similar Threads

  1. Best compression algorithm for a sequence of incremental integers
    By CompressMaster in forum Data Compression
    Replies: 18
    Last Post: 17th May 2019, 11:56
  2. Compressing small packets
    By cbloom in forum Data Compression
    Replies: 4
    Last Post: 1st March 2013, 01:59
  3. Hashing a large range of integers to a smaller range
    By mtwaha in forum Data Compression
    Replies: 3
    Last Post: 11th April 2011, 23:27
  4. compressing a really small 1k .COM file
    By Rugxulo in forum Data Compression
    Replies: 3
    Last Post: 28th November 2009, 00:32
  5. A Small Warning
    By encode in forum The Off-Topic Lounge
    Replies: 1
    Last Post: 30th August 2008, 21:05

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •