Thread: Lossy compression of random numbers

  #1
    Member (USA)

    Problem: you are given N random numbers (independent and uniformly distributed) in a given range. Using up to M bits of data, compress the numbers in a way that minimizes mean-squared error (MSE). I am interested in the case where M is much smaller than N.

    Any ideas how to approach this problem? Any existing research on this topic?

    My current approach: search over seeds of a random number generator and store the seeds using up to M bits. If M is tiny (say, 4 bytes), you can search over a single seed: generate N random numbers from each candidate seed and store the seed that minimizes MSE. If M is larger, you can divide the N numbers into groups and use one seed per group (dividing the search space makes it easier to find a lower MSE). A sketch of the single-seed search is below.
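    A minimal sketch of the single-seed variant, assuming 32-bit seeds, numbers in [0,1), and a Mersenne Twister; the constants (N, the number of candidate seeds, the data seed) are illustrative, not from any actual implementation:
    Code:
    // Try a range of PRNG seeds; for each seed, generate N uniforms and
    // measure the MSE against the data. Store only the best seed.
    #include <cstdint>
    #include <cstdio>
    #include <random>
    #include <vector>
    
    int main() {
      const int N = 1000;                   // numbers to compress
      const uint32_t kCandidates = 100000;  // seeds tried (tiny subset of 2^32)
      std::mt19937 data_rng(12345);         // stand-in for the given data
      std::uniform_real_distribution<double> uni(0.0, 1.0);
      std::vector<double> x(N);
      for (double& v : x) v = uni(data_rng);
    
      uint32_t best_seed = 0;
      double best_mse = 1e300;
      for (uint32_t s = 0; s < kCandidates; s++) {
        std::mt19937 rng(s);
        double err = 0;
        for (int i = 0; i < N; i++) {
          double d = x[i] - uni(rng);
          err += d * d;
        }
        if (err / N < best_mse) { best_mse = err / N; best_seed = s; }
      }
      printf("best seed %u, MSE %.8f\n", (unsigned)best_seed, best_mse);
      return 0;
    }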

  #2
    Administrator Shelwien (Kharkov, Ukraine)
    Maybe something like this: https://ieeexplore.ieee.org/abstract/document/5205695 ?

    1. The PRNG approach is only optimal if you have a high chance of guessing the right PRNG and seed.
    Otherwise it's better to approximate a chunk of random numbers with a single constant.

    Presuming that you actually need random numbers in the output, maybe you can
    find the constant C such that Sum (x[i]-(prng[i]+C))^2 is minimal... it should be faster
    than testing all seed values.
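    For what it's worth, that C doesn't need a search: setting the derivative of Sum (x[i]-(prng[i]+C))^2 with respect to C to zero gives C = mean of (x[i]-prng[i]). A sketch:
    Code:
    #include <cstddef>
    #include <vector>
    
    // Closed-form least-squares offset: the C minimizing
    // sum (x[i] - (prng[i] + C))^2 is the mean residual.
    double best_offset(const std::vector<double>& x,
                       const std::vector<double>& prng) {
      double sum = 0;
      for (std::size_t i = 0; i < x.size(); i++) sum += x[i] - prng[i];
      return sum / x.size();
    }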

    2. For MSE, the higher bits of the numbers actually have higher value,
    so maybe it would make sense to try compressing them directly (lossily if necessary).
    That would give you 50% + k bits guessed correctly if you managed to encode the positions of k of the 1-bits in a block.
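    To make the bit-value point concrete: for values in [0,1), a wrong top bit costs 1/2 in absolute error, while bit-plane b costs only 2^-b, so squared error is dominated by the high bit-planes. A simple baseline along these lines, assuming a budget of b = M/N bits per number (this is plain midpoint quantization, my example rather than the scheme described above):
    Code:
    // Keep only the top b bit-planes of v in [0,1] and reconstruct
    // at the bin midpoint; for uniform data this gives MSE = 2^(-2b)/12.
    double quantize(double v, int b) {
      int bins = 1 << b;
      int bin = (int)(v * bins);
      if (bin >= bins) bin = bins - 1;  // guard v == 1.0
      return (bin + 0.5) / bins;
    }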

  #3
    Member (Kraków, Poland)
    If I understood properly, you start with a uniform distribution on the [0,1]^N hypercube and would like to restrict it using M bits to minimize MSE, especially for M << N?
    I assume the order of these N numbers is relevant; otherwise, take the pyramid of sorted sequences instead of the hypercube.

    Focusing on uniform distributions, M bits allow you to restrict the volume by a factor of 2^M.
    Spreading this restriction uniformly over all variables, it should theoretically allow reducing each edge of the hypercube to length 2^(-M/N).

    To realize this, a basic approach is a codebook: spread e.g. 2^M random points and choose one of them as the approximation. But I am afraid it is far from the theoretical possibility.
    Entropy coders should allow approaching the theoretical threshold, e.g. by splitting variables into (epsilon, 1-epsilon) distributions, where each choice requires less than one bit.
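    For a rough sense of scale (back-of-the-envelope arithmetic, not from the post above): if each coordinate ends up confined to an interval of length L = 2^(-M/N) and is reconstructed at the interval midpoint, the residual is uniform over that interval, so the per-coordinate error is MSE = L^2/12 = 2^(-2M/N)/12. For example, at M/N = 1 bit per number this is 1/48 ≈ 0.0208.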

  #4
    Member (USA)
    I wrote a quick benchmark to compare some approaches: https://gist.github.com/byronknoll/3...1a37ac700c0618

    My original idea of sampling uniform random numbers did not perform well. The best-performing technique so far uses numbers sampled from a Bates distribution.
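    For reference, a Bates(n) variate is the mean of n independent U(0,1) draws; it keeps mean 1/2 but has variance 1/(12n), so samples cluster near the middle of the range. A sketch of the sampling step (n here is a free parameter, not the value used in the gist):
    Code:
    #include <random>
    
    // Bates(n): the mean of n independent U(0,1) samples.
    double bates(std::mt19937& rng, int n) {
      std::uniform_real_distribution<double> uni(0.0, 1.0);
      double sum = 0;
      for (int i = 0; i < n; i++) sum += uni(rng);
      return sum / n;
    }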

  #5
    Administrator Shelwien (Kharkov, Ukraine)
    I tried increasing the number of iterations for method #2 to 10M; it improved, while #4 got worse because of a different seed...
    Code:
    >a.exe
    method 1: 0.06072729
    method 2: 0.08386047
    method 3: 0.08643947
    method 4: 0.05860876
    method 5: 0.06026703
    
    >a.exe
    method 1: 0.06072729
    method 2: 0.07329438
    method 3: 0.08221427
    method 4: 0.05916388
    method 5: 0.06026703

  #6
    Member (Italy)
    I don't know if this helps.
    Code:
    Method 5, distribution(m, d). Columns: MSE after 10000 and 10000000 iterations.
    m    d    x10000     x10000000
    0.50 0.09 0.05216995 0.04713555
    0.50 0.12 0.05303528 0.04617647
    0.53 0.09 0.05139838 0.04581179
    0.54 0.12 0.05273559 0.04489511
