# Thread: Lossy compression of random numbers

1. ## Lossy compression of random numbers

Problem: you are given N random numbers (independent and uniformly distributed) in a given range. Using up to M bits of data, compress the numbers in a way that minimizes mean-squared error (MSE). I am interested in the case where M is much smaller than N.

Any ideas how to approach this problem? Any existing research on this topic?

My current approach: search over seeds of a random number generator and store the seeds using up to M bits. If M is tiny (let's say 4 bytes), you can search over a single seed, generate N random numbers using that seed, and store the seed which minimizes MSE. If M is larger, you can divide the N numbers into different groups and use one seed per group (dividing the search space would make it easier to find lower MSE).
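A minimal sketch of that single-seed search, assuming numbers in [0,1) and Python's built-in PRNG; `num_candidates` is a compute budget I've added, since exhausting all 2^32 seeds is usually impractical:

```
import random

def find_best_seed(x, num_candidates=1 << 16):
    # Regenerate len(x) uniforms from each candidate seed and keep the seed
    # whose stream gives the lowest MSE against the target numbers x.
    best_seed, best_mse = 0, float("inf")
    for seed in range(num_candidates):
        rng = random.Random(seed)
        mse = sum((xi - rng.random()) ** 2 for xi in x) / len(x)
        if mse < best_mse:
            best_seed, best_mse = seed, mse
    return best_seed, best_mse
```

With M = 32 bits you would store only `best_seed`; the decoder replays `random.Random(best_seed)` to regenerate all N numbers.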

2. Maybe something like this: https://ieeexplore.ieee.org/abstract/document/5205695 ?

The PRNG approach is only optimal if you have a high chance of guessing the right PRNG and seed.
Otherwise it's better to approximate a chunk of random numbers with a single constant.

Presuming that you actually need random numbers in the output, maybe you can find the constant C such that Sum (x[i]-(prng[i]+C))^2 is minimal (see the first sketch after this list)... it should be faster than testing all seed values.

For MSE, the higher bits of the numbers actually have higher value, so maybe it would make sense to try compressing those (lossily if necessary).
That would give you 50% + k bits guessed correctly if you managed to encode the positions of k 1's in a block (see the second sketch below).
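A sketch of the closed-form constant from point 1, as I read it (names and details are mine, not from the thread): minimizing Sum (x[i]-(prng[i]+C))^2 over C is ordinary least squares, so C is simply the mean residual and no search over C is needed.

```
def best_offset(x, prng_stream):
    # Setting d/dC sum((x[i] - prng[i] - C)^2) = 0 gives the optimum
    # C = mean(x[i] - prng[i]), i.e. the mean residual.
    residuals = [xi - pi for xi, pi in zip(x, prng_stream)]
    c = sum(residuals) / len(residuals)
    mse = sum((r - c) ** 2 for r in residuals) / len(residuals)
    return c, mse
```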
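And a toy illustration of point 2, that higher bits carry more value for MSE: keeping the top b fractional bits of a number in [0,1) and reconstructing at the cell midpoint gives per-sample MSE of 2^(-2b)/12 for uniform inputs, so each extra high bit cuts the error by a factor of 4 (the function is illustrative, not from the thread).

```
def keep_top_bits(x, b):
    # Truncate each number in [0,1) to its b most significant fractional
    # bits, then reconstruct at the midpoint of the 2^-b wide cell.
    scale = 1 << b
    return [(int(xi * scale) + 0.5) / scale for xi in x]
```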


3. If I understood properly, you start with a uniform distribution on the [0,1]^N hypercube and would like to restrict it using M bits to minimize MSE, especially for M << N?
I assume the order of these N numbers is relevant; otherwise, take the simplex (pyramid) of sorted values instead of the hypercube.

Focusing on uniform distributions, M bits allow you to restrict the volume by a factor of 2^M.
Applying this restriction uniformly across all variables, it should theoretically allow reducing each edge of the hypercube to length 2^(-M/N) (a uniform variable confined to an interval of length L has MSE L^2/12, so per-coordinate MSE on the order of 2^(-2M/N)/12).

To realize it, a basic approach is a codebook: spread e.g. 2^M random points and choose one of them as the approximation (see the sketch below). But I am afraid it is far from the theoretical possibility.
Entropy coders should allow approaching the theoretical threshold, e.g. by splitting variables into (epsilon, 1-epsilon) distributions, where each such choice requires less than one bit.
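A minimal sketch of the random-codebook idea, assuming the encoder and decoder share the PRNG and seed so that only the M-bit index needs to be transmitted (all names and parameters here are illustrative):

```
import random

def codebook_compress(x, m_bits, shared_seed=0):
    # Generate 2^M candidate vectors from a shared PRNG and return the
    # M-bit index of the candidate with the lowest MSE against x.
    rng = random.Random(shared_seed)
    best_index, best_mse = 0, float("inf")
    for index in range(1 << m_bits):
        candidate = [rng.random() for _ in range(len(x))]
        mse = sum((xi - ci) ** 2 for xi, ci in zip(x, candidate)) / len(x)
        if mse < best_mse:
            best_index, best_mse = index, mse
    return best_index, best_mse

def codebook_decompress(index, n, shared_seed=0):
    # The decoder replays the same codebook and stops at the chosen index.
    rng = random.Random(shared_seed)
    for _ in range(index + 1):
        candidate = [rng.random() for _ in range(n)]
    return candidate
```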


4. I wrote a quick benchmark to compare some approaches: https://gist.github.com/byronknoll/3...1a37ac700c0618

My original idea of sampling uniform random numbers did not perform well. The best performing technique so far uses numbers sampled from a Bates distribution.
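For reference, a Bates(k) sample is the mean of k i.i.d. Uniform(0,1) draws, so it keeps the right support but concentrates around 0.5 with variance 1/(12k); a minimal sketch (k is a free knob here, not a value from the gist):

```
import random

def bates_sample(rng, k):
    # Bates(k): mean of k i.i.d. Uniform(0,1) draws. Larger k pulls samples
    # toward 0.5, the MSE-optimal constant for a uniform target.
    return sum(rng.random() for _ in range(k)) / k
```

Against an independent uniform target, the constant 0.5 achieves expected squared error 1/12 while an independent uniform guess gives 1/6; Bates interpolates between the two, and a seed search can then exploit the remaining variability to beat the constant.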

5. I tried increasing the number of iterations for method #2 to 10M; it improved, while #4 got worse because of a different seed...
Code:
```
>a.exe
method 1: 0.06072729
method 2: 0.08386047
method 3: 0.08643947
method 4: 0.05860876
method 5: 0.06026703

>a.exe
method 1: 0.06072729
method 2: 0.07329438
method 3: 0.08221427
method 4: 0.05916388
method 5: 0.06026703
```

6. I don't know if this helps.
Code:
```
Method 5, distribution(m, d).
m    d    x10000     x10000000
0.50 0.09 0.05216995 0.04713555
0.50 0.12 0.05303528 0.04617647
0.53 0.09 0.05139838 0.04581179
0.54 0.12 0.05273559 0.04489511
```
