1. ## Sensor Data Compression

Dear all,

Let's assume we have a sensor field with dimensions M*M. Before applying any data compression technique, I first want to know the compression limit, i.e. the minimum entropy, of the entire sensor field. How could I compute the minimum entropy or compression limit for the sensor field?

2. 1. Compress it with a strong CM like paq8 - http://paq8.hys.cz/
2. Depending on your data type, it may be good to preprocess the values to make them look like some popular data type,
eg. a picture in your case.
3. Also here's a coder for floating-point data - http://www.csl.cornell.edu/~burtscher/research/FPC/
paq8 doesn't handle floats very well.

3. As Shelwien implied, there is no direct measurement, so you have to test with one of the strongest compressors (i.e. paq8, as Shelwien suggested). I assume your sensors are not a CCD array. If so, you have to apply some preprocessing to make the data more compressible, because most compressors work with bytes (8 bits) while sensor outputs are usually 10 to 12 bits wide. Not to mention the noise, which is always present.

If you want a quick response to your question, it's better to share some sample data along with useful information about the data structure. The characteristics of the data source are also very important.
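To illustrate the kind of preprocessing meant above, here is a minimal Python sketch (function names and the 12-bit width are illustrative assumptions) of reversible delta coding for wide sensor samples; a smooth sensor field turns into a stream of small deltas, which a byte-oriented compressor models far better than raw samples split across byte boundaries:

```python
def delta_encode(samples, width=12):
    """Replace each sample with its difference from the previous one,
    wrapped to the sample width so the transform stays invertible."""
    mask = (1 << width) - 1
    prev, out = 0, []
    for s in samples:
        out.append((s - prev) & mask)
        prev = s
    return out

def delta_decode(deltas, width=12):
    """Invert delta_encode by accumulating the wrapped differences."""
    mask = (1 << width) - 1
    prev, out = 0, []
    for d in deltas:
        prev = (prev + d) & mask
        out.append(prev)
    return out
```

Whether this particular transform helps depends entirely on how correlated neighbouring samples are, which is why sample data matters.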

4. Thanks for your quick response.

Actually, I want the theoretical compression limit. Let's pose the problem for an image: does any mathematical method exist to calculate the theoretical compression limit? Please let me know, or suggest some readings to help formulate the problem.

Thanks

5. Sure, there are some theoretical results, but without a formal model of your data we can only apply something simple,
like memoryless models for specific probability distribution types (and even that may be wrong if you work with floats).
Also, it's very likely that any ad hoc estimation would be off by an order of magnitude.
And I don't see why paq8 can't be considered a "theoretical estimator" - sure, it's based on an iterative formula, but so what?

Btw, just for fun, you can ask your question there - http://stats.stackexchange.com/
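As a concrete instance of the "memoryless model" mentioned above, here is a minimal Python sketch (the function name is made up for illustration) of an order-0 entropy estimate over bytes; note the number it returns is only the limit for this particular memoryless model - any structure between samples pushes the true limit lower:

```python
import math
from collections import Counter

def order0_entropy_bits(data: bytes) -> float:
    """Memoryless (order-0) estimate: total bits needed if each byte
    were coded independently with its empirical probability."""
    counts = Counter(data)
    n = len(data)
    # Each byte value occurring c times costs -log2(c/n) bits per occurrence.
    return -sum(c * math.log2(c / n) for c in counts.values())
```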

6. What's maximum compression limit? A BARF like solution ( http://cs.fit.edu/~mmahoney/compression/barf.html )? Or Kolmogorov complexity?

7. Entropy calculations for fully specified data have been used to get a theoretical bound on how much that data can be compressed. My specified data is the output of the sensor field, and I want to calculate the minimum entropy (assuming lossless compression) for the entire sensor field, not for individual nodes.

Originally Posted by Piotr Tarsa
What's maximum compression limit? A BARF like solution ( http://cs.fit.edu/~mmahoney/compression/barf.html )? Or Kolmogorov complexity?

8. It's really easy to calculate that entropy.
It's Sum[ -log2(bit[i]*p[i] + (1-bit[i])*(1-p[i])), i=0..N-1 ],
where bit[i] are the bits of your data, N is the number of bits, and
p[i] are the probabilities of bit[i]==1.
But the p[i] values are defined by _the_model_, and we don't know anything about your data.
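The sum above can be written out directly; a minimal Python sketch (names are illustrative), which charges each bit the -log2 of the probability the model assigned to the value that actually occurred:

```python
import math

def model_codelength_bits(bits, probs):
    """Code length in bits: Sum[ -log2(bit[i]*p[i] + (1-bit[i])*(1-p[i])) ],
    where probs[i] is the model's probability that bits[i] == 1."""
    total = 0.0
    for b, p in zip(bits, probs):
        # If b == 1 this adds -log2(p); if b == 0 it adds -log2(1-p).
        total += -math.log2(b * p + (1 - b) * (1 - p))
    return total
```

With a memoryless p[i] = 0.5 for every bit, the result is exactly N bits, i.e. no compression; everything hinges on where the p[i] come from.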

9. If the input data to be compressed is very large (e.g. several terabytes), then the size of the decompressor attached to the compressed data is negligible. Under that assumption, computing the minimum compressed size is about equal to computing Kolmogorov complexity, which is practically uncomputable. The compressed data is our programming language and the decompressor is its interpreter.

If you have the probability model, then you can apply the equation Shelwien provided and get the result. Or maybe not - Shelwien didn't include contexts in his equation.

The problem of finding the compression limit should be about as difficult as writing a compressor that achieves it. Am I right?

10. > Shelwien didn't include contexts into his equation.

I actually did - note the "p[i] are the probabilities of bit[i]==1",
i.e. each bit has its own probability estimate, and these can be computed using contexts or whatever.

> The problem of finding the compression limit should be about as difficult as writing a compressor that achieves it. Am I right?

That depends on the approach - for example, it's easier to compute (n0+n1)!/n0!/n1! than to actually encode such a bit permutation.
Also, some approximations can be applied for redundancy measurement, but not for actual (decodable) compression.
But if we used a bitwise statistical approach like the one I described, making an actual coder would certainly be easier, as
there are reasonably good open-source examples of that.
It's also a good idea to write a compressor first anyway, because otherwise (without decoding tests) it's really
easy to make a subtle mistake (e.g. using some "future" information as context somewhere), which would make the estimation
completely wrong.
Whereas when a working compressor already exists, it should be easy enough to write an approximate (and simplified) formula
which produces similar results.
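The counting example above can be sketched in Python (the function name is made up for illustration); it computes the exact enumerative cost in bits of specifying one bit-string among all those with n0 zeros and n1 ones, without building a decodable coder:

```python
import math

def permutation_codelength_bits(n0: int, n1: int) -> float:
    """Bits to index one bit-string among all with exactly n0 zeros
    and n1 ones: log2( (n0+n1)! / (n0! * n1!) )."""
    # math.comb computes the binomial coefficient exactly (Python 3.8+).
    return math.log2(math.comb(n0 + n1, n0))
```

For n0 = n1 = 1 this gives exactly 1 bit (two possible strings, "01" and "10"), which matches the intuition that only the position of the single 1 needs encoding.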

11. The theoretical limit of compression is Kolmogorov complexity, which is not computable. If you know the probability distribution, then the theoretical limit is the entropy given by Shannon for which there are easy solutions like arithmetic coding. However, the probability distribution is not computable in general. To get the best practical compression, you need to know what the data means, because prediction is the same as understanding. See chapter 1 of http://mattmahoney.net/dc/dce.html

12. ## Data compressions

There are not many options for this kind of task. This type of data should be compressed with CM, and it all depends on what kind of data you have. Many sites have tutorials on this, which you can find via Google.

