
Thread: Next step in Data Compression

  1. #1
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    377
    Thanks
    26
    Thanked 23 Times in 16 Posts

    Next step in Data Compression

    Hi,

    I don't know if this was discussed before; apologies if I missed it. =/

    I don't know if this would really work, but I think it should. I see no reason why it shouldn't, except that it needs very clever algorithms and a lot of CPU time.

    How it works:

    1. The file to be compressed is scanned.
    2. An optimizer chooses one or more algorithms that describe the contents of the file, for instance by recognizing patterns that run through the file; evolutionary algorithms could be used to optimize the pattern/model search.
    3. These models and their configuration parameters are saved as the compressed file.

    Example:

    If we want to compress a video of some 64k demo, we don't compress the picture and audio data; instead we try to find the code which generates the video.

    So the approach isn't the usual "predict the next bit/byte"; instead we try to construct a mathematical series or an algorithm that describes a part of the file, or the whole file itself.

    What do you think: can this work?
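    To make this concrete, here is a toy sketch (Python, invented purely for illustration; the model names and storage format are assumptions of this sketch, not an existing tool). It tries a few candidate "formulas" against the data and, if one fits, stores only the model name and its parameters instead of the bytes. A real system would need a vastly richer model space, which is where the clever algorithms and CPU time would go.

    Code:
    # Toy "generator search" compressor: try simple models against the whole file
    # and, if one fits, store only the model name and its parameters.
    def fit_constant(data):
        # Whole file is one repeated byte: store (value, length).
        if data and all(b == data[0] for b in data):
            return ("const", data[0], len(data))
        return None

    def fit_arithmetic(data):
        # Bytes form an arithmetic progression mod 256: store (start, step, length).
        if len(data) >= 2:
            step = (data[1] - data[0]) % 256
            if all(data[i] == (data[0] + i * step) % 256 for i in range(len(data))):
                return ("arith", data[0], step, len(data))
        return None

    def compress(data):
        for model in (fit_constant, fit_arithmetic):
            params = model(data)
            if params:
                return params            # a few numbers describe the whole file
        return ("raw", bytes(data))      # no model fits: store literally

    def decompress(params):
        if params[0] == "const":
            _, value, n = params
            return bytes([value] * n)
        if params[0] == "arith":
            _, start, step, n = params
            return bytes((start + i * step) % 256 for i in range(n))
        return params[1]

    data = bytes((7 + 3 * i) % 256 for i in range(1000000))   # 1 MB that fits "arith"
    model = compress(data)
    print(model)                          # ('arith', 7, 3, 1000000) -- four numbers
    assert decompress(model) == data
    Here a 1 MB file collapses to four numbers only because it happens to match one of two hard-coded models; the whole difficulty of the proposal is making the model space rich enough for real files.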

  2. #2
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by thometal View Post
    Hi,

    I don't know if this was discussed before; apologies if I missed it. =/

    I don't know if this would really work, but I think it should. I see no reason why it shouldn't, except that it needs very clever algorithms and a lot of CPU time.

    How it works:

    1. The file to be compressed is scanned.
    2. An optimizer chooses one or more algorithms that describe the contents of the file, for instance by recognizing patterns that run through the file; evolutionary algorithms could be used to optimize the pattern/model search.
    3. These models and their configuration parameters are saved as the compressed file.

    Example:

    If we want to compress a video of some 64k demo, we don't compress the picture and audio data; instead we try to find the code which generates the video.

    So the approach isn't the usual "predict the next bit/byte"; instead we try to construct a mathematical series or an algorithm that describes a part of the file, or the whole file itself.

    What do you think: can this work?
    A data/pattern detector?

  3. #3
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 795 Times in 488 Posts
    In theory it is not computable (proved by Kolmogorov). In practice it is just very hard. Suppose you want to compress some encrypted text. To guess a program that generates the data, you have to guess the key.

    Of course if it worked, it would give you the absolutely best compression. Here is 1 million digits of pi compressed to 114 bytes as a zpaq archive. No other compressor can do better than 415241 bytes. It may take a day or two to decompress. http://mattmahoney.net/dc/pi.txt.zpaq
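    For a sense of what that archive contains, here is the "program plus digit count" idea written out in plain Python (this is the well-known Gibbons unbounded spigot algorithm, not the actual ZPAQL inside pi.txt.zpaq):

    Code:
    # Gibbons' unbounded spigot algorithm: emits decimal digits of pi one at a time.
    # The whole "archive" amounts to this program plus the digit count.
    def pi_digits(n):
        q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
        digits = []
        while len(digits) < n:
            if 4 * q + r - t < m * t:
                digits.append(m)
                q, r, m = 10 * q, 10 * (r - m * t), (10 * (3 * q + r)) // t - 10 * m
            else:
                q, r, t, k, m, x = (q * k, (2 * q + r) * x, t * x, k + 1,
                                    (q * (7 * k + 2) + r * x) // (t * x), x + 2)
        return digits

    print("".join(map(str, pi_digits(20))))   # 31415926535897932384
    # pi_digits(1000000) reproduces the file; like the zpaq archive, the program
    # is tiny but decoding is slow (the integers grow very large).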

  4. #4
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by Matt Mahoney View Post
    In theory it is not computable (proved by Kolmogorov). In practice it is just very hard. Suppose you want to compress some encrypted text. To guess a program that generates the data, you have to guess the key.

    Of course if it worked, it would give you the absolutely best compression. Here is 1 million digits of pi compressed to 114 bytes as a zpaq archive. No other compressor can do better than 415241 bytes. It may take a day or two to decompress. http://mattmahoney.net/dc/pi.txt.zpaq
    It's not just hard. Precomp already detects zip file types, and someone could extend Precomp to other file types (like WAV). Data pattern detection is a bit different: the detector needs to know which compressor will be used, and it can also transform the data to suit that compressor.

    I don't know if you really need to guess the exact key. Maybe you can try some technique that approximates the value of the key.
    Last edited by lunaris; 8th August 2014 at 06:16.
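    As a rough sketch of what such a detector does (not Precomp's actual code; the signature table and function are invented for illustration), a scanner can look for known container signatures and hand each region to a matching recompression strategy:

    Code:
    # Minimal magic-byte detector sketch (not Precomp's real implementation).
    SIGNATURES = {
        b"PK\x03\x04": "zip (deflate streams inside)",
        b"\x89PNG\r\n\x1a\n": "png (zlib-compressed image data)",
        b"RIFF": "riff container (possibly wav audio)",
    }

    def detect(data):
        """Return sorted (offset, type) pairs for every known signature found."""
        hits = []
        for sig, name in SIGNATURES.items():
            start = 0
            while (pos := data.find(sig, start)) != -1:
                hits.append((pos, name))
                start = pos + 1
        return sorted(hits)

    sample = b"junk" + b"PK\x03\x04" + b"..." + b"RIFFWAVE"
    print(detect(sample))   # [(4, 'zip ...'), (11, 'riff container ...')]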

  5. #5
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 795 Times in 488 Posts
    If you guess the key wrong by 1 bit, then the ciphertext still looks random. If it didn't, that would be a weakness in the encryption algorithm because you would have a key search algorithm faster than brute force.
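    That avalanche effect is easy to demonstrate (standard RC4, written out here for illustration): flip one bit of the key and the keystream disagrees with the original in about 255 of every 256 bytes, leaving no gradient to follow toward the right key.

    Code:
    # RC4 keystream generator, used to show that a 1-bit key change
    # produces a completely different, still random-looking keystream.
    def rc4_keystream(key, n):
        S = list(range(256))
        j = 0
        for i in range(256):                       # key schedule
            j = (j + S[i] + key[i % len(key)]) % 256
            S[i], S[j] = S[j], S[i]
        i = j = 0
        out = bytearray()
        for _ in range(n):                         # pseudo-random generation
            i = (i + 1) % 256
            j = (j + S[i]) % 256
            S[i], S[j] = S[j], S[i]
            out.append(S[(S[i] + S[j]) % 256])
        return bytes(out)

    key = bytes([1, 2, 3, 4, 5, 6, 7, 8])
    flipped = bytes([1, 2, 3, 4, 5, 6, 7, 9])      # last key byte differs in one bit
    a = rc4_keystream(key, 100000)
    b = rc4_keystream(flipped, 100000)
    diff = sum(x != y for x, y in zip(a, b))
    print(diff / len(a))   # ~0.996: almost every byte differs, no partial credit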

  6. #6
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    377
    Thanks
    26
    Thanked 23 Times in 16 Posts
    I think it's not necessary to detect the cipher algorithm and the cipher itself, because it would be possible to just discover some mathematical formulas that generate the bits/bytes of the file. You would apply these formulas one after another until the final bit/byte sequence of the file is generated, so the file could be compressed by saving these formulas.
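    A toy sketch of that chaining idea (Python, invented for illustration; the block size and formula set are arbitrary assumptions of this sketch): split the file into blocks, fit each block with whichever simple formula matches, store the list of (formula, parameters) pairs, and replay the list to regenerate the file.

    Code:
    # Toy chained-formula coder: describe each block by a formula, replay to decode.
    BLOCK = 256

    def describe(block):
        step = (block[1] - block[0]) % 256 if len(block) > 1 else 0
        if all(block[i] == (block[0] + i * step) % 256 for i in range(len(block))):
            return ("arith", block[0], step, len(block))  # progression (step 0 = run)
        return ("raw", block)                             # no formula fits this block

    def replay(desc):
        if desc[0] == "arith":
            _, start, step, n = desc
            return bytes((start + i * step) % 256 for i in range(n))
        return desc[1]

    def compress(data):
        return [describe(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

    def decompress(descs):
        return b"".join(replay(d) for d in descs)

    data = bytes(i % 256 for i in range(4096)) + b"\x00" * 4096
    descs = compress(data)
    assert decompress(descs) == data
    print(sum(d[0] == "arith" for d in descs), "of", len(descs), "blocks fit a formula")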

  7. #7
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 795 Times in 488 Posts
    Assuming your input string was n bits, you would have to test on the order of 2^n candidate programs, checking whether each is a valid program that outputs your string. If you guessed that it was encrypted with a key of fewer than n bits, then a brute-force key search would be faster.
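    To see why the key search wins, here is a toy comparison (the "cipher" is a trivial repeating-XOR stand-in invented for this sketch, not real encryption): recovering a 16-bit key takes at most 2^16 trials, while enumerating candidate programs for an n-bit output is on the order of 2^n.

    Code:
    # Toy brute-force key search: 16-bit repeating-XOR "cipher" (illustration only).
    import itertools

    def xor_cipher(data, key):
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

    plaintext = b"the quick brown fox jumps over the lazy dog " * 100
    secret = bytes([0xA7, 0x3C])
    ciphertext = xor_cipher(plaintext, secret)

    # At most 2^16 = 65536 trials: check each key against a known plaintext prefix.
    for k0, k1 in itertools.product(range(256), repeat=2):
        if xor_cipher(ciphertext[:8], bytes([k0, k1])) == plaintext[:8]:
            print("key found:", hex(k0), hex(k1))
            break

    # Program search is hopeless by comparison: this output is ~36000 bits long.
    print("key trials: at most", 2 ** 16,
          "-- program candidates: about 2 **", len(ciphertext) * 8)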

  8. #8
    Member
    Join Date
    Jul 2014
    Location
    Kenya
    Posts
    59
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by thometal View Post
    Hi,

    I don't know if this was discussed before; apologies if I missed it. =/

    I don't know if this would really work, but I think it should. I see no reason why it shouldn't, except that it needs very clever algorithms and a lot of CPU time.

    How it works:

    1. The file to be compressed is scanned.
    2. An optimizer chooses one or more algorithms that describe the contents of the file, for instance by recognizing patterns that run through the file; evolutionary algorithms could be used to optimize the pattern/model search.
    3. These models and their configuration parameters are saved as the compressed file.

    Example:

    If we want to compress a video of some 64k demo, we don't compress the picture and audio data; instead we try to find the code which generates the video.

    So the approach isn't the usual "predict the next bit/byte"; instead we try to construct a mathematical series or an algorithm that describes a part of the file, or the whole file itself.

    What do you think: can this work?

    http://encode.su/threads/2006-Hello-...e-an-algorithm
    That thread describes an approach using strings and gaps wherever prevalence is available. Some tweaks can still be made, and more methods can be added, such as nCr (combinations) for capturing prevalence in its smallest form where advance scans with concurrent methods allow it.

    I've mentioned in some posts the idea of a compressible/incompressible spectrum of data with regard to prevalence; some methods can be more effective than others depending on the arrangement of the 0s and 1s.


    Regarding the 64K video, I have thought about describing the contents as though you had the original assets and a scheduler to play them.
    Instead of looking at strings and common repetition across frames, you end up with some kind of information that describes 'assets' within the video, plus a scheduler (like Flash, even).
    I have mentioned a size/speed balance when treating prevalence as the enabler: when something is stored smaller, it takes longer to decode (more steps).
    The algorithm mentioned in the thread should be able to stream very compact code, even in chunks; but once you add something like checking patterns of the 'every 10 bits is x' kind, you end up with something that needs background calculation and more CPU than streaming allows.

    If you construct assets from the elements that move, as opposed to what stays static, compress those assets, and build a scheduler holding animation/movement info for a set of frames, it is doable without very extreme computation. It wouldn't necessarily save a lot of space, and it would be more intensive to decode. An example is MIDI/FL Studio-type data, where the scheduler, effects, and samples generate the WAV equivalent on the fly; that is already intensive for media use, and this would be similar, covering images as well as audio, and might require relatively high specs to be streamable. Also keep in mind that with scheduling like this it is possible for the final form (the WAV in the example) to compress smaller than the scheduler form itself, because of header placement versus tighter variable-length prefixes, for example.


    With the added tweak of checking patterns like the 'every 10 bits is x' kind mentioned above, as well as the other techniques, I don't think the result would be too far off (including some compression of the assets themselves, storing their makeup strings minimally).
    Last edited by SoraK05; 9th August 2014 at 00:34.
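    A tiny illustration of the scheduler-versus-rendered-form balance described above (invented example; not real MIDI or FL Studio data): a handful of note events stands in for the kilobytes of samples they render to, at the cost of computing every sample at decode time.

    Code:
    # Scheduler vs rendered form: a few note events expand into many samples.
    import math

    RATE = 8000   # samples per second (an assumption of this sketch)

    # The "scheduler": (start_sec, duration_sec, frequency_hz) events, tens of bytes.
    events = [(0.0, 0.5, 440.0), (0.5, 0.5, 554.37), (1.0, 1.0, 659.25)]

    def render(events):
        """Expand note events into raw 8-bit samples (the 'WAV' form)."""
        total = max(start + dur for start, dur, _ in events)
        samples = [0.0] * int(total * RATE)
        for start, dur, freq in events:
            for i in range(int(start * RATE), int((start + dur) * RATE)):
                samples[i] += math.sin(2 * math.pi * freq * i / RATE)
        return bytes(int(127 + 40 * s) % 256 for s in samples)

    wav = render(events)
    print(len(events), "events ->", len(wav), "sample bytes")   # 3 events -> 16000
    # Every one of those bytes is computed at decode time: the "more steps when
    # stored smaller" balance described above.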

  9. #9
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 795 Times in 488 Posts
    OK, here is something to test your idea. It is a 1 MB pseudo-random file, which I saved in a 102 byte zpaq archive (inside a zip archive, since I can't attach zpaq archives). The file will not compress with zip, 7zip, rar, nanozip, zpaq -method 5, ppmonstr, or paq8pxd_v12. I am pretty sure there is no other compressor that can compress it either without a custom algorithm like the one I used to create the attached archive. I created the file like this:

    Code:
      zpaqd r rc4 (8 byte key) p nul: x
    where (8 byte key) is 8 decimal numbers in the range 0..255 separated by spaces in the command line. If you want to create it yourself instead of extracting it from the attached archive, then you will need zpaqd from http://mattmahoney.net/dc/zpaqutil.html and the file rc4.cfg:
    Code:
    (compresses 1 MB generated by RC4 with 64 bit key in $1..$8)
    comp 0 0 3 8 1
      0 icm 5
    hcomp
      halt
    pcomp clear.bat ; (copy nul: %2)
        (RC4: b=i c=j r0=count m=S[0..255] h=key[0..7])
        a= 100 a*= 100 a*= 100 r=a 0 (r0=1000000)
        *d= $1 d++ *d= $2 d++ *d= $3 d++ *d= $4 d++ (save key in H)
        *d= $5 d++ *d= $6 d++ *d= $7 d++ *d= $8 d=0
        do (init S[i] = i)
          *b=b
        b++ a=b a> 255 until
        b=0 do (key schedule)
          a=c a+=*b a+=*d c=a d++
          a=*b *c<>a *b=a
        b++ a=b a> 255 until
        do (output keystream)
          b++ a=c a+=*b c=a
          a=*b *c<>a *b=a
          a+=*c d=b b=a a=*b out b=d
        a=r 0 a-- a> 0 r=a 0 while
      halt
    end
    and the file clear.bat:
    Code:
    copy nul: %2
    Once you have created the file, you can compress it like this:
    Code:
    zpaqd cinst rc4 (8 byte key) x.zpaq x
    The RC4 key has to be embedded in the archive, of course. You can easily find it using the command "zpaqd l x" to show the 8 constants used to initialize the ZPAQL H array. But good luck finding the key if all you have is the file.
    Attached Files
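    "Good luck" can be put in numbers (back-of-the-envelope arithmetic; the 10^9 key trials per second rate is a hypothetical assumption):

    Code:
    # Rough cost of brute-forcing the 64-bit RC4 key in this challenge.
    keys = 2 ** 64                       # size of the 8-byte key space
    rate = 10 ** 9                       # assumed key trials per second (hypothetical)
    seconds = keys / rate
    years = seconds / (3600 * 24 * 365)
    print(f"worst case: {years:,.0f} years, expected: {years / 2:,.0f} years")  # ~585 / ~292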

  10. #10
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Random or chaotic files are not the files most used by users. There are a lot of compressible files with well-defined patterns.
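    That claim is easy to check empirically. Here is a first-order byte-entropy estimate (a standard Shannon entropy calculation, not any particular tool): typical structured data measures well below 8 bits per byte, while random data like the RC4 output above measures at essentially 8.

    Code:
    # First-order Shannon entropy in bits per byte: a crude compressibility gauge.
    import math, os
    from collections import Counter

    def entropy_bits_per_byte(data):
        counts = Counter(data)
        n = len(data)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    text = b"the quick brown fox jumps over the lazy dog " * 1000
    noise = os.urandom(len(text))
    print(f"text:   {entropy_bits_per_byte(text):.2f} bits/byte")   # well below 8
    print(f"random: {entropy_bits_per_byte(noise):.2f} bits/byte")  # essentially 8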

