Page 2 of 2 FirstFirst 12
Results 31 to 53 of 53

Thread: Recursive data compression patent for sale

  1. #31
    Member
    Join Date
    Sep 2010
    Location
    Australia
    Posts
    46
    Thanks
    0
    Thanked 0 Times in 0 Posts
    the best compression we can achieve for any string x is to encode it as the shortest program M in some language L that outputs x.
    what about using a e natural logarithm as M on small portions of data giving the X value, i dont see how it would matter what M= so long as it gives x(X being a portion of the string) then taking the next 5 values and using an e logarithm and so on. Would this not end in the total value of X ?
    I have been studying Matt :P love reading your book
    ps my friend was telling me that e natural logarithm's where being used before computers where invented "P

    Never mind i just read this article http://www.faqs.org/faqs/compression...section-8.html on random data compression and realize it cant be done
    And there i was thinking of a random brute force algorithm that worked by splitting up the x string into 6 digit decimal chunks and randomly discovering the right formula, then moving on to the next 6 digits. I still find it hard to believe its impossible after reading Harking's theory on possibility. But i must admit it would be hard to get 16 pigeons into 15 holes.
    Last edited by Omnikam; 8th October 2010 at 10:23.

  2. #32
    Member
    Join Date
    Sep 2010
    Location
    Australia
    Posts
    46
    Thanks
    0
    Thanked 0 Times in 0 Posts
    After playing around with Barf2 i decided to try my own experiments
    s=seed
    Consider X is a known decimal of any finite length and in any order
    Take x ,if its last digit is not 0 then add the difference to reach 0
    Then divide by 10 and repeat until you reach the number that if you
    added the difference and divided by ten you would end in 1.
    In this example that number is 6. So 6 then becomes the seed for extraction
    record each process in a text file. Then simply revere the process to get X again
    look at the following example
    The left row displays the recorded possess from x to s
    The right row displays s to x

    x=522223689578236863655
    +5 s=6
    /10 *10
    +4 -7
    /10 *10
    +3 -7
    /10 *10
    +6 -7
    /10 *10
    +3 -3
    /10 *10
    +1 -6
    /10 *10
    +3 -3
    /10 *10
    +6 -1
    /10 *10
    +7 *10
    /10 -4
    +1 *10
    /10 -2
    +2 *10
    /10 -1
    +4 *10
    /10 -7
    /10 *10
    +1 -6
    /10 *10
    +3 -3
    /10 *10
    +6 -1
    /10 *10
    +7 -3
    /10 *10
    +7 -6
    /10 *10
    +7 -3
    /10 *10
    +7 -4
    /10 *10
    S=6 -5
    x=522223689578236863655
    lets now substitute "*10" = *
    starting with seed=6 we have
    6*7*7*7*3*6*3*1**4*2*1*7*6*3*1*3*6*3*4*5=x
    Even better since after the seed we know it follows the pattern multiply then add and no number is greater than 9
    we may leave out the * sign unless Multiply is done in succession
    so now we have
    67773631*421763136345=x
    compare that to the original length of x
    522223689578236863655
    Still not looking good
    What about assigning a letter instead
    1=A 2=b 3=c 4=d 5=e 6=f 7=g 8=h 9=i 0=j *=U
    fgggcfcaudbagfcacfcde=x
    no better
    But we could use more common letters to or 10 numbers + * in the hopes we might find words
    Then it's on to the next step, but wait lets think, Random data compression isnt possible
    so why go on
    Ive found this all to be a funny interesting waste of time
    Not to be taken seriously Matt :wink:
    ps What ive learned from this is, The equation will never be smaller than the original x

    why such an easy one? Oh now i see why! ouch :P but ill post a solution even if its less efficient
    v
    Last edited by Omnikam; 9th October 2010 at 07:49.

  3. #33
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,593
    Thanks
    801
    Thanked 698 Times in 378 Posts
    try to compute 677736310421763136345+522223689578236863655

  4. #34
    Member
    Join Date
    Sep 2010
    Location
    Australia
    Posts
    46
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I did a little test first(not using your numbers) and thought i had a solve
    First row seed is 7 second row 5.Then added the seeds and reverse the formula obtaining the correct x
    But this solve does not work on your numbers as the second set of numbers finish before the first which as you know
    changes the end result. I even decided to balance both sides by using the +0 in places that ended in the two rows not following the same
    repeated, Addition division formula,and still this leads to failure. Is this an impossible problem using my technique?
    6779+4048=x
    ------------so-x=10827...This works but Look at my last post because im having problems with your numbers.....v
    +1--- = +2 = -3
    /10-- -/10 = *10
    +2--- --+5 = -6
    /10-- -/10 = *10
    +2 ----+9 = -11
    /10 ---/10 = *10
    s=7---s=5 ----s+s=12
    I had thought at this stage i had a working solution but i was wrong
    Ah yes i saw what you meant, i had added up poorly and my result was wrong---FIXED---but im still having problems with yours
    Last edited by Omnikam; 9th October 2010 at 14:30.

  5. #35
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,593
    Thanks
    801
    Thanked 698 Times in 378 Posts
    it's your numbers, i just suggest to add them in order to understood your own algorithm

  6. #36
    Member
    Join Date
    Sep 2010
    Location
    Australia
    Posts
    46
    Thanks
    0
    Thanked 0 Times in 0 Posts
    okay ive started working on Your challenge
    677736310421763136345+522223689578236863655= 1199960000000000000000
    using my addition then divide by ten i get the following pattern(i only show a * to denote 2 consecutive division's)
    I then obtain the seed number 6 & 7

    522223689578236863655------------=---------+543631367124* 1367777=6
    ---------------------------this is the point that it goes wrong * 1199960010-----13
    677736310421763136345------------=---------+5563686328759 8636222=7
    Working my way from right to left it all works until i reach the place that was divided twice marked whith a *
    Assume that after ever subtraction there is a multiplication of 10.So 13 is multiplied by 10 then - 9 and so on
    Ive been trying to think up a solution to this problem but so far am stump though im working on an idea, ill post
    if it work's soon
    Last edited by Omnikam; 9th October 2010 at 14:32.

  7. #37
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    You know that random compression is impossible. Why do you keep trying?

  8. #38
    Member
    Join Date
    Sep 2010
    Location
    Australia
    Posts
    46
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Your right Matt in that i do think its impossible in the true sense of the word, but then in you book you say X= M described by L so Ive been playing with the idea of changing the language that represents a formula/equation , in my last example
    x=522223689578236863655 or
    67773631*421763136345=x
    1=A 2=b 3=c 4=d 5=e 6=f 7=g 8=h 9=i 0=j *=U
    fgggcfcaudbagfcacfcde=x ;not any better
    i Could then use a word tree to reduce even more,for example
    fc=1;gg=2;etc ;but now look what happens
    f2gc1audbag1ac1de=x ;now compare that to the original size of x, and that was only using two number/letter substitutes. Since this method relies on decimal integer values then the word tree needn't change and can be incorporated into the decompression program.
    522223689578236863655=x ;five digit reduction from just a two number tree
    So although i dont think recursive random data compression is possible, i do think that a language substitution for M is more than possible, as my silly example shows.
    Granted that the combined size of the word tree+formula exceeds the original x, it was never my idea to distribute the word tree with the formula, instead the word tree is built into the compressor/de-compressor. In this situation then yes random data can be expressed in a Smaller form, but only by expanding the decompressor's size
    Still Matt, the example above Does work without making stupid claims of being a recursive data compressor:wink:
    PS i enjoy the challenge of thinking outside the box while not breaking the rules, in while doing a course we had to learn some quick notation. That is where i got my idea from
    Last edited by Omnikam; 11th October 2010 at 05:34.

  9. #39
    Member
    Join Date
    Mar 2010
    Location
    Canada
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Omnikam View Post
    Your right Matt in that i do think its impossible in the true sense of the word, but then in you book you say X= M described by L so Ive been playing with the idea of changing the language that represents a formula/equation , in my last example
    x=522223689578236863655 or
    67773631*421763136345=x
    1=A 2=b 3=c 4=d 5=e 6=f 7=g 8=h 9=i 0=j *=U
    fgggcfcaudbagfcacfcde=x ;not any better
    i Could then use a word tree to reduce even more,for example
    fc=1;gg=2;etc ;but now look what happens
    f2gc1audbag1ac1de=x ;now compare that to the original size of x, and that was only using two number/letter substitutes. Since this method relies on decimal integer values then the word tree needn't change and can be incorporated into the decompression program.
    522223689578236863655=x ;five digit reduction from just a two number tree
    So although i dont think recursive random data compression is possible, i do think that a language substitution for M is more than possible, as my silly example shows.
    Granted that the combined size of the word tree+formula exceeds the original x, it was never my idea to distribute the word tree with the formula, instead the word tree is built into the compressor/de-compressor. In this situation then yes random data can be expressed in a Smaller form, but only by expanding the decompressor's size
    Still Matt, the example above Does work without making stupid claims of being a recursive data compressor:wink:
    PS i enjoy the challenge of thinking outside the box while not breaking the rules, in while doing a course we had to learn some quick notation. That is where i got my idea from

    So, you replace fc with 1, what do you do with the number 1? If you keep 1 as 1, how do you decompress this? What if the random data uses all possible symbols?

    You're basically suggesting a palate for random data, but that would require the random data to be comprised only of a small subset of all possible symbols; good luck with that.

    Say that for every character missing from the set of random data you create one palate replacement for a 2 character string. The chances of finding that string in the data is 1 in every 677 characters (26^2 + 1), and for this to even remotely work the definition would have to be written in the file, because the characters missing from the file would differ from file to file. So say that you could manage to get the definition down to 4 characters per definition (1fc,2gg you would need 2.6KB (667*4) of data per definition to break even. 8 bytes in 5KB might save you a megabyte or two over a gigabyte of data, but you have to consider that as the data becomes longer, the likelihood of missing characters existing at all gets incredibly small, and the compression/decompression time for 2 MB in a GB of data just isn't worth it.

    Edit: haha my math was also way off, I forgot that the charset isn't only 26 letters, so forget 1 in 676, it's more like 1 in (2^16^2). So you save a few bytes in a few gigabytes for random data :P

    Edit 2: I haven't looked at Barf2, but if, as you say, the data contains no random symbols, then it isn't really random...
    Last edited by ForPosterity; 11th October 2010 at 16:35.

  10. #40
    Member
    Join Date
    Sep 2010
    Location
    Australia
    Posts
    46
    Thanks
    0
    Thanked 0 Times in 0 Posts
    My model was only loosely based on Barf2 , Matt's funny toy;, it creates a file of decimal integers ranging from 0-9 there are No garbled symbols. So my example above was in relation to Matt's Decimal File and how to condense it through representation, since it Only uses 0-9. I hope that clears up my intentions here. I agree it wouldn't work on output files that contained Random symbols.Have you looked at Barf2?
    if numbers 0-9 are replaced with letters A-J foe eg then letters combination are replaced with numbers eg fc=1, it really doesn't matter what file is used as long as its been put through Barf2, as its Matt's decimal output that has most relevance. Keep in mind the final file would only contain numbers in order to represent a combination of letters, but since we are assigning 0-9 a specific letter equivalent then Any symbol can be used to describe a 2 digit combination. N^9 combination's, but im only looking at N being 2. As its easier to find 2 letter combination than 3 , also if N=3 that would be too hard as there isnt enough free 1 digit symbols to allocate.

  11. #41
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    Quick test with barf2 on the million random digits file. http://mattmahoney.net/dc/AMillionRandomDigits.bin (415241 bytes).

    After about 0.951 x 10^1000000 recursive compression cycles (coincidentally a number with exactly a million digits ), the file was compressed to 0 bytes in 4672 seconds. The exact number of cycles was written to barf2.count. Next I tried compressing this number:

    barf2.count -> 1000000
    zip -9 -> 470634
    ppmonstr -> 419297
    zpaq -> 415561

    For zpaq I used a stationary order 0 model.

    Code:
    comp 0 0 0 0 1
      0 cm 9 255
    hcomp
      halt
    post
      0
    end
    So the counting argument still holds

  12. #42
    Member
    Join Date
    Sep 2010
    Location
    Australia
    Posts
    46
    Thanks
    0
    Thanked 0 Times in 0 Posts
    So the counting argument still holds
    Of course it does and always will
    The more i read up on mathematics the more i realize its a flawed symbology, and although it works in most cases ,it also creates it's own paradoxes of inconsistency's
    Yet its not necessary to understand a shovel at the sub-atomic level in order to use it, s o i will continue to apply maths where need be
    Ps Matt i didnt actuallly think my "Silly Model" would beat the counting argument, As i noticed that the decimal number from barf2.count is always of a larger size than the original file, so any compression is actually fighting to equal its size and not beat it. I did however look into the "Prime number" thing, those well meaning time wastes go on about, and of course i ran some tests here's what i found

    A B C D must be integers
    Lets say x= 53457865468 or A*B*C*D=x
    obtain the prime factorization of X
    To factor the large number, I used a computer program I wrote. It
    implements a standard method: Trial-divide the number by each prime
    number in turn, until you get a remainder of 0 (or until you have
    tried all prime numbers less than or equal to the square root of
    your number, in which case it is a prime number). Then record the
    divisor in your list of prime factors; take the quotient as the
    number you are now trying to factor; and start the trial divisions
    again with the same prime number you used last.
    Thus,
    53457865468 / 2 = 26728932734 rem 0 so prime factorization = 2*...
    26728932734 / 2 = 13364466367 rem 0 so p. f. = 2*2*...
    13364466367 / 2 = 6682233183 rem 1; no good, keep trying
    13364466367 / 3 = 4454822122 rem 1; no good, keep trying
    13364466367 / 5 = 2672893273 rem 2; no good, keep trying
    13364466367 / 7 = 1909209481 rem 0, so p. f. = 2*2*7*...
    [try prime factors 7, 11, 13, ...]
    1909209481 / 277 = 6892453 rem 0, so p. f. = 2*2*7*277*...
    ...
    6892453 / 467 = 14759 rem 0 so p. f. = 2*2*277*467*...
    [try prime factors up to sqrt(14759) = 121.5; none divides evenly,
    so 14759 is prime.]
    A = 2*7 = 14
    B = 277
    C = 2*467 = 934
    D = 14759
    So x=14*277*934*14759 (expressed larger than the original number ) Prime fails lol
    Last edited by Omnikam; 13th October 2010 at 04:41.

  13. #43
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Question DID IT SALE?

    Quote Originally Posted by Matt Mahoney View Post
    Really, I am not making this up. US patent 7733249 describes a method of compressing data more than once to improve the compression ratio. According to the patent, the method is to divide a data stream into objects and to "compress" them by treating objects as binary numbers and encode them as arithmetic expressions that have the same value. You can then apply the method recursively to both individual objects and to the stream as a whole.

    You can buy the patent now for US $275,000 or wait until the auction on Nov. 11. Working code is not included.
    http://icapoceantomo.com/item-for-sa...mpressing-data

    Of course this is not the first time the USPTO has allowed patents on random data compression.
    http://www.faqs.org/faqs/compression...section-8.html (section 9.5).
    I was wondering did any one buy this? It's past Nov 11th

  14. #44
    Member
    Join Date
    May 2008
    Location
    England
    Posts
    325
    Thanks
    18
    Thanked 6 Times in 5 Posts
    If it did i bet the person who bought it is also this guy:
    http://www.tomshardware.com/news/con...are,11613.html

  15. #45
    Member Andrey V. Miller's Avatar
    Join Date
    Nov 2010
    Location
    Russia
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Wink compress any data to 64 bits

    It is possible to compress any data to 64 bits.
    I have thought up algorithm in which entropy of the data isn't considered.
    The algorithm is checked up in usual EXCEL on the casual data.
    The future program will be open source.

    P.s. Awards and money me don't interest. I do for people.

    P.s.(2) And I badly know English. But the given fact not strongly excites me.

  16. #46
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Andrey V. Miller View Post
    It is possible to compress any data to 64 bits.
    I have thought up algorithm in which entropy of the data isn't considered.
    The algorithm is checked up in usual EXCEL on the casual data.
    The future program will be open source.

    P.s. Awards and money me don't interest. I do for people.

    P.s.(2) And I badly know English. But the given fact not strongly excites me.
    Seriously, it's boring. Bits are bits and if you use 64 of them you can store only 2^64 different files losslessly. Waving hands doesn't help, you can't get around the counting algorithm.

  17. #47
    Member Andrey V. Miller's Avatar
    Join Date
    Nov 2010
    Location
    Russia
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by m^2 View Post
    Seriously, it's boring. Bits are bits and if you use 64 of them you can store only 2^64 different files losslessly. Waving hands doesn't help, you can't get around the counting algorithm.
    This statement will be incorrect for cyclic algorithm.
    Received variants aren't limited 2^64.
    Compression gradual.
    With constant factor of compression = 1,2
    For example.
    1) 1000 mbyte-> 834 mbyte.
    2) 834 mbyte-> 695 mbyte.
    3) 695 mbyte-> 579 mbyte.
    .........
    47) 238925 byte-> 199104 byte.
    .........
    90) 94 byte-> 79 byte.
    91) 79 byte-> 65 byte.
    The quantity of variants changes each time.
    2^64 it is fair only for point 91)

  18. #48
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    289
    Thanks
    10
    Thanked 34 Times in 22 Posts
    Quote Originally Posted by m^2 View Post
    Bits are bits and if you use 64 of them you can store only 2^64 different files losslessly. Waving hands doesn't help, you can't get around the counting algorithm.
    Maybe, in another universe, there are only 2^64 different models for everything
    Or a future compressor outputs only the position in PI where our data starts
    that could be a near infinite big number or a very small one...man that would be awesome
    you'd never know if your data expands...



    With constant factor of compression = 1,2
    That's the catch
    Last edited by Sebastian; 18th November 2010 at 20:49.

  19. #49
    Member Andrey V. Miller's Avatar
    Join Date
    Nov 2010
    Location
    Russia
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Sebastian View Post

    That's the catch
    Aha.

    It was difficult to think up.
    Has started to think in 2007.
    In 2008 has thrown.
    It seemed that it is impossible to make.

    Has returned to idea in 2009.
    Has made inconceivable efforts.

    Has more recently checked up algorithm.
    5 minutes in an infinite cycle.
    On the casual data in EXCEL
    Successfully.

    At present I write the code and the documentation with the detailed description of algorithm. About possible ways of its improvement.

  20. #50
    Programmer Gribok's Avatar
    Join Date
    Apr 2007
    Location
    USA
    Posts
    162
    Thanks
    0
    Thanked 13 Times in 2 Posts
    Quote Originally Posted by Andrey V. Miller View Post
    Aha.

    It was difficult to think up.
    Has started to think in 2007.
    In 2008 has thrown.
    It seemed that it is impossible to make.

    Has returned to idea in 2009.
    Has made inconceivable efforts.

    Has more recently checked up algorithm.
    5 minutes in an infinite cycle.
    On the casual data in EXCEL
    Successfully.

    At present I write the code and the documentation with the detailed description of algorithm. About possible ways of its improvement.
    You should start from decompression first
    Enjoy coding, enjoy life!

  21. #51
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Do you know how comp.compression looked like 8 years ago? ...and what about nowadays?

    I really appreciate the appearance of some new people here, to keep things going, introduce new ideads and enrich discussions.

    But... Close this thread. Ban next time, if necessary.
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk

  22. #52
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Well I traced through the webpages and Google phoned the company selling this wonderful patient. Guess what it did not sell yet. I was shocked since they assured me earlier that if I waited tell after the auction it would definitely have been sold. So for those of you who do not have a US patient here is one still available.

  23. #53
    Member
    Join Date
    Feb 2011
    Location
    St. Albans, England
    Posts
    20
    Thanks
    0
    Thanked 0 Times in 0 Posts
    http://gailly.net/05488364.txt Meh, so I wasn't the first to think along them lines

Page 2 of 2 FirstFirst 12

Similar Threads

  1. Any money in data compression?
    By bitewing in forum The Off-Topic Lounge
    Replies: 18
    Last Post: 19th March 2019, 11:34
  2. Recursive LZ
    By chornobyl in forum Data Compression
    Replies: 1
    Last Post: 29th September 2018, 13:28
  3. Data compression explained
    By Matt Mahoney in forum Data Compression
    Replies: 92
    Last Post: 7th May 2012, 19:26
  4. Data Compression Crisis
    By encode in forum The Off-Topic Lounge
    Replies: 15
    Last Post: 24th May 2009, 20:30
  5. Data Compression Evolution
    By encode in forum Forum Archive
    Replies: 3
    Last Post: 11th February 2007, 15:33

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •