
Thread: video compression (test)

  1. #31
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,946
    Thanks
    294
    Thanked 1,286 Times in 728 Posts
    Btw, can you post some results for these codecs with some known files?
    I'm specifically interested in jpegs and DNA data, for example, A10.jpg and E.coli.

  2. #32
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    Quote Originally Posted by Shelwien
    Btw, can you post some results for these codecs with some known files?
    I'm specifically interested in jpegs and DNA data, for example, A10.jpg and E.coli.
    a10.jpg 842468 -> 658121 in 6.4 sec. Stuffit and paq8px beat that at the moment. We did additional tests on larger data sets and the results are about 1% better than paq8px_v67 and 1% behind Stuffit, but Stuffit is faster. I'm going to be looking at some improvements.

    We don't actually compress DNA like E.coli. It is actually for SRF files, which contain "reads" from DNA sequencing machines. The machines take millions of DNA strands and go through a small number of cycles (35 in the test files I used, but now a few hundred) where they sequence one nucleotide at a time by attaching variants of C, G, A, or T that fluoresce in different colors. The output of the machine goes to an algorithm that pieces the reads together. So each read contains an array of elements, each with 4 numbers, plus the interpreted DNA code and a confidence level. Part of the data is poorly compressed with a variant of zip, so we were able to improve on that by uncompressing it and applying a custom algorithm. It beats paq8px and is reasonably fast. The output files are huge (several GB).
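    The general shape of that step, as a rough Java sketch (not our actual code; Deflater below is only a stand-in for the custom algorithm), is to inflate the embedded zlib layer so the modeling stage sees the raw records:

        import java.io.ByteArrayOutputStream;
        import java.util.zip.DataFormatException;
        import java.util.zip.Deflater;
        import java.util.zip.Inflater;

        // Sketch of the recompression idea: undo the embedded zlib layer so a
        // stronger model can work on the raw records.
        public class RecompressSketch {

            // Inflate a zlib-compressed block back to the raw record bytes.
            static byte[] inflate(byte[] zlibData) throws DataFormatException {
                Inflater inf = new Inflater();
                inf.setInput(zlibData);
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                while (!inf.finished()) {
                    int n = inf.inflate(buf);
                    if (n == 0 && inf.needsInput()) break;   // truncated or invalid input
                    out.write(buf, 0, n);
                }
                inf.end();
                return out.toByteArray();
            }

            // Placeholder for the custom compressor applied to the raw records;
            // here plain Deflater at maximum level, which of course gains little.
            static byte[] compressRaw(byte[] raw) {
                Deflater def = new Deflater(Deflater.BEST_COMPRESSION);
                def.setInput(raw);
                def.finish();
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                while (!def.finished()) {
                    out.write(buf, 0, def.deflate(buf));
                }
                def.end();
                return out.toByteArray();
            }
        }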

  3. #33
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,946
    Thanks
    294
    Thanked 1,286 Times in 728 Posts
    Thanks for the information.
    In fact, I'm kinda working on a jpeg recompressor now, among other things,
    and a goal to aim for could be useful.
    Btw, what about multiscan (progressive etc.), arithmetic coding, and colorspace
    support in your jpeg coder?

  4. #34
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    It's actually based on PAQ with some tuning and a reduced number of models (12) to make it faster. So it doesn't support progressive mode like Stuffit does. However, progressive mode only makes up a few percent of JPEGs, so adding it didn't have high priority. We did have a customer with 12-bit images, but it wasn't hard to add that capability. I doubt that we will do the full JPEG standard. Nobody supports arithmetic coding or hierarchical mode AFAIK.

  5. #35
    Member
    Join Date
    Aug 2009
    Location
    Canada
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts
    here's a little update for you guys

    ok, first, my program is not yet completed as it's my first attempt at playing with bytes... i've been able to copy a file byte by byte (noob, i know)

    2) the theory is finished, i only have to finish the program, some parts are already done, and i need to test one last thing...

    3) my results: in theory, i should be able to compress ANY file by about 50%
    this includes already compressed files
    i think i can do even better, if that thing i want to test works as i wish

    i'll keep you updated if you promise me you'll test it
    (i don't think i needed to ask this, i know you won't resist)
    i hope it will be finished friday...

  6. #36
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,569
    Thanks
    777
    Thanked 687 Times in 372 Posts
    Theorem:
    No program can compress without loss *all* files of size >= N bits, for
    any given integer N >= 0.

    Proof:
    Assume that the program can compress without loss all files of size >= N
    bits. Compress with this program all the 2^N files which have exactly N
    bits. All compressed files have at most N-1 bits, so there are at most
    (2^N)-1 different compressed files [2^(N-1) files of size N-1, 2^(N-2) of
    size N-2, and so on, down to 1 file of size 0]. So at least two different
    input files must compress to the same output file. Hence the compression
    program cannot be lossless.

    The proof is called the "counting argument". It uses the so-called pigeon-hole
    principle: you can't put 16 pigeons into 15 holes without using one of the
    holes twice.

    Much stronger results about the number of incompressible files can be obtained,
    but the proofs are a little more complex. (The MINC page http://www.pacminc.com
    uses one file of strictly negative size to obtain 2^N instead of (2^N)-1
    distinct files of size <= N-1 .)

    Note that
    no assumption is made about the compression algorithm. The proof applies to
    *any* algorithm, including those using an external dictionary, or repeated
    application of another algorithm, or combination of different algorithms, or
    representation of the data as formulas, etc... All schemes are subject to the
    counting argument. There is no need to use information theory to provide a
    proof, just very basic mathematics. [People interested in more elaborate proofs
    can consult http://wwwvms.utexas.edu/~cbloom/news/nomagic.html ]

    In short, the counting argument says that if a lossless compression program
    compresses some files, it must expand others, *regardless* of the compression
    method, because otherwise there are simply not enough bits to enumerate all
    possible output files. Despite the extreme simplicity of this theorem and its
    proof, some people still fail to grasp it and waste a lot of time trying to
    find a counter-example.

    This assumes of course that the information available to the decompressor is
    only the bit sequence of the compressed data. If external information such as a
    file name, a number of iterations, or a bit length is necessary to decompress
    the data, the bits necessary to provide the extra information must be included
    in the bit count of the compressed data. Otherwise, it would be sufficient to
    consider any input data as a number, use this as the file name, iteration count
    or bit length, and pretend that the compressed size is zero. For an example of
    storing information in the file name, see the program lmfjyh in the 1993
    International Obfuscated C Code Contest, available on all comp.sources.misc
    archives (Volume 39, Issue 104).

    A common flaw in the algorithms claimed to compress all files is to assume that
    arbitrary bit strings can be sent to the decompressor without actually
    transmitting their bit length. If the decompressor needs such bit lengths
    to decode the data (when the bit strings do not form a prefix code), the
    number of bits needed to encode those lengths must be taken into account
    in the total size of the compressed data.

    Another common (but still incorrect) argument is to assume that for any file,
    some still to be discovered algorithm might find a seed for a pseudo-random
    number generator which would actually generate the whole sequence of bytes
    contained in the file. However this idea still fails to take into account the
    counting argument. For example, if the seed is limited to 64 bits, this
    algorithm can generate at most 2^64 different files, and thus is unable to
    compress *all* files longer than 8 bytes. For more details about this
    "magic function theory", see http://www.dogma.net/markn/FAQ.html#Q19

    Yet another popular idea is to split the input bit stream into a sequence of
    large numbers, and factorize those numbers. Unfortunately, the number of bits
    required to encode the factors and their exponents is on average not smaller
    than the number of bits of the original bit stream, so this scheme too cannot
    compress all data. Another idea also related to primes is to encode each
    number as an index into a table of primes and an offset relative to the indexed
    prime; this idea doesn't work either because the number of bits required to
    encode the index, the offset and the separation between index and offset
    is on average not smaller than the number of bits of the original bit stream.
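
    To make the counting explicit, here is a small Java check (my addition, not part of the FAQ): for every N there are 2^N files of exactly N bits, but only 2^0 + 2^1 + ... + 2^(N-1) = 2^N - 1 bit strings shorter than N bits, so at least two inputs must share an output.

        // Counting check: inputs of exactly N bits vs. all strings shorter than N bits.
        public class CountingArgument {
            public static void main(String[] args) {
                for (int n = 1; n <= 32; n++) {
                    long inputs  = 1L << n;        // 2^n files of exactly n bits
                    long shorter = (1L << n) - 1;  // 2^0 + ... + 2^(n-1) = 2^n - 1
                    System.out.printf("N=%2d  inputs=%d  shorter outputs=%d%n",
                                      n, inputs, shorter);
                }
            }
        }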

  7. #37
    Member
    Join Date
    Aug 2009
    Location
    Canada
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts
    each byte (8 bits) can represent a number between 0 and 255
    now add these numbers one after the other (0= 000 and 10 = 010) so it will result in a very large number

    i'm using java, so here is how i did it
    i use math.pow on a large double and use math.pow again on
    (double - resultofpreviousmath.pow)

    and the result is about 25 - 30 numbers
    these will be the output file.
    now to decompress, lets say the output contains 3 numbers --> a, b and c
    2^a + 2^b + 2^c = the original number
    i split the number in parts of 3 digits and each part will represent a byte of the original file.

    in java, the max size of a double is 300 digits
    each byte of the file to compress will be converted into 3 digits
    300/3 = 100 bytes of the original file
    100 bytes will be compressed in 25 - 30 bytes

  8. #38
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Lone_Wolf
    each byte (8 bits) can represent a number between 0 and 255
    now add these numbers one after the other (0= 000 and 10 = 010) so it will result in a very large number

    i'm using java, so here is how i did it
    i use math.pow on a large double and use math.pow again on
    (double - resultofpreviousmath.pow)

    and the result is about 25 - 30 numbers
    these will be the output file.
    now to decompress, lets say the output contains 3 numbers --> a, b and c
    2^a + 2^b + 2^c = the original number
    i split the number in parts of 3 digits and each part will represent a byte of the original file.

    in java, the max size of a double is 300 digits
    each byte of the file to compress will be converted into 3 digits
    300/3 = 100 bytes of the original file
    100 bytes will be compressed in 25 - 30 bytes
    Indeed, this compresses very well. Actually to ~log(size), so much better than 50%. Too bad that decompression is somewhat worse.
    How can you tell whether 010 started as 000,010 or 001, 001 or 010,000?
    Actually it's even worse with bigger numbers and more of them.
    Read again Bulat's first proof and don't believe in magic.

    ADDED:
    Bulat:
    The MINC page is empty.
    ADDED2:
    NVM, I'm sleepy and I didn't even see your decompression code. But Bulat explained this, I guess.
    Last edited by m^2; 14th January 2010 at 10:36.

  9. #39
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,569
    Thanks
    777
    Thanked 687 Times in 372 Posts
    i've copied part of the comp.compression FAQ

  10. #40
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,569
    Thanks
    777
    Thanked 687 Times in 372 Posts
    Quote Originally Posted by Lone_Wolf
    each byte (8 bits) can represent a number between 0 and 255
    now add these numbers one after the other (0= 000 and 10 = 010) so it will result in a very large number

    i'm using java, so here is how i did it
    i use math.pow on a large double and use math.pow again on
    (double - resultofpreviousmath.pow)

    and the result is about 25 - 30 numbers
    these will be the output file.
    now to decompress, lets say the output contains 3 numbers --> a, b and c
    2^a + 2^b + 2^c = the original number
    i split the number in parts of 3 digits and each part will represent a byte of the original file.

    in java, the max size of a double is 300 digits
    each byte of the file to compress will be converted into 3 digits
    300/3 = 100 bytes of the original file
    100 bytes will be compressed in 25 - 30 bytes
    1. double has only about 15-16 significant digits. it can represent numbers from 1e-308 to 1e308, but only those leading digits are preserved. try to compute 1e300+1-1e300

    2. but that's not important. let's work with large binary numbers, even if they aren't supported in Java. each byte is 8 bits, 32 bytes are 256 bits. instead of writing these 256 bits directly you want to write the numbers of the bits that are set, i.e. for 010011... you will output (counting from 0) 1,4,5,.... on average, half of the bits are set, so instead of writing 256 bits you will write 128 numbers, each 8 bits long, i.e. 1024 bits

    i wonder why you think that a 300-digit decimal number, i.e. about a 1000-bit binary number, will be represented by 25-30 set-bit positions. it should be ~500
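
    both points are easy to check; a random 1000-bit number below just stands in for the big number built from the file bytes:

        import java.math.BigInteger;
        import java.util.Random;

        public class BigNumberCheck {
            public static void main(String[] args) {
                // double keeps only ~15-16 significant decimal digits, so the +1 is lost:
                System.out.println(1e300 + 1 - 1e300);                // prints 0.0

                // a random ~1000-bit number has roughly 500 set bits; each bit position
                // needs ~10 bits to write down, so listing the positions of the set
                // bits expands the data instead of compressing it.
                BigInteger n = new BigInteger(1000, new Random());
                System.out.println("bit length: " + n.bitLength());   // ~1000
                System.out.println("set bits:   " + n.bitCount());    // ~500
            }
        }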

  11. #41
    Member
    Join Date
    Aug 2009
    Location
    Canada
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks Bulat! I just realized that even if the max size of a double is 3,xxx *10^308, that number is actually rounded.
    i did some research and found the BigInteger class. I do not know what its max number of digits is, but i managed to make it as big as 28000 digits
    but that is in fact only 30000/3 = 10000 bytes, i.e. ~10kb of the original file
    i think 100mb parts would give extremely good ratios without eating too much ram and without needing too much processing power
    that would result in a 300,000,000-digit number

  12. #42
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    Lone_Wolf, I think it would be best for you to actually try it and see that it doesn't work even when YOU do it
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  13. #43
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    889
    Thanks
    482
    Thanked 279 Times in 119 Posts
    once again, the magic of big numbers....

    http://encode.dreamhosters.com/showthread.php?t=484


    i should be able to compress ANY file by about 50% this includes already compressed files
    OK, so let's take a file, and compress it to 50%.
    Now this is a compressed file.
    Can you compress it again?
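    If it really shrank every file, including its own output, by about 50%, then re-running it would take any file down to a single byte in roughly log2(size) passes, and one byte clearly cannot decode back into every possible original. A tiny Java check of that arithmetic (1 GB is just an example size):

        // Repeatedly "compress" a hypothetical 1 GB file by 50% per pass.
        public class RepeatedHalving {
            public static void main(String[] args) {
                long size = 1000000000L;      // bytes in the example input
                int passes = 0;
                while (size > 1) {
                    size = (size + 1) / 2;    // halve, rounding up
                    passes++;
                }
                System.out.println("passes to reach 1 byte: " + passes);  // about 30
            }
        }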
