Results 31 to 60 of 254

Thread: loseless data compression method for all digital data type

  1. #31
    Member Lone_Wolf236's Avatar
    Join Date
    Aug 2009
    Location
    Canada
    Posts
    13
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I wrote that i would be happy to work with you if one of these two conditions were met:

    1) i send you 2-3 data sets and you achieve a lossless compression and decompression

    2) you explain your method to me directly or to everyone on this forum and if i can't find a flaw in less than a few hours, i will assume that your method works


    My way of thinking is to explore every new idea!

  2. #32
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    okay mr. wolf, i like the way u think. but i dont get any new PM on my inbox. and about datasheet, thank you for any attachment. i will read that, but surely i cant understand any programming language. nice to see you. im still waiting for other member's reply to discuss this idea

  3. #33
    Member Lone_Wolf236's Avatar
    Join Date
    Aug 2009
    Location
    Canada
    Posts
    13
    Thanks
    0
    Thanked 0 Times in 0 Posts
    i didn't mean a datasheet, but a data set: a very small file so you can test your method.
    you compress it, and then decompress it using your method, and if the decompressed file is exactly the same as the original, it means that your method works!

    but we need to do that a few times with different files to make sure that it wasn't just luck! :P
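The test described above can be sketched in a few lines. This is only an illustration: `zlib` stands in for the compressor because rarkyan's method has not been specified, and the sample inputs are made up. It also reports sizes, since a round trip that is lossless but not smaller is not compression.

```python
import zlib

def round_trip(data: bytes):
    """Compress, decompress, and report (is_lossless, original_size, packed_size)."""
    packed = zlib.compress(data)        # stand-in compressor, not rarkyan's method
    restored = zlib.decompress(packed)
    return restored == data, len(data), len(packed)

# Hypothetical test files: a very repetitive one, a varied one, and a tiny one.
samples = [b"\x00" * 1000, bytes(range(256)) * 4, bytes.fromhex("10001F0000FF")]

for sample in samples:
    ok, before, after = round_trip(sample)
    print(f"lossless={ok} original={before} bytes packed={after} bytes")
```

Note that the tiny 6-byte sample comes back intact but the packed form is larger than the input, which is why several different files are needed.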

  4. #34
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    sure. i dont create the program so let me explain how the program must work. and maybe u can create the program based on the method. this is as far as i can answer for ur data set . so, may i see the data set?

  5. #35
    Member Lone_Wolf236's Avatar
    Join Date
    Aug 2009
    Location
    Canada
    Posts
    13
    Thanks
    0
    Thanked 0 Times in 0 Posts
    sure!

    what does your method require?
    Hexadecimal? Text? Numbers? Binary?

    and how big do you want it to be?

  6. #36
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    just give me any file. im using hexadecimal to explain

  7. #37
    Member Lone_Wolf236's Avatar
    Join Date
    Aug 2009
    Location
    Canada
    Posts
    13
    Thanks
    0
    Thanked 0 Times in 0 Posts
    i attached a random wallpaper (.jpg)

    take your time to explain in detail!
    Attached: Technology-Binary-74166.jpg (859.6 KB)

  8. #38
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    my pm sent to ur inbox. i hope u start to understand my explain

  9. #39
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    here the explanation of ur attached thumbnail.
    im opening it using hex editor and this is the result :
    [Image: bitstructures.png]

    notice that on the first line we didnt always get a file extension information. sometimes we get for example %PNG, rar, depends on the file type and how hex editor read it. as i said before, just try to write down this bit code into hex editor :

    [Image: Save As .rar.png]

    save that file into .rar and u will got a new working file.
    if u dont mind, u can create a new image file -exact same- as mr. lone wolf attachment by copy paste the wallpaper bit code or manually write the code with ur finger. after done, save that code as .jpg and u got the same wallpaper file. maybe u can create ur own bit code if u know the structure. where is the compression? i didnt tell u yet. maybe u guys got something from this?
    shall we discuss more of it?
    Last edited by rarkyan; 7th December 2010 at 08:46.

  10. #40
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    289
    Thanks
    10
    Thanked 34 Times in 22 Posts
    So copying a file results in exactly the same file? What an unbelievable surprise. You're just making fun of us, aren't you?

  11. #41
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts
    I am very impressed that google translate is making such clear translations of such technical jargon. Can this be true?

  12. #42
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    my apologize, im not trying to teach fish how to swim. i just want to make a little contribution about data compression. im not really making fun of this. im a serious and a good person . sometimes im using google translate when im getting stuck on translating to english language. sometimes i didnt use GT while im still able to write on english. i hope u understand what i mean

  13. #43
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    289
    Thanks
    10
    Thanked 34 Times in 22 Posts
    So far you have only demonstrated that you understand that a hex view of a file is equivalent (apart from the base) to its internal representation. Where's the compression?
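A small sketch of that point, using the 6-byte string that comes up later in the thread as sample data: converting bytes to a hex view and back is a lossless change of notation, and the hex text is longer than the bytes it describes.

```python
# Sample bytes (an assumption for illustration, not taken from the attached .jpg).
data = bytes([0x10, 0x00, 0x1F, 0x00, 0x00, 0xFF])

hex_view = data.hex(" ").upper()      # what a hex editor shows (needs Python 3.8+)
restored = bytes.fromhex(hex_view)    # typing the hex back in gives the same bytes

print(hex_view)
print(restored == data)
print(len(data), "bytes as a file,", len(hex_view), "characters as hex text")
```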

  14. #44
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts


    Originally posted by Bulat Ziganshin:
    i think that someone who can squeeze 1000 rabbits into 10 cages so that every rabbit has its own cage shouldn't spend his precious time working in IT. you should go directly into building paradise on Earth
    Okay mr. Sebastian. I like this word from Bulat Ziganshin: "squeeze 1000 rabbits into 10 cages". Lets do this. On hex editor program, i think it could be done. Try to search every one of bit. Take an example search for "00". U see a highlighted result of "00". Thats ur rabbit. Now u must create a cages for that highlighted rabbit. I think from this point it is very clear. If u understand, so just go faster to make a patent of this method. I still want the money... but maybe i cant, except u want to share on me. So u people please done this for me. Hehehe
    Last edited by rarkyan; 7th December 2010 at 15:15.

  15. #45
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    Tell me if u still confuse with my explanation. Dont forget to correct me if i wrong. I do this for the greater goods, for common wealth. So if the method possible, now i can 1000x download any file more faster from the internet. Hehehehe

  16. #46
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    926
    Thanks
    58
    Thanked 116 Times in 93 Posts
    you know i got a better idea. instead of sharing this great idea, share whatever you are smoking... cause i definitely want some.

  17. #47
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    i offer more great idea. how bout to kiss the girl on my avatar. sounds good eh?

  18. #48
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    289
    Thanks
    10
    Thanked 34 Times in 22 Posts
    Originally posted by rarkyan:
    Take an example search for "00". U see a highlighted result of "00".
    So you would like to search for every occurrence of each byte value.

    Thats ur rabbit. Now u must create a cages for that highlighted rabbit. I think from this point it is very clear.
    The referencing system would be all the magic. OK, explain it to me with this example.

    Suppose we have "10 00 1F 00 00 FF", which makes 3 times "00"-rabbits. Now show me how to create a cage so that the compressed string is shorter and restore the original string with your method.

  19. #49
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    http://mattmahoney.net/dc/dce.html#Section_11

    But since you are not a programmer I don't expect you to understand the math. That's OK. A proof is not the same as a convincing argument. There are plenty of people like you on comp.compression that think they can compress any file over and over again, but always they come up with nothing because the math is right.
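The math referred to here is a counting (pigeonhole) argument. A minimal sketch, with function names chosen only for this illustration: there are 2^n different files of n bits, but only 2^n - 1 files that are shorter, so no lossless method can map every n-bit file to a shorter one.

```python
def inputs_of_length(n: int) -> int:
    """Number of distinct bit strings that are exactly n bits long."""
    return 2 ** n

def outputs_shorter_than(n: int) -> int:
    """Number of distinct bit strings of length 0, 1, ..., n-1."""
    return sum(2 ** k for k in range(n))

for n in (1, 8, 16):
    # There is always one more input than there are shorter outputs.
    print(n, inputs_of_length(n), outputs_shorter_than(n))
```

Since a decompressor must map each output back to exactly one input, at least one n-bit file cannot get shorter, whatever the method.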

  20. #50
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    Okay, i know im far more stupid compared to u all, because programming is not my daily food. Here i give u a hint again. This is a really bad english translation. Take ur time to understand this.

    A bigmouth hint:

    1. I said that hex editor program can create any file by combining bit code into the right structure
    2. For mr. Sebastian question :
    "Suppose we have "10 00 1F 00 00 FF", which makes 3 times "00"-rabbits. Now show me how to create a cage so that the compressed string is shorter and restore the original string with your method"

    This is my answer: in "10 00 1F 00 00 FF" we have 3 "00"-rabbits. They sit in 3 of the 6 columns/holes. The rabbits can jump into any column, right? But in this case we see that the rabbits dug holes in the 2nd, 4th, and 5th columns (the 00 bytes). U must create a cage for every possibility, and every cage must have its own name. Once u have created all the rabbit cages and named each one, the next step is to see where the rabbits dug their holes, and cage them while they are in the holes. This is the mathematical formula for the number of cages: (2^n)-1. U all know it. In this case u get (2^6)-1, which gives 63 cages covering every possible set of rabbit holes. So the cages are named 1 to 63. Describing the rabbits' positions by the cage name is shorter than counting the rabbits one by one, isn't it? Correct me if i'm wrong.
    Last edited by rarkyan; 8th December 2010 at 11:55.

  21. #51
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    Forget something. How to restore the rabbit? Use the cage. Put the rabbit back into their original holes/column by call the cage name. Because the cage lock them up, i hope the rabbit is fine
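One possible reading of the cage idea, sketched for the 6-byte example. The function names `cage` and `uncage` are made up for this illustration, and treating the "cage name" as a position bitmask is an assumption about what rarkyan means.

```python
def cage(data: bytes, rabbit: int):
    """Return (cage_name, remaining_bytes) for one byte value, the 'rabbit'."""
    name = 0
    for b in data:
        name = (name << 1) | (b == rabbit)   # one bit per column: 1 = rabbit here
    rest = bytes(b for b in data if b != rabbit)
    return name, rest

def uncage(name: int, rest: bytes, rabbit: int, n: int) -> bytes:
    """Put the rabbits back into their original columns."""
    others = iter(rest)
    out = bytearray()
    for i in range(n):
        bit = (name >> (n - 1 - i)) & 1
        out.append(rabbit if bit else next(others))
    return bytes(out)

data = bytes.fromhex("10001F0000FF")
name, rest = cage(data, 0x00)
print(name, rest.hex(" "))                   # cage name and the non-rabbit bytes
print(uncage(name, rest, 0x00, len(data)) == data)
```

Under this reading the cage name for n columns is one of (2^n)-1 values, so writing it down takes n bits. The decompressor also needs the leftover bytes, the rabbit's value, and the length n, none of which are free.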

  22. #52
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    891
    Thanks
    492
    Thanked 280 Times in 120 Posts
    Good point, with a short example which is just 6 bytes, it works.
    Now, suppose you have a larger example, such as a file.
    enwik9 for example.
    Now your rabbit can dig its hole in any of 1 billion byte positions.

    How to describe them all? You just need a number in a range of size 2^(1 000 000 000)?
    A single number to describe an entire file, surely it should be clear that it compresses so well.
    Oh, that's a bitstring by the way...
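To put a number on this: naming one of (2^n)-1 cages takes n bits. The sketch below checks that for small n and then applies the same identity to enwik9 (10^9 bytes) rather than building the huge number itself. `cage_name_bits` is a name invented for this illustration.

```python
def cage_name_bits(n_holes: int) -> int:
    """Bits needed to write down the largest of the (2^n)-1 cage names."""
    return (2 ** n_holes - 1).bit_length()

# For small n we can check directly that naming a cage costs n bits.
small = {n: cage_name_bits(n) for n in (6, 16, 1000)}
print(small)

# For enwik9 we use the same identity: one bit per byte position.
ENWIK9_BYTES = 10 ** 9
name_bytes = ENWIK9_BYTES // 8
print(name_bytes, "bytes just to name the cage for a single byte value")
```

That is for the positions of one byte value only; the bytes that are not rabbits still have to be stored as well.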

    No, i guess your idea was rather on this side :
    Cut the file into lines, such as your hex editor does.
    Repeat the process for each line.
    Then, how do you handle lines with a different number in each column ?

    More difficult, have you calculated the compression gain ? is it still effective if there is a single repetition within the line ?

  23. #53
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    Hahaha. There is still an "IMPOSIBLE" thing on this method. But there is still possible because we're a human. Do u know the world's most biggest building? Who build them? Who create the 1st computer? How big does it size? Not me.
    By the way, i think u got it right mr. Cyan. How if the rabbit hole as much as 75.000.000? Do we need (2^75.000.000)-1 to bring the rabbit into the cages? Umm.... i think yes. So im imagine the compressor will be a combination of software and hardware. Software to compress and decompress. And hardware to support saving any "rabbit cages", databases, processors, RAMs, or i dont know what else. Pretty logic eh? I cant calculate the product cost or how many hardware and what is the minimum requirement to complete this method. I told u, im not a programmer, and im not born from mathematic studies. I just want the experts/specialists doing this. I cant pay them/you, maybe someone will pay them/you if this method had succeed to realized. I just can share it here.

    For ur question :
    "Then, how do you handle lines with a different number in each column ?"
    Answer : Lines with different number? They are the same rabbit with different colour. With the same posibilities to jump over the different holes/column. What else? U only need to create cages with any possibility. U only need the (2^n)-1 just once. After u've got all cages, u can catch every rabbit u want.

    "is it still effective if there is a single repetition within the line ?"
    Answer : single repetition, do u mean there is only 1 kind of bit code at single line? Maybe the cages already handle that.

    "Cut the file into lines, such as your hex editor does"
    Answer : I will going crazy to catch the rabbit on 1 column (vertical), not on 1 row (horizontal) which only have 16 holes/column per 1 row on hex table, or maybe i'd like to catch them all on (n row * n column). Tell me im crazy. U know what? I cant do that. I have no good facilities and abilities. If i could, then i would.

    Now its up to u guys. Do or Not
    Last edited by rarkyan; 8th December 2010 at 14:31.

  24. #54
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    This "compressor" trying to handle for the big files. Any kind of big files. Loseless. Why are u trying to compress such of a tiny files?
    Last edited by rarkyan; 8th December 2010 at 14:27.

  25. #55
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    Does anyone here try to coding my method? Just wondering maybe its possible

  26. #56
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    Nobody here is going to write your impossible compression algorithm. You will have to learn to program and figure out for yourself that it won't work. Everyone else here already knows that. You can waste your own time.

  27. #57
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    309
    Thanks
    68
    Thanked 173 Times in 64 Posts
    Rarkyan, your method is known as enumerative coding.
    1. It will work with highly compressible data (if the 8-bit input alphabet contains fewer than 128 symbols => you need 2^7 or fewer cages), but the compression ratio will be worse than gzip or even Huffman coding.
    2. It will not work with random or already compressed data (the 8-bit input alphabet has between 128 and 256 symbols => you need 2^8 cages = 256).
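For readers who have not met the term, here is a minimal sketch of one textbook form of enumerative coding: a set of k positions out of n is replaced by its rank among all C(n, k) such sets. Whether this is exactly the variant inikep has in mind is an assumption, and `rank`/`unrank` are names chosen for this example.

```python
from math import comb

def rank(positions):
    """Rank of a set of 0-based positions in the combinatorial number system."""
    return sum(comb(p, i) for i, p in enumerate(sorted(positions), start=1))

def unrank(r: int, k: int):
    """Inverse of rank(): recover the k positions from the rank r."""
    positions = []
    for i in range(k, 0, -1):
        p = i - 1
        while comb(p + 1, i) <= r:       # largest p with comb(p, i) <= r
            p += 1
        positions.append(p)
        r -= comb(p, i)
    return sorted(positions)

# The "00" bytes of "10 00 1F 00 00 FF" sit at 0-based positions 1, 3 and 4.
holes = [1, 3, 4]
r = rank(holes)
print(r, "out of", comb(6, 3), "possible placements")
print(unrank(r, 3) == holes)
```

With 3 rabbits in 6 holes there are 20 placements, so the rank fits in 5 bits instead of a 6-bit mask, provided the decoder is also told n and k. The saving only becomes large when the data is heavily skewed.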
    Last edited by inikep; 9th December 2010 at 22:58.

  28. #58
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    I still wont give up. I dont understand which part is impossible. Maybe the most "impossible" things are to create (2^n)-1, which for 1GB data there is need (2^75.000.000)-1. I got 75.000.000 by checking the rows on 1 column. Like this :

    If i have the ability on programming, i will try to create "the cages" or Database ID using (2^n)-1 formula, which n is about 75.000.000 (see the rows, on the picture only 71.285.376 for 1GB data). This Database ID only need to created once. Because they will cover for every 256 bit code occurences on 1 column and 75.000.000 rows. The other bit will use the same Database ID, because the ID already record their bit pattern movement.

    This is the bit pattern that i mean, or maybe u called them occurences:

    After the Database ID already created, the next step is to search every occurence of every bit code. There is total 256 bit code to create file, so the program need to search 256x, or search 256 bit code at the same time. This is why i imagine, there will be a hardware support. Talk about the output, this is my idea :


    notes : "x" are Database ID or "the cages" or maybe it is Occurrence ID

    The program need to write down the Database ID on the "x" mark. Because my idea using search occurence at one column and n rows, there is total 16 columns on 1 row for every bit code. I hope everyone understand.

    To decompress, the program must read the Database ID from the output file, and restore the bit pattern into their original position on hex editor. After all 256 bit already turned back on their original position, the complete structure of a file already created. The final step is to save the file into their original extension. And we got a new working file.

    If the impossible thing are to create the Database ID which will consume (2^75.000.000)-1, then i will stop my comment on this forum because the expert cant do that. Thank you for your attention
    Last edited by rarkyan; 26th February 2011 at 10:48.

  29. #59
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts

  30. #60
    Member
    Join Date
    May 2009
    Location
    CA USA
    Posts
    22
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Rarkyan, let me summarize your algorithm. You look for very common duplicated bytes like 00 in your file. You then take these out of the file, making it shorter, and you remember a simple bitmask to tell you where those values were taken from. To re-expand, you use the bitmask to place the byte back into place.

    This is a workable algorithm and it can indeed give compression for many files. For example, if half the bytes of a file are 00, you'd make a new file that's only half the size. But you also need to remember that bitmap, which is 1/8 of the file size (one bit per byte).
    If more than 1/8 (12.5%) of the file's bytes are that one common byte, you end up with a net savings of space... you have compression! Success!

    The flaw of this algorithm is that it can't compress files which have no common byte that happens more than 12.5% of the time. The 12.5% comes from the size of the bitmap compared to the size of the file itself.
    For example, say you had an 800 byte file, and byte 00 happened a lot.. 10% of the time. So you make your bitmask of 800 bits, which is 100 bytes. You can remove all the 00 bytes from the original file, so it's now shorter, it's 720 bytes.
    But look.. now the 720 byte file still needs the extra 100 byte bitmap.. that's 820 bytes, bigger than the file you started with. So there's no space savings for that example... it gets bigger.
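The algorithm summarized above can be sketched directly, along with the 800-byte example. The function names and the synthetic test files are assumptions made for this illustration; the sketch also ignores the few extra bytes needed to record the file length and which byte value was removed.

```python
def bitmask_compress(data: bytes, common: int):
    """Remove every `common` byte; return (bitmask, remaining_bytes)."""
    mask = bytearray((len(data) + 7) // 8)       # one bit per input byte
    rest = bytearray()
    for i, b in enumerate(data):
        if b == common:
            mask[i // 8] |= 1 << (i % 8)         # remember where it was
        else:
            rest.append(b)
    return bytes(mask), bytes(rest)

def bitmask_expand(mask: bytes, rest: bytes, common: int, n: int) -> bytes:
    """Rebuild the original n bytes from the bitmask and the remaining bytes."""
    others = iter(rest)
    return bytes(
        common if (mask[i // 8] >> (i % 8)) & 1 else next(others)
        for i in range(n)
    )

# 800-byte file where 00 occurs 10% of the time (every 10th byte).
sparse = bytes(0 if i % 10 == 0 else 1 + i % 255 for i in range(800))
# 800-byte file where 00 occurs 50% of the time (every 2nd byte).
dense = bytes(0 if i % 2 == 0 else 1 + i % 255 for i in range(800))

for name, data in (("10% zeros", sparse), ("50% zeros", dense)):
    mask, rest = bitmask_compress(data, 0x00)
    total = len(mask) + len(rest)
    ok = bitmask_expand(mask, rest, 0x00, len(data)) == data
    print(f"{name}: {len(data)} -> {len(mask)} + {len(rest)} = {total} bytes, lossless={ok}")
```

The 10% file grows from 800 to 820 bytes, as in the post, while the 50% file shrinks to 500 bytes.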

    So your algorithm can work for some files. A very biased file with a very common byte can be compressed with a bitmap "hole" map. But not all files can be compressed this way.

    Now there are many many many ways to improve this and take advantage of biased byte distributions.. when one byte is more likely than another. The first algorithms in compression books deal with efficient ways to handle this problem and can take advantage of even small biases. But none of those methods can compress all files.

    Good for you for being interested in compression... it's fascinating. You may be interested in "entropy", the foundation of Information Theory, which, as a broad definition, quantifies how these small biases in "common bytes" can be efficiently exploited.
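As a rough illustration of that idea, the sketch below computes the order-0 (single-byte frequency) Shannon entropy of some data in bits per byte. It only looks at how often each byte value occurs, not at patterns between bytes, and `byte_entropy` is a name chosen for this example.

```python
from collections import Counter
from math import log2

def byte_entropy(data: bytes) -> float:
    """Order-0 Shannon entropy in bits per byte (0.0 for empty input)."""
    n = len(data)
    if n == 0:
        return 0.0
    counts = Counter(data)
    return -sum(c / n * log2(c / n) for c in counts.values())

flat = bytes(range(256))                  # every byte value equally common
biased = b"\x00" * 900 + bytes(range(1, 101))   # 90% of bytes are 00

print(byte_entropy(flat))                 # no bias to exploit
print(byte_entropy(biased))               # strong bias, well below 8 bits per byte
```

A value of 8 bits per byte means byte frequencies alone offer nothing to exploit; the further below 8 it falls, the more a frequency-based coder such as Huffman coding can save.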
