
Thread: Questions about compression

  1. #1
    Member
    Join Date
    Nov 2011
    Location
    France
    Posts
    22
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Questions about compression

    Hello everyone,
    I have lots of questions since I began to interest to the compression scene. And I really hope you could answer a few of them.
    • Is arc -m0 better than tar, has it any advantages over tar (it's a question because of one comment from freearc author, that I understood this way)? If not, what is the best store format (with future compression purpose, and important informations saving)?
    • One of the weird results I have seen is that tar > arc ultra is better than directly arc ultra, weird isn't it?
    • Regarding some results, is freearc ultra better than 7z LZMA2 or PPMD, on text files? Despite the inverse results on some benchmarks?
    • What is the best algorithm/compressor to compress games? Compressing bin or iso makes any difference?
    • On what kind of file is lprepaq/precomp useful/efficient?
    • If I'd do a ram upgrade, how much ram would you advise me to get a real benefit on compression, and then which compressor to make use of all my ram?
    • Why do most compression benchmarks use paq -7, instead of paq -8? Because of a 1GB ram limitation?
    • What's the differences between the different paq8px binaries (*.exe , *_alt.exe, *_sse2.exe, *_sse2_alt.exe) ?
    • I have seen that paq8px is the best on lots of different type of files, and I have seen no benchmarks with fp8. Does fp8 remains among the best?
    • What is order-n content mixing? (I'm not bad at algebra, so don't hesitate to use technical words if necesssary)
    • I am more and more interested about freearc over 7z, what are the main advantages, is it any better? any advises? What is freearc algorithm?


Thank you very much for your answers, and excuse my bad English and my many questions.

  2. #2
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
Is arc -m0 better than tar? Does it have any advantages over tar (I ask because of a comment from the FreeArc author that I understood this way)? If not, what is the best storage format (for later compression, and for preserving important information)?
It sorts files, so if you're about to use some compressor on it, you'll most likely get a better compression ratio.
Also, it adds checksums, allows fast listing of contents, and maybe more that I'm missing.
The drawbacks I see are that there is only one implementation to choose from and that it's less reliable at this point.
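The effect of that sorting can be illustrated with a small Python sketch. This is not FreeArc's actual heuristic; the extension-based grouping rule and the sample data are made up for illustration:

```python
import lzma

def sort_for_solid_compression(files):
    """Order (name, data) pairs so that similar files sit together,
    a crude stand-in for the extension-based sorting archivers use."""
    return sorted(files, key=lambda f: (f[0].rsplit(".", 1)[-1], f[0]))

files = [
    ("a.txt", b"the quick brown fox " * 50),
    ("b.bin", bytes(range(256)) * 4),
    ("c.txt", b"the quick brown fox " * 50),
    ("d.bin", bytes(range(256)) * 4),
]
grouped = b"".join(data for _, data in sort_for_solid_compression(files))
interleaved = b"".join(data for _, data in files)
# With similar files adjacent, a solid compressor tends to do as well or better.
print(len(lzma.compress(grouped)), len(lzma.compress(interleaved)))
```

On a toy sample like this the difference is small, since LZMA's dictionary spans the whole input anyway; the gain grows with many files and weaker codecs.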
One of the weird results I have seen is that tar > arc ultra compresses better than arc ultra directly. Weird, isn't it?
If the difference was tiny, not really.
Judging by some results, is FreeArc ultra better than 7z LZMA2 or PPMd on text files, despite the opposite results in some benchmarks?
All we know is benchmarks, either private or public. From what I've seen, it's better.
What is the best algorithm/compressor for compressing games? Does compressing a bin or an iso make any difference?
Depends on your requirements. Precomp+paq8 or Precomp+srep+paq8 should be about the strongest option.
On what kinds of files is lprepaq/precomp useful/efficient?
RTFM
If I did a RAM upgrade, how much RAM would you advise to get a real benefit for compression, and which compressor would make use of all that RAM?
If you can afford it, a terabyte times the number of cores would be great. Not many compression algorithms can benefit from such numbers, but if you were to design custom ones, that amount would rock on enwik10.
LZMA and LZMA2 can benefit from, IIRC, 46 GB per thread.
What is order-n context mixing? (I'm not bad at algebra, so don't hesitate to use technical terms if necessary.)
I think "Data Compression Explained" by Matt Mahoney would answer this question. Even if not, you won't waste your time by reading it all, or at least skimming through it.
I am more and more interested in FreeArc over 7z. What are the main advantages? Is it any better? Any advice? What is the FreeArc algorithm?
No, it's not an algorithm, it's a program.
FreeArc offers a better strength/time ratio on average and some more features. It's less portable and less reliable, and has a more restrictive license.

  3. #3
    Member
    Join Date
    Nov 2011
    Location
    France
    Posts
    22
    Thanks
    0
    Thanked 0 Times in 0 Posts
First, let me thank you for your great answers, and apologize for the lame question that deserved an RTFM.
But your answers raise new questions.

What do you mean by only one implementation, talking about FreeArc?

    About the arc vs tar.arc (file names speak for themselves):

    176 071 925 textfiles.m0.arc
    184 443 392 textfiles.tar
    30 034 601 textfiles.ultra.arc
    28 245 776 textfiles.ultra.m0.arc.arc
    28 424 154 textfiles.ultra.tar.arc

I think the difference between 28 and 30 MB is not tiny, especially if you have a lot more files.

    If you can afford it, terabyte * number of cores would be great. Not many compression algorithms can benefit from such numbers, but if you were to design custom ones, this amount would rock on enwik10.
    LZMA and LZMA2 can benefit from IIRC 46 GB/thread.
Since I have 4 cores, that would mean at least 4 TB; how could I use that much? And would decompression need the same requirements?

    No, it's not algorithm, it's a program.
    FreeArc offers better strength/time ratio on average and some more features. It's less portable and less reliable. Has a more restrictive license.
Yeah, you're right; I've just seen that it uses different compression algorithms depending on the compression level you choose.

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
> I have lots of questions since I became interested in the compression scene.

    I suppose you're only interested in practical existing tools,
    without any tweaking or programming involved?

> Is arc -m0 better than tar? Does it have any advantages over tar?

File headers added by tar are relatively big - 512 bytes -
while normally they're much smaller, and it's possible to
keep the archive index separately, instead of in file headers.

So for tar vs freearc -m0 it's like this:
+ tar stores the files with fixed alignment, which can
improve lzma compression of executables and other binaries.
- tar headers still add more redundancy to the archive,
while modern archivers (7z/fa/etc) keep the archive structure
as a solid compressed block, which improves overall compression.
- freearc is able to do some "smart" file sorting (kinda like rar),
and placing similar/duplicate files together can improve compression.
+ it's harder to break a .tar archive, while .arc can become useless
with a single modified byte.
On the other hand, freearc has explicit error recovery support, and
.tar archives are usually distributed in compressed form anyway.

But usually I'd prefer rar -m0 or 7z -m0 over freearc: rar because it's
relatively small and has useful features, and 7z because it uses the
same solid index at the end of the archive, but more compact, and with
an option to disable index compression (it can be compressed better
by an external compressor).
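The 512-byte-per-file overhead is easy to see with Python's tarfile module. A quick sketch (exact totals also include tarfile's end-of-archive padding, which rounds the stream up to its record size):

```python
import io
import tarfile

def tar_size(n_files, file_size):
    """Size in bytes of an in-memory tar stream holding
    n_files members of file_size bytes each."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as t:
        for i in range(n_files):
            info = tarfile.TarInfo(name=f"file{i}.txt")
            info.size = file_size
            t.addfile(info, io.BytesIO(b"x" * file_size))
    return len(buf.getvalue())

# Each member costs a 512-byte header plus padding of the data to a
# 512-byte boundary, so 100 ten-byte files take far more than 1000 bytes.
print(tar_size(100, 10))
```

This is why storing many small files in tar inflates the archive, even though the headers themselves compress fairly well afterwards.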

> If not, what is the best storage format (for later compression,
> and for preserving important information)?

    I'd say .rar.
    Also you can check this - http://encode.su/threads/1397-tar-replacement

> One of the weird results I have seen is that tar > arc ultra
> compresses better than arc ultra directly. Weird, isn't it?

1. Misalignment of files. LZMA has a built-in alignment model,
so the alignment of a file's start address in the archive can affect
the compression results.
Normally having a fixed alignment for all files is best,
but it depends on the specific data.
2. Order of file placement in the archive. Freearc reorders the files,
while tar just follows the order in which the files were originally created.
But freearc's file sorting is a "blind" heuristic; it doesn't look
at actual file contents, so its effect can easily be negative.

> Judging by some results, is FreeArc ultra better than 7z LZMA2 or
> PPMd on text files, despite the opposite results in some benchmarks?

Freearc's lzma and ppmd are basically the same ones as in 7z, just used
with different settings.
Also, this "ultra" doesn't have any specific meaning - it's just the name of
some configuration profile.
So, afaik, it only makes sense to compare specific codec configurations.
Although if you did, there'd likely be no difference from 7z or standalone
codec builds.

> What is the best algorithm/compressor for compressing games?

Modern games frequently already store their resources in compressed archives,
so afaik it's more a matter of a ripper's ability to extract these.
Also there are no good (re)compression implementations for most image/audio/video
formats, so it's more a matter of luck.

> Does compressing a bin or an iso make any difference?

.iso is basically an archive format similar to .tar, with large headers and
file alignment, so the same points apply.
Thus, it's normally better to extract it, to be able to apply file reordering
and file-specific compression.

> On what kinds of files is lprepaq/precomp useful/efficient?

I don't see any sense in using lprepaq, as it's usually possible to get better
results with precomp and external compressors.

As to "kinds of files": what precomp does is expand the zlib (deflate) streams
found in the given files, to allow for better compression of the data with modern codecs.
It also finds embedded jpegs and compresses them with packjpg.
Thus, any files which might contain embedded jpegs/pngs/zlib-compressed resources
basically qualify.
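The core idea is easy to demonstrate with Python's zlib and lzma modules. This is only a toy stand-in for precomp (which additionally records everything needed to rebuild the original deflate stream bit-exactly); the data is synthetic, chosen so its redundancy lies beyond deflate's 32 KB window:

```python
import lzma
import random
import zlib

random.seed(0)
block = bytes(random.randrange(256) for _ in range(40_000))  # > 32 KB deflate window
original = block * 4                       # long-range redundancy deflate can't see
deflated = zlib.compress(original, 9)      # as it might sit embedded in some file

direct = len(lzma.compress(deflated))      # deflate output looks random to lzma
expanded = len(lzma.compress(original))    # lzma's large dictionary finds the repeats
print(expanded < direct)
```

Recompressing the deflate stream as-is gains almost nothing, while expanding it first lets the stronger codec exploit the redundancy - which is exactly the service precomp provides.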

> If I did a RAM upgrade, how much RAM would you advise to get a
> real benefit for compression, and which compressor would make use
> of all that RAM?

1. As you probably won't like having to use the same amount of RAM
for extraction, the list of applicable compressors is actually pretty limited,
and pretty much only includes lzma.
2. It's a fairly good idea to use a ramdisk for compression experiments
(which usually involve brute-force optimization of codec parameters),
so installing the maximum possible RAM seems good atm.

    > Why do most compression benchmarks use paq -7, instead of paq -8?
    > Because of a 1GB ram limitation?

Actually it doesn't matter that much: unlike the "dictionary size"
options in LZ, the "model size" parameter in paq doesn't directly affect
the volume of previously processed data which can be referenced, so it's
hard to find an example where -7 vs -8 would give a significant gain.
Also, paq is so slow that nobody cares anyway.

> What are the differences between the different paq8px binaries
> (*.exe, *_alt.exe, *_sse2.exe, *_sse2_alt.exe)?

As with compressors, there are also different compilers and different compiler options,
which affect the speed of the produced executables.
And paq8 is slow, so instead of choosing the best build themselves, the developers
frequently just post a few versions.

> What is order-n context mixing? (I'm not bad at algebra, so don't
> hesitate to use technical terms if necessary.)

Old statistical models (PPM, CM) only used the last N bytes of previously decoded data
to predict the next byte. But having a statistical model for N-byte string matches
usually involved having similar models for strings of length 1..N-1 as well,
as a fallback (because longer matches are less frequent) - thus order-N.

Currently "order-N" just roughly means that a model uses only the last N bytes
for prediction, but the same model can use some other information, or instead
only include specific prefix lengths (e.g. 1/2/4/8 instead of 0..N, for speed),
so different codecs with the same order-N tag can be completely different.
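A toy illustration of an order-N model with fallback to shorter contexts (the escape-to-lower-order idea behind PPM). This is a sketch of the idea only; real PPM/CM implementations use careful escape estimation, bit-level mixing, and arithmetic coding:

```python
from collections import Counter, defaultdict

class OrderNModel:
    """Toy order-N byte predictor that falls back to shorter contexts."""

    def __init__(self, n):
        self.n = n
        # context bytes -> Counter of next-byte frequencies
        self.counts = defaultdict(Counter)

    def update(self, data):
        # Record next-byte statistics for every context length 0..N.
        for i in range(len(data)):
            for k in range(self.n + 1):
                if i >= k:
                    self.counts[data[i - k:i]][data[i]] += 1

    def predict(self, history):
        # Try the longest context first; fall back if it was never seen.
        for k in range(self.n, -1, -1):
            ctx = history[-k:] if k else b""
            c = self.counts.get(ctx)
            if c:
                total = sum(c.values())
                return {b: v / total for b, v in c.items()}
        return {}

m = OrderNModel(2)
m.update(b"abracadabra")
print(m.predict(b"br"))  # after "br", the training text always continued with "a"
```

A real context-mixing codec would not pick one order, but combine the predictions of all orders (e.g. with a trained weighting), which is the "mixing" part.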

> I am more and more interested in FreeArc over 7z. What are the
> main advantages? Is it any better? Any advice?

.7z doesn't have error recovery.
Also, freearc includes some of Bulat's own codecs, and some lzma tweaks,
which might make it better than 7z.

    > What is freearc algorithm?

    There's no such thing, as freearc is the name of a composite program.

    As to unique algorithms used in freearc, you can see some there -
    http://freearc.org/Research.aspx

  5. #5
    Member
    Join Date
    Nov 2011
    Location
    France
    Posts
    22
    Thanks
    0
    Thanked 0 Times in 0 Posts
Thank you very much for your great answers.

    I suppose you're only interested in practical existing tools,
    without any tweaking or programming involved?
You're right. For the moment I'm only gathering information on this topic, through benchmarks and personal tests, always trying to understand why one tool is better than another (for example FreeArc: it took me a few days to understand that it uses different algorithms). PAQ has always fascinated me, especially because of enwik8. I would say my dream is to create a compressor that could achieve such compression, and maybe better (especially on lossy formats). But I need more skill in programming and in the subject; that's why I'm here.

    I'd say .rar.
    Also you can check this - http://encode.su/threads/1397-tar-replacement
Thank you for this valuable link; I hadn't had the chance to see that thread before.

Modern games frequently already store their resources in compressed archives,
so afaik it's more a matter of a ripper's ability to extract these.
Also there are no good (re)compression implementations for most image/audio/video
formats, so it's more a matter of luck.
I was sure game compression was no easy matter. That's quite interesting. I'm trying to use precomp these days, but it seems to crash a lot. A question about precomp: is it useful to use it on a tar? Or should I point it directly at the possibly-compressed file? And how could I figure out myself whether a file is already compressed?

    And paq8 is slow, so instead of choosing the best build themselves, the developers
    frequently just post a few versions.
I didn't find any information on the differences between these binaries, except that SSE2 is an extra set of machine instructions that can speed up compression on modern machines; but what is the alt version?

  6. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
> I was sure game compression was no easy matter. That's quite interesting.

    There's enough complexity to dedicate whole sites to game extractors, like
    http://asmodean.reverse.net/

> I'm trying to use precomp these days, but it seems to crash a lot.

Precomp uses 3rd-party libs to work with jpeg and deflate, so it's a little hard
for its author to fix bugs in those libs.
You can try testing different precomp versions, disabling jpeg/gif recompression,
excluding specific offsets where it crashes, etc.

> A question about precomp: is it useful to use it on a tar?

Precomp only processes single files (it's not an archiver), so if you want
to process multiple files with it, you need to somehow concatenate them first.
So yes, sometimes it makes sense to process tar files with precomp,
although tar is not the most efficient solution there.
But it's also possible to process the files first, then put the .pcfs into an archive.

> Or should I point it directly at the possibly-compressed file?

For precomp there's no difference between processing single files
and processing an archive of those files.
But it's also possible that the best order for compressing the files would
only be known after precomp.

> How could I figure out myself whether a file is already compressed?

By looking at it in a binary viewer/editor?

Also, you can try compressing it - if there's no noticeable gain,
then it's likely already compressed. Or encrypted.
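That "try compressing it" check is easy to automate. A hypothetical helper (the function name and the 0.98 threshold are arbitrary choices, not anything precomp or 7z actually uses):

```python
import zlib

def looks_compressed(data, threshold=0.98):
    """Heuristic: if deflate can barely shrink the data, it is probably
    already compressed (or encrypted). Returns True in that case."""
    if not data:
        return False
    ratio = len(zlib.compress(data, 6)) / len(data)
    return ratio > threshold

print(looks_compressed(b"hello world " * 1000))           # plain text shrinks a lot
print(looks_compressed(zlib.compress(b"hello " * 1000)))  # deflate output barely shrinks
```

Note the limitation mentioned above: this cannot distinguish compressed data from encrypted or random data, since all three look high-entropy to deflate.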

> I didn't find any information on the differences between these binaries,
> except that SSE2 is an extra set of machine instructions that can
> speed up compression on modern machines; but what is the alt version?

    It doesn't really matter. You can test their speed and compressed sizes,
    and decide for yourself.

    Also, if you're really curious, you can try finding the paq8px thread
    on this forum, where that build was originally posted, and look for
    the exe specifics there.

  7. #7
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Everything you always wanted to know about data compression. http://mattmahoney.net/dc/dce.html

  8. #8
    Member
    Join Date
    Nov 2011
    Location
    France
    Posts
    22
    Thanks
    0
    Thanked 0 Times in 0 Posts
Thank you, Matt, for your link; I started reading it a few days ago, but there is a lot of content. I'm really glad you have made this documentation available to everyone on the internet.

So yes, sometimes it makes sense to process tar files with precomp,
although tar is not the most efficient solution there.
But it's also possible to process the files first, then put the .pcfs into an archive.
So, based on your experience, what would be the most efficient solution? You mentioned rar in a previous post; is that the best solution?


    There's enough complexity to dedicate whole sites to game extractors, like
    http://asmodean.reverse.net/
Wow, thanks for this link, but I don't really understand where the documentation is. Moreover, Japanese is not helpful. If I have understood correctly, it's a site that collects tools that decompress or decrypt game files, in order to be able to recompress them? Just like precomp, but not only with deflate?

I'll search for the paq8px thread, thank you.

  9. #9
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
> So, based on your experience, what would be the most efficient solution?

    Likely 7z -mx0 or 7z -mx0 -mhc=off

> I don't really understand where the documentation is

Well, there's likely none, but it's kinda strange to expect any for stuff which is technically illegal anyway.

    > it's a site that collects tools that decompress or decrypt game files

No, it's a site where asmodean posts his reverse-engineered decoders for the archives of Japanese games.
You'd likely never see any of these games, but apparently there were some interesting solutions,
even BWT+ari somewhere.

    Anyway, I'm not into game ripping myself, so I don't know much about such sites, except maybe
    http://aluigi.altervista.org/quickbms.htm
    but I'm sure that google would help.

    > in order to be able to recompress them

    No, encoding for LZ is usually more complex than decoding, and lossless recompression is even harder than that,
    so finding all the necessary tools at once is pretty unlikely.

  10. #10
    Member
    Join Date
    Nov 2011
    Location
    France
    Posts
    22
    Thanks
    0
    Thanked 0 Times in 0 Posts
Well, thank you very much for your answers, it's really helpful. I might come back with new questions, but I will definitely bring feedback in other threads (e.g. HCBF; I'm currently working on compressing the 8 DVDs of Debian, 31 GB uncompressed).

  11. #11
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
Well, there's likely none, but it's kinda strange to expect any for stuff which is technically illegal anyway.
Is repacking games illegal? I don't know how it is in Ukraine, but I don't think it is in Poland. Not 100% sure about it, though.

  12. #12
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    I meant reverse-engineering.
Not that I care whether it's legal or not, but I do feel weird about people asking for documentation for decompiled sources.

  13. #13
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
Reverse engineering is legal here as long as you do it in the course of using a program legally; that much is certain.
ADDED:
Well, there are some more limitations. You can reverse freely by observing the way a program works, but decompile only for specific purposes.
    Last edited by m^2; 8th December 2011 at 02:51.

