
Thread: zpaq updates

  1. #1
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts

    zpaq updates

    http://mattmahoney.net/dc/#zpaq

    zpaq version 1.03 updates:
    - Uses mid.cfg as a default configuration file.
    - Does not store path names by default (just the file name). Use "r" command to override. 1.02 and earlier stored full paths by default.
    - Will not extract to absolute paths or paths containing "../" or "..\" unless you specify the destination during extraction. (Safety feature suggested by Yuri Grille).
    - No longer tries to recover from file open errors when compressing or extracting more than one file.
    - Won't trash an archive if you try to compress a nonexistent file.
    - Fixed the "s" command to dump the whole header, which is useful mainly if you are writing a compressor.
    - Supports splitting a file into separately compressed segments ("k" command). Added support for this to unzpaq 1.03. It was described as a recommendation in the reference but never implemented. (I am planning a use for this.)

    There is no change in the ZPAQ spec, compression, or configuration file format, so there is no need to update any benchmarks. However, I did manage to equal paq8px on the generic benchmark with a simple config file. http://mattmahoney.net/dc/uiq/

  2. Thanks (6):

    239 (12th April 2017),Alexander (6th June 2017),carlosnewmusic (3rd January 2019),h1127910 (26th September 2016),Simorq (28th March 2017),vladv (13th January 2021)

  3. #2
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    279
    Thanks
    33
    Thanked 138 Times in 50 Posts
    Can't find the compiled version. zpaq.exe is missing from http://mattmahoney.net/dc/zpaq103.zip

  4. #3
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts

  5. #4
    Member
    Join Date
    Aug 2009
    Location
    Bari
    Posts
    74
    Thanks
    1
    Thanked 1 Time in 1 Post
    Wow, I had never tested zpaq before! Today I tried it. Fantastic compression! It's superior to 7z, and it takes only 278 MB of RAM! Good. Why not increase the amount of RAM it uses?

  6. #5
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts


    Thanks Matt!

  7. #6
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    zpaq is configurable. You describe the compression algorithm in a config file which is stored in the archive. You can make all the usual tradeoffs between speed, memory, and size, and use a custom algorithm for each file. I'm still playing around with it. I haven't quite beaten paq8k2 on the generic test yet.
    http://mattmahoney.net/dc/uiq/

    The real purpose is to have a standard format that won't break every time you find a better algorithm. It is based on a PAQ-like architecture where you can arrange the components how you like and specify arbitrary contexts and preprocessing steps in a hard-to-program language called ZPAQL. Your new algorithm will still decompress with older versions of the decompressor.

    Of course a good compressor should do this automatically. It examines the file and picks the best algorithm for it. But I'm not quite there yet.

  8. #7
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    Well, I certainly hope zpaq will be around. I wrote a specification and an open source compressor with that idea in mind. I suspect that in 10 years there will be various improvements (level 2, 3, etc.) but the spec requires them to be backward compatible. Already I can think of some improvements I would have made:

    - I discovered that a few of the 255 states used in the ICM and ISSE are not reachable.
    - The MIX could be a bit faster by bounding the weights every byte instead of every bit.

    But the improvements would probably be very minor and not worth changing. The biggest limitation is probably that it is designed to be efficient on 32 bit machines. In 10 years, everything will be 64 bits. No component can use more than 1 GB memory. Well, the spec would allow it, but because contexts are computed as 32 bit values, the extra memory would not be very useful.
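
    To make the arithmetic concrete (an illustrative sketch, not code from zpaq): a component's table is indexed by a 32-bit context hash, so once a table reaches 2^32 slots, additional memory adds no new addressable contexts.

    ```python
    # Illustrative only: why 32-bit context hashes cap useful table size.
    # A component picks a slot from the low bits of its context hash.
    def slot(context_hash: int, table_bits: int) -> int:
        # context_hash is a 32-bit value; mask selects a slot in a 2**table_bits table
        return context_hash & ((1 << table_bits) - 1)

    def reachable_slots(table_bits: int) -> int:
        # A 32-bit hash can address at most 2**32 distinct slots,
        # no matter how large the table is.
        return min(1 << table_bits, 1 << 32)
    ```

    So a 1 GB table is fully addressable, while a 16 GB table would leave three quarters of its slots permanently unused.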

    It takes a long time for a standard to become well accepted. ZIP (deflate) is well accepted even though it was designed for 16 bit machines and the compression is pretty poor. The support comes from an RFC, a spec that is not too complex, an open source library (zlib) that allows free commercial use, and support by many independent developers including free and open source compressors.

    ZPAQ has a more complex format, but it needs to be to get PAQ-like compression. However, I think it has a much better chance than various PAQ based programs where each new version breaks compatibility, or the many closed source programs with undocumented formats that could go away whenever the author gets bored with it.

  9. #8
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Well, I certainly hope zpaq will be around. I wrote a specification and an open source compressor with that idea in mind. I suspect that in 10 years there will be various improvements (level 2, 3, etc.) but the spec requires them to be backward compatible. Already I can think of some improvements I would have made:

    - I discovered that a few of the 255 states used in the ICM and ISSE are not reachable.
    - The MIX could be a bit faster by bounding the weights every byte instead of every bit.

    But the improvements would probably be very minor and not worth changing. The biggest limitation is probably that it is designed to be efficient on 32 bit machines. In 10 years, everything will be 64 bits. No component can use more than 1 GB memory. Well, the spec would allow it, but because contexts are computed as 32 bit values, the extra memory would not be very useful.

    It takes a long time for a standard to become well accepted. ZIP (deflate) is well accepted even though it was designed for 16 bit machines and the compression is pretty poor. The support comes from an RFC, a spec that is not too complex, an open source library (zlib) that allows free commercial use, and support by many independent developers including free and open source compressors.

    ZPAQ has a more complex format, but it needs to be to get PAQ-like compression. However, I think it has a much better chance than various PAQ based programs where each new version breaks compatibility, or the many closed source programs with undocumented formats that could go away whenever the author gets bored with it.
    Honestly, I don't think that zpaq will ever become popular; it has too little to offer.
    But anyway I think that forward compatibility is a real kicker. I certainly hope that the next major archive format will do it in a similar way.

  10. #9
    Member
    Join Date
    Dec 2015
    Location
    London UK
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by m^2 View Post
    Honestly, I don't think that zpaq will ever become popular; it has too little to offer.
    But anyway I think that forward compatibility is a real kicker. I certainly hope that the next major archive format will do it in a similar way.
    I hope that it will become popular, because I don't agree that it has little to offer. I think it offers outstanding compression compared to 7-zip and most others. Also the new features are interesting; some remind me of dar which is another archiver I like.

    One thing I'd like to ask of zpaq is to avoid changing the command-line syntax and the list output format with every other release. I have compared the syntax and output format across many versions and it's not clear to me why they change so much. Another thing I'd like to see is for files compressed at maximum compression to decompress faster and use less memory, though I understand why that may not be possible.

    Another thing I'd like to see is for the zpaq format to be supported by the 7-zip explorer just as it supports zip and xz. I know this isn't something I need to ask zpaq (and I bet zpaq would not oppose this integration), rather 7-zip. But I think since plenty of people use 7-zip already, being able to use zpaq through 7-zip would help popularising it. I know some features may not be possible with the current 7-zip explorer but those could be disabled. And I know Peazip supports zpaq already but I don't think that many people use Peazip (by the way I knew zpaq from Peazip and I do use it).

  11. #10
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    377
    Thanks
    26
    Thanked 23 Times in 16 Posts
    Quote Originally Posted by jchevali View Post
    I hope that it will become popular, because I don't agree that it has little to offer. I think it offers outstanding compression compared to 7-zip and most others. Also the new features are interesting; some remind me of dar which is another archiver I like.

    One thing I'd like to ask of zpaq is to avoid changing the command-line syntax and the list output format with every other release. I have compared the syntax and output format across many versions and it's not clear to me why they change so much. Another thing I'd like to see is for files compressed at maximum compression to decompress faster and use less memory, though I understand why that may not be possible.

    Another thing I'd like to see is for the zpaq format to be supported by the 7-zip explorer just as it supports zip and xz. I know this isn't something I need to ask zpaq (and I bet zpaq would not oppose this integration), rather 7-zip. But I think since plenty of people use 7-zip already, being able to use zpaq through 7-zip would help popularising it. I know some features may not be possible with the current 7-zip explorer but those could be disabled. And I know Peazip supports zpaq already but I don't think that many people use Peazip (by the way I knew zpaq from Peazip and I do use it).
    I also think zpaq will not become popular, but not because it has too little to offer; what it offers, it does really well. The reasons are more along the lines of bad or no marketing, irregular updates, etc.

  12. #11
    Member
    Join Date
    May 2008
    Location
    Earth
    Posts
    115
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Well, I certainly hope zpaq will be around. I wrote a specification and an open source compressor with that idea in mind. I suspect that in 10 years there will be various improvements (level 2, 3, etc.) but the spec requires them to be backward compatible. Already I can think of some improvements I would have made:

    - I discovered that a few of the 255 states used in the ICM and ISSE are not reachable.
    - The MIX could be a bit faster by bounding the weights every byte instead of every bit.

    But the improvements would probably be very minor and not worth changing. The biggest limitation is probably that it is designed to be efficient on 32 bit machines. In 10 years, everything will be 64 bits. No component can use more than 1 GB memory. Well, the spec would allow it, but because contexts are computed as 32 bit values, the extra memory would not be very useful.

    It takes a long time for a standard to become well accepted. ZIP (deflate) is well accepted even though it was designed for 16 bit machines and the compression is pretty poor. The support comes from an RFC, a spec that is not too complex, an open source library (zlib) that allows free commercial use, and support by many independent developers including free and open source compressors.

    ZPAQ has a more complex format, but it needs to be to get PAQ-like compression. However, I think it has a much better chance than various PAQ based programs where each new version breaks compatibility, or the many closed source programs with undocumented formats that could go away whenever the author gets bored with it.
    There were discussions on this forum about how FP math depends on the compiler. Maybe it would be better to use some software FP library (in the reference decoder).
    Also, ZPAQL is simply awful. Registers "behave" differently, and I can't understand how to implement any sort of subroutine.

  13. #12
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    412
    Thanks
    38
    Thanked 64 Times in 38 Posts
    @Matt Mahoney

    would it be possible with zpaq to implement a compression method like ST5 from the new bsc 2.1.5, or ppmd from 7-zip?

    for me ST5 (bsc) has very good results for text files (it does not, of course, compress as strongly as paq8o8, but it works fast ...)
    best regards

  14. #13
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts
    That's an interesting question; zpaq seems to define the parameters of a CM with statistical predictors, rather than a more general VM language in which, say, LZ or other schemes could be described?

  15. #14
    Member
    Join Date
    May 2008
    Location
    Earth
    Posts
    115
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by willvarfar View Post
    That's an interesting question; zpaq seems to define the parameters of a CM with statistical predictors, rather than a more general VM language in which, say, LZ or other schemes could be described?
    zpaq defines a postprocessor to use after decoding; but ZPAQL, the VM language, is full of arbitrary rules and lacks even an indirect jump instruction.

  16. #15
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    @isname, ZPAQ format does not depend on any floating point math. The specification defines the exact format. The squash() and stretch() functions are defined mathematically and implemented using log() and exp() functions to generate lookup tables. However I include a checksum in the code to make sure there are no roundoff differences, and I checked that there are no values near a roundoff boundary (except squash(0) that I specified explicitly). The spec defines what the checksum should be so you can verify your implementation.
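
    The table construction Matt describes can be sketched as follows (a sketch only: I am assuming the logistic form p = 32768/(1 + e^(-x/64)) over x in -2047..2047 from my reading of the spec; the authoritative constants and the required checksum are in the specification itself):

    ```python
    import math

    # Sketch: precompute a squash lookup table from the mathematical definition,
    # squash(x) = 32768 / (1 + e^(-x/64)), for stretched inputs -2047..2047.
    # (Assumed form; the spec's checksum is what actually verifies an implementation.)
    SQUASH = [int(32768.0 / (1.0 + math.exp(-x / 64.0))) for x in range(-2047, 2048)]

    def squash(x: int) -> int:
        # Clamp the stretched input to the table range, then look it up.
        x = max(-2047, min(2047, x))
        return SQUASH[x + 2047]
    ```

    With this form squash(0) comes out to exactly 16384 (the value Matt says is pinned explicitly), and since the table is monotone, stretch() can be built as its inverse by binary search.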

    The ZPAQL instruction set is designed for high speed and small size, not for ease of programming. It is reasonably fast when interpreted, and can also be translated into C/C++ without too much difficulty. However, an indirect jump instruction does not translate easily. I thought about adding call and return instructions, but they have similar difficulties. ZPAQL programs are supposed to be small. Ultimately I want to add a JIT compiler so you don't need to install a separate C++ compiler to get the best performance.

    @joerg, zpaq has a BWT configuration written as a postprocessor in ZPAQL by Jan Ondrus. (bwt_j3 at http://mattmahoney.net/dc/#zpaq ). There is also bwt_slowmodel which uses only 1.25n block size like BBB. It would be possible to write a ST5 postprocessor as well. I probably won't because it is patented in the U.S. You could also write LZ77, LZW, or LZP as a postprocessor. (min.cfg uses LZP).

    @m^2, what zpaq offers is high compression ratio. See http://mattmahoney.net/dc/text.html and http://www.maximumcompression.com/data/summary_sf.php and compare with zip. I know zip is faster. Whether zpaq ever becomes a standard I don't know. It will take many years in any case, just like it did for deflate and ppmd. I do know that closed source and undocumented formats have no chance at all, and even open source projects like PAQ don't have much chance if every new version is not compatible with the one before it.
    Last edited by Matt Mahoney; 14th June 2010 at 23:31.

  17. #16
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Matt Mahoney View Post
    @m^2, what zpaq offers is high compression ratio. See http://mattmahoney.net/dc/text.html and http://www.maximumcompression.com/data/summary_sf.php and compare with zip. I know zip is faster. Whether zpaq ever becomes a standard I don't know. It will take many years in any case, just like it did for deflate and ppmd. I do know that closed source and undocumented formats have no chance at all, and even open source projects like PAQ don't have much chance if every new version is not compatible with the one before it.
    I don't think it's right to compare with zip; while it is the standard, technically it's very outdated.
    Strength is not unique to ZPAQ. And even for its strength, ZPAQ is slow. Your own LTCB and Nanozip make a nice (though probably worse than average) example. JIT will certainly help and I can't wait to see how much... though if I were to implement it, it wouldn't be JIT, but a regular compiler, for the sake of simplicity.
    And the ZPAQ file format has no features at all. It's actually even worse than zip in this regard, because as far as I can see, there's no support for encryption.

    For the lack of features, I actually don't wish ZPAQ well. Though I'd love to see the technology somewhere.

  18. #17
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    > Strength is not unique to ZPAQ. And even for its strength, ZPAQ is slow.

    See also results on calgary.tar at http://mattmahoney.net/dc/dce.html#Section_214
    zpaq is on the Pareto frontier, but zip/gzip is not.

    Also, although deflate is a standard used by both zip and gzip, they are not compatible with each other because one is for archives and the other is for files. I guess you can say that zip is a de facto standard for archives as long as you don't add too many features like ppmd, jpeg compression, encryption, etc.
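
    That zip and gzip share the same deflate bitstream and differ only in container framing can be seen with Python's standard library (illustrative; the byte offsets assume gzip's minimal 10-byte header and 8-byte CRC/size trailer, which is what Python's gzip.compress emits):

    ```python
    import gzip
    import zlib

    data = b"the same deflate stream, two different wrappers" * 10

    # gzip container: 10-byte header + raw deflate stream + 8-byte trailer.
    gz = gzip.compress(data)

    # The raw deflate payload inside the gzip member decodes with zlib alone;
    # wbits=-15 selects a bare deflate stream with no container at all.
    raw = gz[10:-8]
    assert zlib.decompress(raw, -15) == data
    ```

    The compressed bits are interchangeable; only the surrounding metadata (file names and directory in zip, single-stream header in gzip) differs.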

    Current implementations of zpaq do not have as many features as zip but that is not a problem with the specification. Currently I don't have a version that preserves timestamps or permissions, but if you need that you can use tar and then compress it. I did put a recommendation for preserving timestamps in the comment field which I might implement later.

    I don't think that encryption is an important feature for an archiver. If you really want to protect your data, you should use whole disk encryption like truecrypt. When you encrypt a file and delete the original, there is still a copy in your deleted sectors and maybe your swap file, and who knows where else. IMHO the only real use for archive encryption is for sending viruses.

    But my complaint is that there are thousands of incompatible formats. Look at the list of compressors at http://mattmahoney.net/dc/text.html and tell me which you think will be around in 10 years when you need to decompress some old files.

  19. #18
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Matt Mahoney View Post
    > Strength is not unique to ZPAQ. And even for its strength, ZPAQ is slow.

    See also results on calgary.tar at http://mattmahoney.net/dc/dce.html#Section_214
    zpaq is on the Pareto frontier, but zip/gzip is not.

    Also, although deflate is a standard used by both zip and gzip, they are not compatible with each other because one is for archives and the other is for files. I guess you can say that zip is a de facto standard for archives as long as you don't add too many features like ppmd, jpeg compression, encryption, etc.

    Current implementations of zpaq do not have as many features as zip but that is not a problem with the specification. Currently I don't have a version that preserves timestamps or permissions, but if you need that you can use tar and then compress it. I did put a recommendation for preserving timestamps in the comment field which I might implement later.

    I don't think that encryption is an important feature for an archiver. If you really want to protect your data, you should use whole disk encryption like truecrypt. When you encrypt a file and delete the original, there is still a copy in your deleted sectors and maybe your swap file, and who knows where else. IMHO the only real use for archive encryption is for sending viruses.

    But my complaint is that there are thousands of incompatible formats. Look at the list of compressors at http://mattmahoney.net/dc/text.html and tell me which you think will be around in 10 years when you need to decompress some old files.
    Please, don't compare strength to zip; everybody here knows that deflate is a dinosaur.
    Compare zpaq to FreeArc.
    IIRC zpaq is stronger, but not by much. It's much slower, has no GUI and practically no features. Possibly it offers more forward compatibility (*).
    Compare it to 7zip.
    IMO that's the most likely next king of the hill, because it was the first open format significantly stronger than zip and already has a considerable user base.

    When it comes to features, I actually thought zpaq was on par with zip, which I considered to be the bottom; then I noticed that zip has encryption and there's no way to add it to zpaq without breaking compatibility.
    I don't have an opinion on how important it is, but I can tell you one other use for it:
    I know a guy who hides pictures of his (ex? I don't know the details) lovers to protect them from his wife.

    You say you may add more features... You're welcome to do so.

    Ad (*): I have a question: is it possible to extend ZPAQL to add more features (I'm thinking about multithreading or stream computing) without breaking compatibility? Or maybe it's already there? I didn't take a close look.
    If not, then IMO it's not really future-proof anyway.

  20. #19
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    252
    Thanks
    49
    Thanked 107 Times in 54 Posts
    Quote Originally Posted by Matt Mahoney View Post
    See also results on calgary.tar at http://mattmahoney.net/dc/dce.html#Section_214
    zpaq is on the Pareto frontier, but zip/gzip is not.
    Code:
    calgary.tar   CT    DT    Program         Options    Algorithm Year Author
    ----------- ----- -----   -------         -------    --------- ---- ------
      595,533   411   401     paq8l           -6         CM        2007 Matt Mahoney
      598,983   462   463     paq8px_v67      -6         CM        2009 Jan Ondrus
      605,656   182   197     paq8f           -6         CM        2005 Matt Mahoney
      610,871   549   528     paqar 4.1       -6         CM        2006 Alexander Rhatushnyak
      644,190   19.9  19.9    zpaq 1.10       ocmax.cfg  CM        2009 Matt Mahoney
      647,110   140   138     paq6            -8         CM        2003 Matt Mahoney
      647,440   7.6   7.3     nanozip 0.07a   -cc        CM        2009 Sami Runsas
      ........
      699,191   7.6   7.4     zpaq 1.10       ocmid.cfg  CM        2009 Matt Mahoney
    zpaq with 'ocmax.cfg' is probably on the Pareto frontier (who knows how it would look if memory usage and the sizes of the decompression programs were considered), but the fact that paq8px_v67 is worse than paq8l shows that the engineers who were improving paq8l chose to ignore the worsening on calgary.tar
    Last edited by Alexander Rhatushnyak; 16th June 2010 at 03:54.

  21. #20
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    279
    Thanks
    33
    Thanked 138 Times in 50 Posts
    Quote Originally Posted by Alexander Rhatushnyak View Post
    paq8px_v67 is worse than paq8l shows that engineers who were improving paq8l chose to ignore the worsening on calgary.tar
    The reason for this is the removal of the model for the "pic" file from the Calgary corpus. Older versions of paq had a specialized model for it.

    paq8l -7 513216 -> 22541
    paq8px_v68 -7 513216 -> 29960

    - FAX. For 2-level bitmapped images. Contexts are the surrounding
    pixels already seen. Image width is assumed to be 1728 bits (as
    in calgary/pic).
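
    The idea behind such a model can be sketched as follows (hedged: the neighbor template here is illustrative, not paq8l's actual one; only the assumed 1728-bit width comes from the description above). For a 1-bit-per-pixel image of known width, the context for each pixel is formed from already-seen pixels above and to the left:

    ```python
    WIDTH = 1728  # bit width assumed for calgary/pic, per the model description

    def pixel_context(bits: list, i: int) -> int:
        # Gather a few causal neighbors: left, above-left, above, above-right.
        # Neighbors outside the already-seen prefix are treated as 0.
        # The template choice here is illustrative, not paq8l's exact one.
        def at(j: int) -> int:
            return bits[j] if 0 <= j < i else 0
        neighbors = [at(i - 1), at(i - WIDTH - 1), at(i - WIDTH), at(i - WIDTH + 1)]
        ctx = 0
        for b in neighbors:
            ctx = (ctx << 1) | b
        return ctx
    ```

    The fixed width is what made the model specific to calgary/pic: with the wrong width, the "above" neighbors point at unrelated pixels and the context becomes noise.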

  22. #21
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    412
    Thanks
    38
    Thanked 64 Times in 38 Posts
    @Matt Mahoney: how can such a thing be patented in the U.S. if it is open source (GPL), as is written in the source of bsc?

    best regards

  23. #22
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    Quote Originally Posted by joerg View Post
    @Matt Mahoney: how can such a thing be patented in the U.S. if it is open source (GPL), as is written in the source of bsc?

    best regards
    http://www.compressconsult.com/st/

  24. #23
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    412
    Thanks
    38
    Thanked 64 Times in 38 Posts
    @Matt Mahoney: thank you for your quick answer and the link to http://www.compressconsult.com/st/

    first look: "This algorithm is protected by US patent 6,199,064, others pending."

    the link: http://164.195.100.11/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1=%276,199,064%27.WKU.&OS=PN/6,199,064&RS=PN/6,199,064

    is not accessible for me

    it seems to me that on the website the transformation is described only up to order 4

    bsc introduces an order-5 Schindler transform, which seems to be a big step towards better compression ...

    best regards

  25. #24
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    http://www.google.com/patents?vid=USPAT6199064

    Edit: after reading the patent, it seems from the claims that only the sort transform is patented, but not the inverse transform. The patent does describe a few variations of the inverse transform but makes no claims to any of them. There are 24 claims, all on the forward transform.

    The claims don't specify any order, so all order transforms would be covered. In fact there is an interesting variation described for compressing images where the pixels (or prediction errors) are sorted by context of neighboring pixels in a hierarchical fashion in order of increasing resolution.

    I am not a lawyer, but it seems to me if you published and used a ZPAQ preprocessor that used a sort transform in a country where it is not patented or that doesn't recognize software patents (I think all of the EU), then it should be legal to decode archives in any country. The ZPAQ archive would contain ZPAQL code for the inverse transform, but there is no patent on that.
    Last edited by Matt Mahoney; 15th June 2010 at 19:42.

  26. #25
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    FreeArc is a GUI wrapper around several command line compressors. Maybe if ZPAQ becomes a widely used format, Bulat might add it. Likewise for 7zip, Winzip, etc. I prefer to work on the engine. There is no reason that an archive in ZPAQ format could not support all the features of these other programs. If you want encryption, you create an archive and encrypt the whole thing with a password. You could even mix encrypted and unencrypted blocks if you use locator tags so that a standard decompressor could find the unencrypted blocks.

    It is possible with ZPAQ to compress and decompress blocks independently, so each block could be run in a separate thread. I might do this in a future version. Decompression is fairly straightforward. For compression, you would have to compress the blocks independently to temporary files or memory and then concatenate them when all done. This has the same tradeoffs as other algorithms. Dividing the input into independent blocks makes compression worse. You need to divide up memory among threads.
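
    The scheme described above can be sketched generically (zlib stands in for the real codec here; this is not zpaq code): compress blocks independently in a thread pool, keep the compressed pieces in input order, and concatenate at the end.

    ```python
    import zlib
    from concurrent.futures import ThreadPoolExecutor

    def compress_blocks(data: bytes, block_size: int = 1 << 16) -> list:
        # Split the input into independent blocks and compress each in its
        # own thread; pool.map returns results in input order, ready to join.
        blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
        with ThreadPoolExecutor() as pool:
            return list(pool.map(zlib.compress, blocks))

    def decompress_blocks(pieces: list) -> bytes:
        # Each block decodes independently, so this loop could be threaded too.
        return b"".join(zlib.decompress(p) for p in pieces)
    ```

    The tradeoff Matt mentions shows up directly: each block restarts with empty statistics, so smaller blocks mean more parallelism but worse compression.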

    > I know a guy who hides pictures of his (ex? don't know much details) lovers to protect them from his wife.

    He would be better off putting a password on his computer. But I think he will still be caught eventually.

  27. #26
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    Most of the improvements from paq8l to paq8px_v67 are on special file types like bmp, jpeg, tiff, and exe. These are not in the Calgary corpus.

    None of the programs listed in my calgary.tar benchmark have dictionaries, so program size is not a big factor. You can see some results for these at http://mattmahoney.net/dc/paq.html which are marked with D (for example, pasqdacc 4.3c -7 = 567688 bytes using a dictionary tuned to the Calgary corpus). Also, the corpus is small enough that memory usage does not make much difference above a few hundred MB. It makes a much bigger difference when available memory is comparable to the data size, like with enwik9 in LTCB.

    In any case, unzpaq.exe is 15,360 bytes, which is smaller than most of the other programs.

    The latest versions of paq do much better than earlier ones in the maximum compression benchmark. http://www.maximumcompression.com/data/summary_sf.php

    However, zpaq was only tested with max.cfg, and not with the specialized models written by Jan Ondrus which could do much better. For example:

    zpaq ocbmp_j4.cfg maxcomp\rafale.bmp -> 522029 (beating paq8px) in 34 sec.
    zpaq ocexe_j1.cfg maxcomp\acrord.exe -> 1084657 in 29 sec.
    zpaq ocexe_j1.cfg maxcomp\mso97.dll -> 1464603 in 27 sec.
    zpaq ocjpg_test1.cfg maxcomp\a10.jpg -> 716043 in 25 sec.

    Adding the 6 other published results gives a total size of 9892940, which would put it in 6th place (just ahead of nanozip 0.07) instead of the current 11th place. If I were not so lazy I might write specialized models for the other files. For example, ohs.doc could be split into 3 blocks so that the big embedded JPEG image could be compressed with the JPEG model. Also, with a lot of work, it would be possible to write something equivalent to precomp in ZPAQL. Note that

    precomp flashmx.pdf | zpaq ocmax.cfg -> 2186726

    which is far smaller than the best result on this file. This would also make the total size 8455038 which would give it the number 1 spot.
    Last edited by Matt Mahoney; 16th June 2010 at 06:15.

  28. #27
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    Not yet, but I am working on a text preprocessor where decompression will probably be faster than compression. "zp c1" or "zpaq ocfast.cfg" is about 1.2 MB/sec (symmetric) on my 2 GHz T3200 laptop. It uses only 2 models, an order 2 ICM and order 4 ISSE with 38 MB memory.

    calgary.tar -> 806958 (2.6 sec)
    enwik8 -> 24837469 (82 sec)

    Code:
    comp 1 2 0 0 2 (hh hm ph pm n)
      0 icm 16    (order 2)
      1 isse 19 0 (order 4)
    hcomp
      *b=a a=0 (save in rotating buffer M)
      d=0 hash b-- hash *d=a
      d++ b-- hash b-- hash *d=a
      halt
    post
      0
    end

  29. #28
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    I wrote a Wikipedia article on ZPAQ. Feel free to comment, discuss, edit, etc. http://en.wikipedia.org/wiki/ZPAQ

    Also, an article on context mixing, which used to be a stub: http://en.wikipedia.org/wiki/Context_mixing
    Last edited by Matt Mahoney; 9th September 2010 at 04:13.

  30. #29
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,133
    Thanks
    320
    Thanked 1,396 Times in 801 Posts
    Thanks.

    A random comment: the more common "cache aligned" bit mapping is not like you described,
    but a more compact one: cells 0..14 are allocated for the first 3 bits, and then cells (h+1)*15-1+l
    for the next bits (h is the high nibble value, l is the partial low nibble prefixed with 1).
    It's used in ccm, a few coders on ctxmodel, and afair older m1 versions.

    Also note that ccm uses a lookup table x=t[x][bit] to update the counter table index
    (though it's not certain that it's better than a single *15 rebase per byte),
    and toffer described an interesting optimization, where subtables corresponding to low
    nibbles are dynamically reordered (with bytewise counters there'd be 3 extra low-nibble
    tables in the same cache line as the high-nibble table).
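
    My reading of that mapping, as a sketch (the helper below is hypothetical; I am assuming h is the completed high-nibble value and l is the partial low nibble prefixed with a 1 bit): the 255 reachable bit contexts of one byte pack into cells 0..254.

    ```python
    def cell_index(bits_seen: list) -> int:
        # Cells 0..14 hold the high-nibble tree: the partial high nibble,
        # prefixed with a 1 bit, takes values 1..15, stored at value-1.
        n = len(bits_seen)  # 0..7 bits of the current byte already coded
        if n < 4:
            c = 1
            for b in bits_seen:
                c = (c << 1) | b
            return c - 1                      # 0..14
        # Once the high nibble h is known, the low-nibble tree for that h
        # occupies cells (h+1)*15 - 1 + l, with l the partial low nibble
        # prefixed with a 1 bit (values 1..15).
        h = 0
        for b in bits_seen[:4]:
            h = (h << 1) | b
        l = 1
        for b in bits_seen[4:]:
            l = (l << 1) | b
        return (h + 1) * 15 - 1 + l           # 15..254

    # Enumerate every partial-byte context (0..7 bits seen) and collect cells.
    ALL_CELLS = {cell_index([(v >> (n - 1 - k)) & 1 for k in range(n)])
                 for n in range(8) for v in range(1 << n)}
    ```

    Under this reading the mapping is a bijection: 15 cells for the high-nibble tree plus 16*15 for the low-nibble trees gives exactly 255 distinct cells, so each low-nibble subtable (15 one-byte counters) sits contiguously for cache alignment.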

  31. #30
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    That's an interesting idea. Packing 60 bit histories into a cache line for the most common high nibbles would save a cache miss most of the time but it would cost extra cache misses during hash table lookup misses.


