
Thread: pcompress, a deduplication/compression utility

#121 - Matt Mahoney (Expert)
    The binaries in buildtmp are v3.0, not v3.1. I wondered why there wasn't any improvement.

#122 - avitar (Member)
Quote Originally Posted by Matt Mahoney:
The binaries in buildtmp are v3.0, not v3.1. I wondered why there wasn't any improvement.
Fixed yet? Many (most?) people here don't want to (or can't!) mess with compiling Linux sources - it would be good to keep the binaries and version numbers updated after every source code update. In the continued absence of the Windows version, that'd be the best way to get feedback on pcompress.

Also, re the AVX2/Haswell issue: can't it be compiled with the fastest option, and then, if AVX2 is not available, have the binary use alternate code?

J

#123 - moinakg (Member)
Fixed the download; it was my oversight. The binaries are now built on Ubuntu with AVX disabled, so they should run on non-Haswell processors. Also, the binaries are now version 3.1.

I have written explicit code in Pcompress where multiple hand-optimized (assembly or intrinsics) variants exist for different processor capabilities, and it detects the processor at runtime. This is called CPU dispatching. See, for example, the AES code, which checks for AES-NI or vector (VPAES) instruction capability:
    https://github.com/moinakg/pcompress...ypto_aes.c#L81

    This is the code which probes processor capabilities using the cpuid instruction at runtime:
    https://github.com/moinakg/pcompress/blob/master/utils/cpuid.c

However, the GCC compiler can auto-vectorize normal C loops and arithmetic computations. In that case it simply uses the maximum processor capability specified via optimization flags and generates machine code for that target. I think there is a GCC feature under development (or in bleeding-edge versions) that will generate multiple CPU-type variants along with checks to do automatic CPU dispatching, but the stable version in Ubuntu does not appear to have that capability.
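
For readers curious what runtime dispatching looks like, here is a minimal, self-contained C sketch of the idea (illustrative only - not the actual pcompress code; the function names here are made up). It probes CPUID leaf 1 for the AES-NI bit using GCC's <cpuid.h> helper and picks an implementation once at startup:

/* cpu_dispatch.c - minimal runtime CPU dispatching sketch.
 * Build: gcc -O2 cpu_dispatch.c -o cpu_dispatch (x86/x86-64) */
#include <cpuid.h>
#include <stdio.h>

static void aes_encrypt_generic(void) { puts("using portable C AES"); }
static void aes_encrypt_aesni(void)   { puts("using AES-NI accelerated AES"); }

/* Function pointer selected once at startup, then used for every call. */
static void (*aes_encrypt)(void) = aes_encrypt_generic;

static void init_dispatch(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* CPUID leaf 1: ECX bit 25 (bit_AES in <cpuid.h>) signals AES-NI. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & bit_AES))
        aes_encrypt = aes_encrypt_aesni;
}

int main(void)
{
    init_dispatch();
    aes_encrypt();  /* calls whichever variant this CPU supports */
    return 0;
}

The same pattern extends to SSE4/AVX/AVX2 variants: ship all of them in one binary and let the cpuid probe choose, so a Haswell-tuned path never executes on an older CPU.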

#124 - Matt Mahoney (Expert)
I still get "illegal instruction" with the 3.1 Ubuntu binary. I'm testing on a Core i7 M620 (which has SSE4.2).

#125 - moinakg (Member)
    Strange. Let me check.

#126 - moinakg (Member)
I rebuilt everything with the "-march=core2" flag and uploaded new binaries. It should work now.

#127 - Matt Mahoney (Expert)
This one works. Nice improvement on the 10GB benchmark (system 4): http://mattmahoney.net/dc/10gb.html

I noticed some minor problems: last-modified times were not restored on empty directories, and the pc/pcompress script in the download can't find "/home/moinakg", although I figured out how to install it anyway.


#129 - Stephan Busch (Tester)
I am using openSUSE 13.1 x64 and copied the binaries to the same place from which I used to run the Pcompress 3.0 tests. But when I run "./pcompress -a -l14 TEST_APP app.pz" in the terminal, pcompress runs with just 300k of memory, one CPU is used continuously, and no file is created. What am I doing wrong?

#130 - moinakg (Member)
Binaries built for Ubuntu may not work on openSUSE. It is probably going into some infinite loop early on. I will build binaries for openSUSE and provide a download in a couple of days.

#131 - moinakg (Member)
[download link for the openSUSE build; post content not captured]
#132 - Stephan Busch (Tester)
Thank you for providing the executable.
But I am afraid the same thing happens on my openSUSE 13.1 (x64): just a loop, and no archive is created.
The command line used is: time ./pcompress -a -l14 TEST_APP app.pz

I edited the wrapper script:
#!/bin/sh

# Directory holding the pcompress binary and its bundled libraries.
PC_PATH="/home/stephan/Testsets"

# Let the dynamic loader find the bundled shared libraries.
LD_LIBRARY_PATH="${PC_PATH}"
export LD_LIBRARY_PATH

# Replace this shell with the real binary, forwarding all arguments.
exec "${PC_PATH}/pcompress" "$@"
The Testsets folder is where the testsets and your build (with the wrapper scripts) are located.

#133 - moinakg (Member)
I have updated the tarball in the same location; it was my mistake. I forgot to include the pcompress binary inside the "buildtmp" dir. If you delete that "buildtmp" dir, the wrapper script ends up calling "exec" on itself in an infinite loop.

After extracting this tarball you only have to edit the PC_PATH variable.


#134 - Stephan Busch (Tester)
This build finally works. I have put the results online; we have mixed results compared to the previous version. We lose much compression on the Camera, PGM/PPM and XML testsets, and no audio compression / delta compression seems to have been applied.

#135 - avitar (Member)
Re the latest Squeeze Chart results, said to be 'mixed' compared with the previous version - any idea why? J

#136 - Stephan Busch (Tester)
Some results are better than 3.0 and some are worse.

#137 - moinakg (Member)
I have to check. Some of the newer transforms may not be playing well in all cases, or some thresholds may need to be tweaked.

#138 - Nania Francesco (Tester)
I would like to test pcompress directly (for WCC), but my main OS is Windows. Is there any software that allows me to emulate a Linux executable on Windows?

#139 - avitar (Member)
Nothing like Wine - but if you have a reasonably modern PC you can use VirtualBox: free, works well on 64-bit Windows with a few cores and a few GB of memory, fair performance. I use it all the time, e.g. for running Ubuntu on Windows 7. It'll run other Linuxes too.

Of course we'd really like the fabled and delayed Windows port. Does anyone know why, with moinakg being a Linux/C++ (and other computer stuff) guru (IMO), it isn't really easy for him to port to Windows - an afternoon's work to make a beta? Surely there is only memory access and file I/O, plus maybe some OS specifics like finding the number of cores - the rest is C++? The only thing I can think of is that pcompress has a rather large number of separate files, and maybe uses Linux libraries such as WavPack and perhaps many others - these could be omitted from a beta. j


#140 - Matt Mahoney (Expert)
Having developed zpaq for Windows and Linux, I can tell you that supporting both OSes is not something you can do in a day. For a single-file compressor with single threading, yes. For an archiver, no.


#141 - avitar (Member)
Matt, yes - maybe I should have said an alpha version to get started, in an afternoon... but I yield to your superior knowledge. I was of course trying to be a bit controversial - anyway, it shouldn't take more than a few days! J

#142 - Bulat Ziganshin (Programmer)
OK, just find those few days yourself.

#143 - avitar (Member)
Don't be silly - for someone who is not a compression, C++, Linux, and Windows guru, it's not a few days but a few hundred, I suspect! j

#144 - Bulat Ziganshin (Programmer)
OK, you have one more week to become a guru. I believe it should be easy for a smart guy like you!

#145 - avitar (Member)
Stop misrepresenting what I say - as we all know, becoming a guru takes a lifetime. For the author, I still think a few days - I accept that free time is difficult to find for all of us. I still can't understand why it should take so long.

BTW, while on this related subject, is there a 64-bit port of FreeArc? Surely that'd take less than a few days, for the guru...

#146 - Matt Mahoney (Expert)
Just saying that the pcompress source code is about 500 files. Doing anything at all to it won't be simple.

In zpaq, to handle Windows and Linux, I need separate code for multi-threading; directory traversal and creation; file deletion; reading and setting file dates, attributes, and permissions; handling Unicode filenames; detecting the number of processors, memory, and system time; error reporting; reading from the terminal; getting file sizes and seeking past 2 GB; random number generation; and interfacing to assembler. The archive format has to represent file names, dates, attributes, and permissions in an OS-neutral format and know how to handle Windows alternate streams, junctions, links, and ACLs, as well as Linux devices, pipes, sockets, symbolic links, and hard links - and what to do if you compress in one OS and extract in the other. And that's just 10K lines in 3 source files.
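
To make the dual-code-path burden concrete, here is a minimal sketch of what an archiver needs for just one item on that list, directory traversal (illustrative only - not zpaq's actual code):

/* listdir.c - one logical operation, two OS-specific implementations. */
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>

void list_dir(const char *path)
{
    char pattern[MAX_PATH];
    WIN32_FIND_DATAA fd;
    HANDLE h;

    /* Windows enumerates via a wildcard pattern and FindFirst/NextFile. */
    snprintf(pattern, sizeof(pattern), "%s\\*", path);
    h = FindFirstFileA(pattern, &fd);
    if (h == INVALID_HANDLE_VALUE)
        return;
    do {
        printf("%s\n", fd.cFileName);
    } while (FindNextFileA(h, &fd));
    FindClose(h);
}
#else
#include <dirent.h>

void list_dir(const char *path)
{
    /* POSIX enumerates via opendir()/readdir(). */
    DIR *d = opendir(path);
    struct dirent *e;

    if (d == NULL)
        return;
    while ((e = readdir(d)) != NULL)
        printf("%s\n", e->d_name);
    closedir(d);
}
#endif

int main(void)
{
    list_dir(".");
    return 0;
}

Multiply that by every item on the list above (dates, permissions, Unicode names, links, ...) and the scale of the porting work becomes clear.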

    The compression and encryption code is about the only thing that is not OS dependent. In pcompress, most of the compression comes from other libraries (zlib, lzma, bsc, bzip2, lz4, lzfx) and likewise for encryption. So basically, a Windows port would be like writing the whole thing all over again. I think it is safe to say that it won't happen.

#147 - Member (San Diego)
Quote Originally Posted by Matt Mahoney:
    The archive format has to represent file names, dates, attributes, and permissions in an OS neutral format and know how to handle Windows alternate streams, junctions, links, and ACLs, as well as Linux devices, pipes, sockets, symbolic links, and hard links and what to do if you compress in one OS and extract in the other. And that's just 10K lines in 3 source files.
This seems like it should be a solved problem. For Unix-like systems, the tar format is pretty much standard and supports all kinds of filesystem metadata. It is incorporated wholesale into common Unix compressed formats, like .tar.gz, .tar.bz2, etc. Incorporating tar into pcompress or zpaq archives might not be feasible for some reason, but it would be a good idea to emulate tar's behavior wherever possible. I'm not sure if there's anything like a tar equivalent for Windows.

To the extent that this problem remains unsolved, you have to wonder whether it's perhaps not a good idea in the first place.

#148 - Matt Mahoney (Expert)
    Yes, there is GNU tar for Windows, which solves many of these problems.

#149 - Bulat Ziganshin (Programmer)
And BTW, why not use FB, aside from its early alpha status? I think it has the same main feature as pcompress - joining the main files together and running dedup+compression... although, OTOH, zpaq is even better.

#150 - moinakg (Member)
Quote Originally Posted by Matt Mahoney:
Just saying that the pcompress source code is about 500 files. Doing anything at all to it won't be simple. [...] So basically, a Windows port would be like writing the whole thing all over again. I think it is safe to say that it won't happen.
Thanks for clarifying the details. The system-specific pieces are indeed the bulk of the porting work. However, in Pcompress I am using a modified copy of libarchive for archiving in the pax-extended format (which supersedes the tar format), and it already handles all the archiving issues in a platform-neutral way. Libarchive builds and works on Windows, so I do not have much trouble on that front.
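
For anyone unfamiliar with libarchive, here is a minimal sketch of writing a pax-format archive using the stock upstream API (pcompress bundles a modified copy, so this illustrates the library rather than pcompress itself):

/* pax_write.c - write one file entry to a pax-format archive.
 * Build: gcc pax_write.c -larchive -o pax_write */
#include <archive.h>
#include <archive_entry.h>
#include <string.h>

int main(void)
{
    const char data[] = "hello, pax";
    struct archive *a = archive_write_new();
    struct archive_entry *e = archive_entry_new();

    archive_write_set_format_pax(a);            /* pax-extended headers */
    archive_write_open_filename(a, "out.pax");

    /* Metadata is set per entry; libarchive maps it to pax records. */
    archive_entry_set_pathname(e, "hello.txt");
    archive_entry_set_size(e, (la_int64_t)strlen(data));
    archive_entry_set_filetype(e, AE_IFREG);    /* regular file */
    archive_entry_set_perm(e, 0644);

    archive_write_header(a, e);
    archive_write_data(a, data, strlen(data));

    archive_entry_free(e);
    archive_write_close(a);
    archive_write_free(a);
    return 0;
}

Because pax stores times, permissions, and extended attributes as portable text records, the same archive extracts sensibly on both Linux and Windows, to the extent the target filesystem supports the metadata.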

The trouble I can see is the assembler code in the bundled crypto routines (AES, Salsa20, BLAKE2, SHA2). Yasm works on Windows, but I will have to build using my patched version of Yasm. All the crypto routines and most of the compression code are in the source tree; the only external dependencies are WavPack, Zlib, and Bzip2. So yes, there is a bit of work involved, but not quite the massive amount that you mentioned.


