Page 3 of 19
Results 61 to 90 of 555

Thread: FreeArc compression suite (4x4, Tornado, REP, Delta, Dict...)

  1. #61
    Member
    Join Date
    Jun 2008
    Location
    Lévis, Canada
    Posts
    30
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I think that there is a need for a practical LZ77 archiver. For example, Debian packages are an ar archive containing a file with the control data and a .tar.gz archive containing the software. A practical archiver could allow a mix of independent and solid storage, so only one unarchiver would be needed for software packages. It could also be used for HTML files on the internet.

    I would envision it as LZ77 + range coding and some Huffman modes. One mode could be almost as fast as LZJB for CPU-bound servers.

  2. #62
    Member jo.henke's Avatar
    Join Date
    Dec 2008
    Location
    Europe
    Posts
    56
    Thanks
    0
    Thanked 4 Times in 1 Post
    Quote Originally Posted by Bulat Ziganshin View Post
    jo.henke, why not make new compile script based on your suggestions?
    Ok, I will create a single build script, which should cover all possible parameter combinations.

  3. #63
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Mihai Cartoaje View Post
    I think that there is a need for a practical LZ77 archiver. For example, Debian packages are an ar archive containing a file with the control data and a .tar.gz archive containing the software. A practical archiver could allow a mix of independent and solid storage, so only one unarchiver would be needed for software packages. It could also be used for HTML files on the internet.

    I would envision it as LZ77 + range coding and some Huffman modes. One mode could be almost as fast as LZJB for CPU-bound servers.
    I just tried LZJB. It's very fast, but extremely weak.
    It's slightly slower than QuickLZ -0.
    Sizes?
    Bookstar:
    32258980 vs 21921597
    TCUP:
    188834800 B vs 154068684 B

    No wonder there's no standalone tool, people would bash Sun for using it.

  4. #64
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,537
    Thanks
    758
    Thanked 676 Times in 366 Posts
    Quote Originally Posted by Mihai Cartoaje View Post
    I think that there is a need for a practical LZ77 archiver
    such as rar, p7zip or freearc?

  5. #65
    Member jo.henke's Avatar
    Join Date
    Dec 2008
    Location
    Europe
    Posts
    56
    Thanks
    0
    Thanked 4 Times in 1 Post
    Quote Originally Posted by jo.henke View Post
    Ok, I will create a single build script, which should cover all possible parameter combinations.
    In the attachment is a first version that should be fine for now. It can be taken as a replacement for all current compile* scripts, thus simplifying maintenance. I left out system-specific parameters, so to compile as before, call:
    Code:
    ./build.sh -flags='-march=i486 -mtune=pentiumpro'
    Try './build.sh -h' for all options.
    Attached Files
    Last edited by jo.henke; 14th January 2009 at 18:24. Reason: updated version

  6. #66
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,537
    Thanks
    758
    Thanked 676 Times in 366 Posts
    Quote Originally Posted by m^2 View Post
    3 t0: 6.2-6.8s.
    3 t1: ~21s.
    can you test this situation further?

    1) check with the attached tor.exe
    2) check with other binary files, in particular the dll from http://www.haskell.org/bz/testfile.7z (upload will be finished 30 minutes later)
    Attached Files

  7. #67
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Best measured CPU time: 18.531, usually ~19.

    On your dll t0 and t1 are within error margin, both CPU and global times.

    Looking through my tests (6 involve tor 0.5a, done with older versions), -4 is always 2-3 times slower than -3 (global times), so you found a really special file.

    Note to self: the results are on XSOS.

  8. #68
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,537
    Thanks
    758
    Thanked 676 Times in 366 Posts
    >so you found a really special file.

    it's the worst file i've found. i will try to download tcup..

  9. #69
    Member jo.henke's Avatar
    Join Date
    Dec 2008
    Location
    Europe
    Posts
    56
    Thanks
    0
    Thanked 4 Times in 1 Post
    Bulat, could you please apply the attached patch (full files are also included)?

    I did some cleanup by completely moving the endian handling to Common.h - there, I added PowerPC-specific instructions, which directly load/store little-endian data. On PowerPC, this gives some speedup and a smaller binary.

    Tests show the same results as before.
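    The idea behind that cleanup can be illustrated with a generic, endianness-neutral sketch (this is not the actual Common.h code, just the standard technique): read and write 32-bit little-endian values one byte at a time, which an optimizing compiler can often collapse into a single byte-reversed load/store (lwbrx/stwbrx on PowerPC).

    Code:
    ```c
    #include <stdint.h>

    /* Endianness-neutral 32-bit little-endian accessors. Byte-wise code
       like this works identically on little- and big-endian hosts. */
    static uint32_t load_le32(const uint8_t *p)
    {
        return (uint32_t)p[0]
             | ((uint32_t)p[1] << 8)
             | ((uint32_t)p[2] << 16)
             | ((uint32_t)p[3] << 24);
    }

    static void store_le32(uint8_t *p, uint32_t v)
    {
        p[0] = (uint8_t)v;
        p[1] = (uint8_t)(v >> 8);
        p[2] = (uint8_t)(v >> 16);
        p[3] = (uint8_t)(v >> 24);
    }
    ```
    Because the result is defined purely in terms of byte positions, the same source works on x86, PowerPC, and anything else, which is the point of centralizing it in one header.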
    Attached Files

  10. #70
    Member jo.henke's Avatar
    Join Date
    Dec 2008
    Location
    Europe
    Posts
    56
    Thanks
    0
    Thanked 4 Times in 1 Post
    During testing I noticed that there is a bug in the latest alpha. With the file from http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.17.tar.bz2 I did the following, using the included Linux binary:
    Code:
    $ bunzip2 linux-2.6.23.17.tar.bz2
    $ ./tor | grep Tornado
    Tornado compressor v0.5 alpha (c) Bulat.Ziganshin@gmail.com  2009-01-14
    $ ./tor -4 -q linux-2.6.23.17.tar
    $ ./tor -d -q <linux-2.6.23.17.tar.tor >linux.tar
    $ cmp linux-2.6.23.17.tar linux.tar
    cmp: EOF on linux-2.6.23.17.tar
    Again, Tornado decompresses too much data. With another test file I also got this with methods -5 and -6.

  11. #71
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,537
    Thanks
    758
    Thanked 676 Times in 366 Posts
    i've downloaded tcup 4.6, installed it (with all language files), tarred it using 7z and then tried tor:

    -3: compressed 323.086 -> 153.369 mb (47.5%), time 4.931 secs, speed 65.523 mb/sec
    -3: compressed 323.086 -> 155.906 mb (48.3%), time 4.540 secs, speed 71.171 mb/sec

    as you see, it's less than a 10% difference. maybe it's a cpu-specific issue.

    >-4 is always 2-3 times slower than -3

    -4 is much slower because it uses much more memory and doesn't fit into the cache. you should compare -3 -t0 and -3 -t1

  12. #72
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,537
    Thanks
    758
    Thanked 676 Times in 366 Posts
    Quote Originally Posted by jo.henke View Post
    I did some cleanup by completely moving the endian handling to Common.h - there, I added PowerPC specific instructions, which directly load/store little endian data. On PowerPC, this gives some speedup and a smaller binary.
    fixed, thank you

  13. #73
    Member
    Join Date
    Jun 2008
    Location
    Lévis, Canada
    Posts
    30
    Thanks
    0
    Thanked 0 Times in 0 Posts
    No. 7zip has its own problems. There is no way to distinguish an LZMA file from a file of random numbers. The command line executable is between half a MB and a MB depending on the compilation options, which makes it unsuitable for devices with little memory or for a recovery floppy. Extracting files can require up to 800 MB of RAM. New users can't be expected to know that the command is 7za. The Linux port is quick and dirty. Because there are many lines of code, new releases introduce bugs.

    RAR is shareware, so it cannot be used in OpenOffice or as a distribution package format; otherwise volunteer packagers would have to purchase a RAR license.

    FreeArc compilation failed with:
    Code:
    In file included from URL.cpp:1:
    URL.h:23:23: curl/curl.h: No such file or directory

    Patrick Volkerding of Slackware has written that Slackware would not use LZMA, so as to install quickly on 486s. If there were an archiver with the same complexity as gzip, all distributions could have a common format. Not that they would, but they could.

  14. #74
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,537
    Thanks
    758
    Thanked 676 Times in 366 Posts
    creating a general-purpose archiver is a big job. instead of looking for someone crazy enough to create yet another archiver just to fill the needs of small linux systems, you'd do better to look into compiling a "minimal version" of p7zip. i think it should be ~200kb

  15. #75
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,537
    Thanks
    758
    Thanked 676 Times in 366 Posts
    updated http://www.haskell.org/bz/tornado.zip
    • fixed bug mentioned by jo.henke
    • applied his patches
    • -4/-5: hash reduced down to 2mb, which means 20-30% faster operation on cpus with 4+ mb cache

  16. #76
    Member jo.henke's Avatar
    Join Date
    Dec 2008
    Location
    Europe
    Posts
    56
    Thanks
    0
    Thanked 4 Times in 1 Post
    Bulat, thanks for the update!

    Regarding post #50
    Quote Originally Posted by Bulat Ziganshin View Post
    -3 mode compression improved by another 0.5%
    ...but on the file "linux-2.6.23.17.tar" I mentioned above, compression is worse.
    old: 62823254 bytes
    new: 62691790 bytes

    Btw, we'd better replace "-otor" with "-lm -otor" in build.sh to prevent linking problems on non-IA32 (undefined reference to logb).

  17. #77
    Member
    Join Date
    Oct 2007
    Location
    Germany, Hamburg
    Posts
    409
    Thanks
    0
    Thanked 5 Times in 5 Posts
    Quote Originally Posted by jo.henke View Post
    old: 62823254 bytes
    new: 62691790 bytes
    If I'm not absolutely wrong, 62823254 (old) > 62691790 (new), or did you mix the numbers up?

  18. #78
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,537
    Thanks
    758
    Thanked 676 Times in 366 Posts
    yes, on text files the result is a bit worse, on binaries 0.5% better, and on compressed data it's ~1% worse too. i'm thinking about dynamic Huffman blocksize tuning..

    i've put the following line in build.sh:

    Code:
    PARM='-lm -otor main.cpp'
    and, of course, removed -otor main.cpp from the other lines

  19. #79
    Member jo.henke's Avatar
    Join Date
    Dec 2008
    Location
    Europe
    Posts
    56
    Thanks
    0
    Thanked 4 Times in 1 Post
    Quote Originally Posted by Simon Berger View Post
    62823254 (old) > 62691790 (new)
    Unbelievable. Thanks for this hint. Seems that I need some sleep...

    Sorry, Bulat! I'm perfectly fine with the current version.

  20. #80
    Member
    Join Date
    May 2008
    Location
    Earth
    Posts
    115
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Mihai Cartoaje View Post
    No. 7zip has its own problems. There is no way to distinguish a LZMA file from a file with random numbers.
    Add a header. btw, I think there are two incompatible .lzma file formats; the only difference is header presence.
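    The "add a header" suggestion amounts to a few lines of code. A hedged sketch (the 4-byte magic `TOR1` is made up for illustration; the real .xz and .lzma containers define their own headers):

    Code:
    ```c
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical 4-byte magic; prefixing it to a compressed stream
       lets tools tell the format apart from random bytes. */
    static const uint8_t MAGIC[4] = { 'T', 'O', 'R', '1' };

    /* Returns 1 if buf starts with the magic, 0 otherwise. */
    static int has_magic(const uint8_t *buf, size_t len)
    {
        return len >= sizeof MAGIC && memcmp(buf, MAGIC, sizeof MAGIC) == 0;
    }
    ```
    The compressor writes MAGIC before the first compressed byte; the decompressor rejects any input where has_magic() fails.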
    Quote Originally Posted by Mihai Cartoaje View Post
    The command line executable is between half a MB and a MB depending on the compilation options.
    The bare LZMA decoder is 2-3K of executable code, for evidence look at UPX.
    Quote Originally Posted by Mihai Cartoaje View Post
    ... Extracting files can require up to 800 MB of RAM.
    Use appropriate dictionary size.
    Quote Originally Posted by Mihai Cartoaje View Post
    New users can't be expected to know that the command is 7za. The Linux port is quick and dirty. Because there are many lines of code, new releases introduce bugs.
    Alas... But you can fix that.
    Quote Originally Posted by Mihai Cartoaje View Post

    ...

    FreeArc compilation failed with,
    In file included from URL.cpp:1:
    URL.h:23:23: curl/curl.h: No such file or directory
    curl is expected to be a standard library on *nix platforms.
    Quote Originally Posted by Mihai Cartoaje View Post

    Patrick Volkerding of Slackware has written that Slackware would not use LZMA so as to install quickly on 486s. If there was an archiver that has the same complexity as gzip, all distributions could have a common format. Not that they would, but they could.
    On my computer, LZMA is only twice as slow.

  21. #81
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Yesterday I checked it with TCUP - indeed a very small difference. I wanted to send you XSOS, but tried running on it again first. Both modes took ~18 sec. What's up? I was tired, so I decided to investigate the following day.
    However, later that evening I tried again... both modes ran 6-7 sec.

    Today I found that I'd made a mistake in the command line.
    I typed -c3 instead of -3.
    Result? 18 sec.

    Now, when everything is correct (I hope so), best times are:
    6.468 t0
    7.234 t1

    I guess you're right that -4 is twice as slow in my tests due to lower cache efficiency.

    Sorry for the confusion.

    BTW, your optimizations worked very well. -c3 (-5) takes 18 seconds now and used to take 21 (I checked it today).

  22. #82
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,537
    Thanks
    758
    Thanked 676 Times in 366 Posts
    >Today I found myself doing a mistake in the command line.

    just posting a screen print like this can easily settle it:

    D:\testing>C:\!\FreeArchiver\Tests\Arc.exe create a dll700.dll -m5 -di+#
    Compressed 1 file, 690.514.620 => 176.460.614 bytes. Ratio 25.5%
    Compression time: cpu 455.91 secs, real 292.42 secs. Speed 2.361 kB/s

  23. #83
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    >Today I found myself doing a mistake in the command line.

    just posting a screen print like this can easily settle it:

    D:\testing>C:\!\FreeArchiver\Tests\Arc.exe create a dll700.dll -m5 -di+#
    Compressed 1 file, 690.514.620 => 176.460.614 bytes. Ratio 25.5%
    Compression time: cpu 455.91 secs, real 292.42 secs. Speed 2.361 kB/s
    But how do I produce such a screenshot?

  24. #84
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,537
    Thanks
    758
    Thanked 676 Times in 366 Posts
    Quote Originally Posted by m^2 View Post
    But how do I produce such a screenshot?
    1. tor tor.exe 2>log

    2. right-click on console window [title], select Edit->Mark, select region with mouse and press Enter

  25. #85
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    1. tor tor.exe 2>log

    2. right-click on console window [title], select Edit->Mark, select region with mouse and press Enter
    Thanks.

  26. #86
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    895
    Thanks
    54
    Thanked 109 Times in 86 Posts
    I'm still getting decompression issues with rep.

    resulting file = 0 bytes
    original file size is around 4 gb


    --- edit ---

    some more info

    Original file = 3,86 GB (4.151.676.928 bytes)
    Rep compressed size = 2,56 GB (2.755.241.284 bytes)
    Rep decompressed size = 0 bytes

    I'm using the -b1024 command line option

    and this file especially benefits from a big dictionary size; even going from 192mb to 384mb dict size in 7-zip helps
    Last edited by SvenBent; 18th January 2009 at 19:10.

  27. #87
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,537
    Thanks
    758
    Thanked 676 Times in 366 Posts
    just tested with two large files and a 1gb buffer - nothing caught. can you try to find the buggy part of the source file and send it to me (via torrent, for example)?

    btw, if you need them, i have attached two utilities that simplify extracting chunks from files
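    The core of such a chunk-extraction utility fits in a few lines of C. This is a hypothetical re-creation for illustration, not Bulat's attached tool: copy `length` bytes starting at `offset` from one file to another, buffered, with 64-bit offsets so multi-gigabyte files work.

    Code:
    ```c
    /* Large-file support must be requested before any #include on 32-bit glibc. */
    #define _FILE_OFFSET_BITS 64

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/types.h>

    /* Copy `length` bytes starting at `offset` from src to dst.
       Returns 0 on success, -1 on any error (including premature EOF). */
    static int extract_chunk(const char *src, const char *dst,
                             uint64_t offset, uint64_t length)
    {
        int rc = -1;
        FILE *in = fopen(src, "rb");
        FILE *out = fopen(dst, "wb");
        uint8_t buf[1 << 16];

        if (in && out && fseeko(in, (off_t)offset, SEEK_SET) == 0) {
            while (length > 0) {
                size_t want = length < sizeof buf ? (size_t)length : sizeof buf;
                size_t got = fread(buf, 1, want, in);
                if (got == 0 || fwrite(buf, 1, got, out) != got)
                    break;              /* premature EOF or write error */
                length -= got;
            }
            if (length == 0)
                rc = 0;                 /* full requested range copied */
        }
        if (in) fclose(in);
        if (out) fclose(out);
        return rc;
    }
    ```
    For example, `extract_chunk("big.tar", "part.bin", 512ULL << 20, 512ULL << 20)` would carve out the second 512mb of a file, mirroring a `knife 512mb 512mb` style invocation.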
    Attached Files

  28. #88
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    895
    Thanks
    54
    Thanked 109 Times in 86 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    can you try to find the buggy part of the source file and send it to me
    I'm on it.
    Just retested; still decompression errors.
    However, there is no problem when using -b128 for compression.

    I will try to chop the file down.

    Funny part:
    it also made me discover a bug in 7-zip.
    Using a 1024mb dictionary shows too low an estimated memory usage.

    Side question:
    Is rep 32-bit only (i.e., can it only utilize 2gb of memory)?
    I wish to use rep with something like -b4048.

  29. #89
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,537
    Thanks
    758
    Thanked 676 Times in 366 Posts
    rep is 32 bit now. i will look into making it 64-bit (not just now)

  30. #90
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    895
    Thanks
    54
    Thanked 109 Times in 86 Posts
    I've tried cutting the file down, but it seems to be a pretty big part.

    got it down to a 1gb file size with the error by using knife start "1GB"

    knife start "512m" = 512mb part with no errors
    knife 512mb 512mb = 512mb part with no errors
    knife 256mb 512mb = 512mb part with no errors

    knife start "1gb-4byte" = almost 1gb part with error
    knife start "0.9GB" = 0.9gb part with no errors

    OK, so the problem part seems to end pretty close to the 1GB mark of the file, yet starts before the 512mb mark.

    knife 256mb 768mb = 768mb part with no errors

    So the problem part seems to be bigger than 768mb.

    Do you want me to cut it down to the exact byte size of the problem part, or is the first 1gb good enough? The compressed size is 743mb.

    -- edit --
    Going to work now. Maybe I'll cut the file even more later.
    You can get the 1gb part here:
    http://www.techcenter.dk/reperror.7z

    -- edit --
    i just looked back on the forum and it seems this is an old error, as i discovered it before, but testing new files gave no errors.
    Last edited by SvenBent; 21st January 2009 at 13:53.


