
Thread: NanoZip - a new archiver, using bwt, lz, cm, etc...

  #1 - Sami (Programmer, Finland)

    NanoZip - a new archiver, using bwt, lz, cm, etc...

    Hi all,

    Nice forum you have got here. I have not seen such lively discussion on lossless compression elsewhere in public.

    Here is some brief info about NanoZip. The project is years old and still unfinished, but I have now released an alpha version anyway; it is available at www.nanozip.net. Some great compressors by other people in this forum were released while I was working on NZ, such as CCM, FreeArc, LPAQ, etc.

    NZ was primarily a BWT project to try out a new 5N blocksort algorithm. In addition I made a bunch of other compressors to try out various ideas. These are all selectable in NanoZip. Note that the default compressor (nz_optimum1) is not finished, so please do not pay too much attention to its performance. The idea of nz_optimum1-2 is to compress text using BWT and binary data using LZT (an LZ-based algorithm).

    I would ask if you could post benchmarks, so that we can see how NZ performs on different hardware. Use some other compressor as a timing reference, so that we can interpret the results. There is also a special case which interests me. I would appreciate it if you could run the following setup for enwik8:

    nz a -co -txt -m80m test enwik8

    and also run a test with some other BWT compressor using a 16 MB block, e.g. Blizzard, Dark, BCM, etc., so I can see whether the performance is the same on hardware other than what is available to me.


  #2 - encode (The Founder, Moscow, Russia)
    Wow, very cool!

    Note, very soon (depending only on the SourceForge.net staff) I will release my new open-source BALZ file compressor. If you want, you may freely use its compression engine in your great archiver!

  #3 - osmanturan (Programmer, Mersin, Turkiye)
    Looks great. But...
    1 - The GUI has a lot of bugs. For example, directories are not listed above the files. Also, it's really hard to find a file in a directory that contains many files, because keyboard shortcuts aren't implemented. Unicode is not supported either, so in my language (Turkish) some file names look weird. The Shell API handles these issues well; I advise looking at it if you have enough time.
    2 - The console version has a small bug in CPU detection. I have a CPU with two cores (Core 2 Duo), but it is detected as one core. Do you use GetSystemInfo()? AFAIK, it provides this info easily.
    3 - In the GUI I selected Standard memory usage (256 MB), but the nz_cm codec's usage was 88 MB (I looked at the dialog).

    On the other hand, comparing the codecs graphically in the compression dialog is a really good idea. I hope Bulat will do this for FreeArc. I really like it.

    I tested the GUI with nz_cm. It was good: it took the SFC files down to around 9 MB at around 500 KB/s, if I'm not wrong. I will test it more deeply later.

  #4 - osmanturan (Programmer, Mersin, Turkiye)
    Another bug in the CPU detection code:
    Code:
    CPU "GenuineIntel" family 6, model 15, stepping 10
    CPU-features: MMX SSE1 SSE2 SSE3 SSSE3
    Cache: L1 0 KB, L2 4096 KB, L3 0 KB
           L1 line size 0, 0 ways. L2 line size 64, 8 ways
    Note that I have a Core 2 Duo 2.2 GHz. IIRC, getting CPU specifications on modern CPUs is easier than on older ones (I have some experience in this area). Old CPUs require quite different methods to get cache specifications.

    (I know it's an alpha version. I'm just reporting to let you know.)

  #5 - Sami (Programmer, Finland)
    Thanks osmanturan. You can sort the files by name, which leaves the directories at the top. I'm aware that the memory settings are far from precise. I planned Unicode support, but it's not implemented. I'm detecting the CPU with custom assembly code; I will take a look at GetSystemInfo(). Currently only one CPU core is used anyway. I should have included a list of known things that are missing or not functioning properly; I will do this for the next version. I hope this version is useful for testing anyway. There should not be any serious bugs.

  #6 - Christian (Programmer, Germany)
    Well, I just checked NanoZip with a few settings, and its performance seems simply amazing! Great job! Can't wait for the final release.

    Btw., what kind of data filters do the different methods use?

  #7 - Sami (Programmer, Finland)
    Hi Christian,

    Thanks for the praise. Please let me see your benchmarks, if you have the records. Considering the history of this project, "a final version" will probably take its time. As to filters, depending on the compressor they might be integrated (e.g. delta with lzhds) or applied in a separate pass. NZ has separate compressors for audio and images, different variants of most of the compressors, etc.
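
    For readers unfamiliar with delta filtering, here is a generic byte-wise delta filter of the kind commonly paired with LZ coders. This is a textbook sketch in C, not NanoZip's actual filter; the stride parameter is illustrative (1 for plain bytes, 2 for 16-bit audio samples, 3 for RGB pixels).
    Code:
    #include <stddef.h>

    /* Generic byte-wise delta filter: replace each byte with its difference
       from the byte `stride` positions earlier (modulo 256). Decoding is the
       exact inverse. Illustrative sketch only, not NanoZip's filter. */
    static void delta_encode(unsigned char *buf, size_t len, size_t stride)
    {
        for (size_t i = len; i-- > stride; )
            buf[i] = (unsigned char)(buf[i] - buf[i - stride]);
    }

    static void delta_decode(unsigned char *buf, size_t len, size_t stride)
    {
        for (size_t i = stride; i < len; i++)
            buf[i] = (unsigned char)(buf[i] + buf[i - stride]);
    }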

  #8 - osmanturan (Programmer, Mersin, Turkiye)
    As I said before, the Shell API largely solves your file-list problems (Unicode, icons, shortcuts, etc.). Let me be clearer about "hard to find a file": in folders with lots of files, I have a habit of selecting a file by typing the first characters of its name. This does not work in your archiver.

    I used assembly to detect the CPU, too, but you have to run your code per core by setting the affinity mask with a kernel32 call. Also, as I said before, the easiest way to get the true number of CPU cores is to call GetSystemInfo(). After that, you should get the affinity mask of the current process; by inspecting the bits in the mask you can detect whether you are on a multicore or a hyperthreaded CPU. Then run your assembly detection code on each real/virtual core by setting the affinity mask. Note that cache detection is troublesome on all CPUs, because old CPUs expose this information in quite different ways.
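
    A minimal sketch of the approach described above, using documented Win32 calls and the MSVC __cpuid intrinsic (error handling and cache-topology decoding omitted; this is one possible way to do it, not NanoZip's code):
    Code:
    #include <windows.h>
    #include <intrin.h>
    #include <stdio.h>

    /* Query the logical processor count with GetSystemInfo(), then pin the
       current thread to each logical CPU allowed by the process affinity
       mask and run CPUID there. */
    int main(void)
    {
        SYSTEM_INFO si;
        GetSystemInfo(&si);
        printf("Logical processors: %lu\n", si.dwNumberOfProcessors);

        DWORD_PTR proc_mask, sys_mask;
        if (!GetProcessAffinityMask(GetCurrentProcess(), &proc_mask, &sys_mask))
            return 1;

        for (DWORD_PTR bit = 1; bit != 0; bit <<= 1) {
            if (!(proc_mask & bit))
                continue;
            DWORD_PTR prev = SetThreadAffinityMask(GetCurrentThread(), bit);
            if (prev == 0)
                continue;
            Sleep(0);             /* yield so the scheduler moves us onto that CPU */
            int regs[4];
            __cpuid(regs, 1);     /* basic feature leaf, executed on this core */
            printf("core mask %llx: family/model/stepping word = %08x\n",
                   (unsigned long long)bit, regs[0]);
            SetThreadAffinityMask(GetCurrentThread(), prev);
        }
        return 0;
    }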

    A quick test on enwik8
    Code:
    Blizzard 0.24b -> 20,867,854 bytes (48.640 seconds)
    NanoZip 0.0 alpha -> 22,242,211 bytes (17.383 seconds)
    BCM 0.2a -> 23,761,415 bytes (115.812 seconds)
    Output of the compressor:
    Code:
    NanoZip 0.00 alpha - Copyright (C) 2008 Sami Runsas - www.nanozip.net
    CPU: Intel(R) Core(TM)2 Duo CPU     T7500  @ 2.20GHz, cores # 1, memory: 968/2046 MB
     *** THIS IS AN EARLY ALPHA VERSION *** USE ONLY FOR TESTING ***
    Archive file: test.nz
    Compressor: nz_optimum1, using 101 MB memory.
    Compressed 100 000 000 into 22 242 211 in 16.33s, 5 978 KB/s
    IO-in: 0.10s, 875 MB/s. IO-out: 0.65s, 32 MB/s
    Test machine: Core2 Duo 2.2GHz, 2GB RAM, Vista Business x64 SP1
    Acutimer 1.2 was used for measuring timing statistics.

  #9 - SvenBent (Member, Denmark)
    Did I miss the link for it?

    -- edit ---

    Seems I did... I found it now.
    Last edited by SvenBent; 5th July 2008 at 22:08.

  #10 - Sami (Programmer, Finland)
    osmanturan, thanks again. I may remove the entire CPU detection, since that's probably the easiest way of solving the problem. NZ will be multiplatform, which is why I have not explored the Shell APIs and file-icon stuff. I'll try to make the file view a bit clearer.

    But your Blizzard results look weird; I get 22,869,203 bytes using "bliz c enwik8 test 16777216". If you used a larger block, try one with NZ as well; for enwik8, "nz a -co -txt -m500m test enwik8" uses a 100 MB block.

  #11 - osmanturan (Programmer, Mersin, Turkiye)
    Quote Originally Posted by Sami
    osmanturan, thanks again. I may remove the entire CPU detection, since that's probably the easiest way of solving the problem.
    You may do that. Of course, this is the easiest solution.

    Quote Originally Posted by Sami
    NZ will be multiplatform, which is why I have not explored the Shell APIs and file-icon stuff. I'll try to make the file view a bit clearer.
    I think you should look at it. It's not that hard; a good file list can be entirely Shell API driven. Believe me, you will spend less effort with the Shell API.

    Quote Originally Posted by Sami
    But your Blizzard results look weird; I get 22,869,203 bytes using "bliz c enwik8 test 16777216". If you used a larger block, try one with NZ as well; for enwik8, "nz a -co -txt -m500m test enwik8" uses a 100 MB block.
    Aaah... sorry about that. I used a 100,000,000-byte block with Blizzard (same as yours). BCM used 8 MB blocks.

  #12 - osmanturan (Programmer, Mersin, Turkiye)
    Could you give us more technical information about your codecs? I'm especially keen on nz_cm.

  #13 - Sami (Programmer, Finland)
    I've estimated in the past that the CM codec is about 5% of the effort put into the project. At first the CM was much faster, but then CCM came out and I found the comparison depressing. I tuned it to a slower speed, and later to roughly LPAQ speed. The LZT in optimum2 mode decompresses 5x faster than it compresses, and it's often close to the CM in ratio, so this makes the CM mode obsolete. Also, I don't think the novelties in the CM codec are significant.

    --edit--

    OK, so that was not technical. Currently nz_cm is PAQish. An analyzer runs in parallel to the compressor and attempts to pick the best kind of models for the current data.
    Last edited by Sami; 5th July 2008 at 22:42.
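
    As an aside, "picking models for the current data" generally starts with some cheap data-type heuristic. The following is a purely illustrative C fragment (not NanoZip's analyzer): it classifies a block as text-like when nearly all bytes are printable ASCII or common whitespace.
    Code:
    #include <stddef.h>

    /* Toy data-type probe: returns nonzero if the block looks like text.
       A real analyzer would also look for audio, image and x86 patterns. */
    static int looks_like_text(const unsigned char *buf, size_t len)
    {
        size_t printable = 0;
        if (len == 0)
            return 0;
        for (size_t i = 0; i < len; i++) {
            unsigned char c = buf[i];
            if ((c >= 0x20 && c < 0x7F) || c == '\n' || c == '\r' || c == '\t')
                printable++;
        }
        return printable * 100 >= len * 95;   /* at least 95% "texty" bytes */
    }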

  #14 - Moderator (Tristan da Cunha)

    Thumbs up

    Thanks Sami!

  #15 - Nania Francesco (Tester, Italy)

    MONSTER OF COMPRESSION!!!!!!!!

    Yes very very very very very very Nice!
    Incredible Compression!
    Tested on MOC !

  #16 - osmanturan (Programmer, Mersin, Turkiye)
    Here is more thorough testing:

    nz_lzpf
    Code:
    a10.jpg -> 842,549 bytes
    acrord32.exe -> 1,853,613 bytes
    english.dic -> 1,032,514 bytes
    flashmx.pdf -> 3,960,165 bytes
    fp.log -> 1,714,093 bytes
    mso97.dll -> 2,409,902 bytes
    ohs.doc -> 1,003,272 bytes
    rafale.bmp -> 1,762,851 bytes
    vcfiu.hlp -> 1,081,615 bytes
    world95.txt -> 1,384,307 bytes
    
    Total Size: 17,044,881 bytes
    Total Time: 2.380 seconds
    nz_lzpf_large
    Code:
    a10.jpg -> 842,549 bytes
    acrord32.exe -> 1,825,005 bytes
    english.dic -> 991,869 bytes
    flashmx.pdf -> 3,901,187 bytes
    fp.log -> 1,232,416 bytes
    mso97.dll -> 2,329,504 bytes
    ohs.doc -> 939,784 bytes
    rafale.bmp -> 1,697,975 bytes
    vcfiu.hlp -> 927,361 bytes
    world95.txt -> 1,020,185 bytes
    
    Total Size: 15,707,835 bytes
    Total Time: 3.436 seconds
    nz_lzhd
    Code:
    a10.jpg -> 841,800 bytes
    acrord32.exe -> 1,646,306 bytes
    english.dic -> 754,031 bytes
    flashmx.pdf -> 3,804,786 bytes
    fp.log -> 1,324,117 bytes
    mso97.dll -> 2,143,630 bytes
    ohs.doc -> 870,980 bytes
    rafale.bmp -> 1,627,561 bytes
    vcfiu.hlp -> 868,361 bytes
    world95.txt -> 719,021 bytes
    
    Total Size: 14,600,593 bytes
    Total Time: 5.464 seconds
    nz_lzhds
    Code:
    a10.jpg -> 841,800 bytes
    acrord32.exe -> 1,440,387 bytes
    english.dic -> 662,557 bytes
    flashmx.pdf -> 3,755,994 bytes
    fp.log -> 1,089,646 bytes
    mso97.dll -> 1,926,938 bytes
    ohs.doc -> 826,068 bytes
    rafale.bmp -> 1,465,643 bytes
    vcfiu.hlp -> 749,649 bytes
    world95.txt -> 640,462 bytes
    
    Total Size: 13,399,144 bytes
    Total Time: 12.112 seconds
    nz_optimum1
    Code:
    a10.jpg -> 832,148 bytes
    acrord32.exe -> 1,285,945 bytes
    english.dic -> 470,226 bytes
    flashmx.pdf -> 3,759,417 bytes
    fp.log -> 502,926 bytes
    mso97.dll -> 1,689,490 bytes
    ohs.doc -> 760,316 bytes
    rafale.bmp -> 787,079 bytes
    vcfiu.hlp -> 637,699 bytes
    world95.txt -> 433,106 bytes
    
    Total Size: 11,158,352 bytes
    Total Time: 25.626 seconds
    nz_optimum2
    Code:
    a10.jpg -> 829,965 bytes
    acrord32.exe -> 1,161,593 bytes
    english.dic -> 468,049 bytes
    flashmx.pdf -> 3,721,322 bytes
    fp.log -> 498,604 bytes
    mso97.dll -> 1,552,742 bytes
    ohs.doc -> 741,538 bytes
    rafale.bmp -> 913,844 bytes
    vcfiu.hlp -> 573,390 bytes
    world95.txt -> 428,056 bytes
    
    Total Size: 10,889,103 bytes
    Total Time: 73.728 seconds
    nz_cm
    Code:
    a10.jpg -> 813,086 bytes
    acrord32.exe -> 1,039,199 bytes
    english.dic -> 443,708 bytes
    flashmx.pdf -> 3,642,270 bytes
    fp.log -> 538,163 bytes
    mso97.dll -> 1,418,052 bytes
    ohs.doc -> 698,268 bytes
    rafale.bmp -> 1,076,163 bytes
    vcfiu.hlp -> 469,736 bytes
    world95.txt -> 395,025 bytes
    
    Total Size: 10,533,670 bytes
    Total Time: 124.844 seconds
    Phew... it was really boring to test all of the codecs. Do you plan to reduce the codec count?

    Test platform: Core2 Duo 2.2GHz, 2GB RAM, Vista Business x64 SP1

    nz_optimum1 and nz_optimum2 display 1287 MB memory usage.
    nz_cm displays 792 MB memory usage.
    The other codecs display 0 MB memory usage.
    (I didn't look at Task Manager for the real usage.)

    Edit: Sorry, I forgot to give details about the memory usage. I set -m1g for all codecs (documented as using 1 GB of memory).
    Last edited by osmanturan; 5th July 2008 at 23:58.

  #17 - toffer (Programmer, Erfurt, Germany)
    Hi!

    I tried the CM part, too. It behaves *very* much like LPAQ. I tried using "nz a -cc -m120m -v". Could you explain the CM codec in a bit more detail?

    Code:
    05.07.2008  22:14           813.072 A10.jpg.nz_120m.nz
    05.07.2008  22:15         1.041.982 AcroRd32.exe.nz_120m.nz
    05.07.2008  22:15           450.179 english.dic.nz_120m.nz
    05.07.2008  22:15         3.643.535 FlashMX.pdf.nz_120m.nz
    05.07.2008  22:17           538.213 FP.LOG.nz_120m.nz
    05.07.2008  22:17         1.423.723 MSO97.DLL.nz_120m.nz
    05.07.2008  22:17           698.674 ohs.doc.nz_120m.nz
    05.07.2008  22:18         1.076.731 rafale.bmp.nz_120m.nz
    05.07.2008  22:18           472.074 vcfiu.hlp.nz_120m.nz
    05.07.2008  22:18           396.694 world95.txt.nz_120m.nz
                  10 file(s)     10.554.877 bytes

    But altogether this really is a bunch of codecs. What is "optimum2", btw?

  #18 - SvenBent (Member, Denmark)
    Enwik8 results

    NanoZip 0.00a - 22.242.211 bytes - 13.577 seconds
    Blizzard 0.24b (16 MB block) - 22.869.203 bytes - 35.819 seconds

    Core 2 Quad Q9300 (OC 2.8GHZ/400FSB, Mem 1:1)

  #19 - Sami (Programmer, Finland)
    Hi toffer,

    Quote Originally Posted by toffer View Post
    I tried the CM part, too. It behaves *very* much like LPAQ.
    As said, I've tuned it with LPAQ comparison in mind, but it should be faster and compress better. Even just LPAQ implemented using the internal NZ library should do exactly that. However, if you find that nz_cm is exactly (?) like LPAQ, I would like to see what drew you to this conclusion; if you can, post links to the files on which the performance is close. I want to test them too.

    But altogether this really is a bunch of codecs. What is "optimum2", btw?
    It uses BWT (QLFC-based) for text and LZT for the rest. LZT compresses some data almost as well as the CM, but decompresses 5x faster.

  #20 - toffer (Programmer, Erfurt, Germany)
    No, but it behaves very similarly. This is my impression after a quick test. But I'd still like to know your coding pipeline.

    What do you mean here: "Even just LPAQ implemented using the internal NZ library should do exactly that."?

  #21 - Christian (Programmer, Germany)
    So, I'm back with some testing results, but only for compression; otherwise it would have taken too long. The tests were run on a C2D E6750 at 3.4 GHz with 2 GB of system RAM. The compression times are not very accurate (maybe +/- 10%).

    The first testset is ~600M and consists of ENWIK8, SFC, OO23, CALGARY, VALLEY_CMB, ABIWORD, GIMP and some more.

    Code:
    ALGO.   TIME        626.386.935
    -------------------------------
    F         7.1s      157.415.836
    Slug      4.6s      142.735.323
    D        53.2s      124.109.315
    RZM     552.8s       97.959.634
    O       367.7s       97.333.333
    CCM  5  195.7s       91.374.515
    CCMx 5  256.6s       89.099.842
    c       853.8s       84.994.940
    For the next test I used the same testset, but applied a random permutation of the byte alphabet to remove the effect of data filtering.

    Code:
    ALGO.   TIME        626.386.935
    -------------------------------
    F         8.3s      158.520.595
    Slug      5.4s      143.963.934
    D        65.0s      128.411.422
    O       943.9s      103.499.764
    RZM     562.3s       99.465.678
    CCM  5  202.2s       96.007.261
    CCMx 5  267.7s       93.292.037
    c       950.6s       87.733.545
    And the third test, tar'ed SFC.

    Code:
    ALGO.   TIME         53.143.552
    -------------------------------
    F         0.7s       15.709.297
    Slug      0.5s       14.717.483
    D         3.4s       13.231.552
    RZM      35.0s       11.318.765
    O        43.9s       10.902.253
    CCM  5   16.4s       10.890.174
    CCMx 5   20.6s       10.794.961
    c        72.6s       10.286.415
    And finally, tar'ed SFC + permuted alphabet.

    Code:
    ALGO.   TIME         53.143.552
    -------------------------------
    F         0.6s       16.028.959
    Slug      0.5s       14.906.927
    D         3.7s       14.004.930
    O        72.7s       12.107.159
    RZM      36.0s       11.718.937
    CCM  5   17.2s       11.475.464
    CCMx 5   20.3s       11.380.333
    c        79.5s       10.819.570
    Notes:
    -------
    Phew, that was a lot. Well, the first two tests show great results. Mode "O" gets very slow on the permuted data, though.

    And then we have SFC. Here NanoZip does excellently, too. It can be observed that NanoZip uses great data filters from mode "D" upwards. Additionally, mode "O" seems to lose a lot of speed on the permuted data (RZM seems to be stronger than "O" in both tests when filtering is removed). It might be accidental, but NanoZip seems to be tweaked for SFC?

    The permuted data isn't fair, of course. I was just interested in the impact and the quality of your filters (SBC did have very good filters, too). But as I said before, NanoZip rocks and I can't wait for benchmarks and the final version. If you feel like it, it'd be really interesting to know more about your filters. If not, just ignore this.
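
    For reproducibility, a byte-alphabet permutation of the kind used in the tests above can be generated with a few lines of C. The fixed seed and command-line handling below are arbitrary choices, not the exact procedure used for these results.
    Code:
    #include <stdio.h>
    #include <stdlib.h>

    /* Remap every byte of infile through a fixed random permutation of
       0..255 and write the result to outfile. Applying the inverse
       permutation restores the original file. */
    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s infile outfile\n", argv[0]);
            return 1;
        }
        unsigned char perm[256];
        for (int i = 0; i < 256; i++)
            perm[i] = (unsigned char)i;
        srand(12345);                        /* fixed seed: repeatable mapping */
        for (int i = 255; i > 0; i--) {      /* Fisher-Yates shuffle */
            int j = rand() % (i + 1);
            unsigned char t = perm[i]; perm[i] = perm[j]; perm[j] = t;
        }
        FILE *in = fopen(argv[1], "rb"), *out = fopen(argv[2], "wb");
        if (!in || !out) {
            perror("fopen");
            return 1;
        }
        int c;
        while ((c = fgetc(in)) != EOF)
            fputc(perm[c], out);
        fclose(in);
        fclose(out);
        return 0;
    }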

  #22 - Sami (Programmer, Finland)
    Nania:
    Quote Originally Posted by Nania Francesco
    Yes very very very very very very Nice!
    Thanks.

    Quote Originally Posted by osmanturan View Post
    Here is more thorough testing:

    ...

    Phew... it was really boring to test all of the codecs. Do you plan to reduce the codec count?
    Thanks for testing, although we don't have a timing reference. Also, some codecs are only good if decompression time is considered; for example, the nz_lzhd/s codecs are really tuned for maximal decompression speed (considering the ratio range).

    nz_optimum1 and nz_optimum2 display 1287 MB memory usage.
    nz_cm displays 792 MB memory usage.
    The other codecs display 0 MB memory usage.
    (I didn't look at Task Manager for the real usage.)

    Edit: Sorry, I forgot to give details about the memory usage. I set -m1g for all codecs (documented as using 1 GB of memory).
    Yeah, the memory usage was something I tried adding at the last minute, but I was just too tired to finish it, so please don't pay much attention to the memory weirdness.

  #23 - Nania Francesco (Tester, Italy)

    Hi Christian!

    The incredible thing about NanoZip is the compression ratio and the decompression speed! Impressive!

  #24 - Sami (Programmer, Finland)
    Quote Originally Posted by Christian View Post
    So, I'm back with some testing results, but only for compression; otherwise it would have taken too long. The tests were run on a C2D E6750 at 3.4 GHz with 2 GB of system RAM. The compression times are not very accurate (maybe +/- 10%).

    The first testset is ~600M and consists of ENWIK8, SFC, OO23, CALGARY, VALLEY_CMB, ABIWORD, GIMP and some more.

    ...

    For the next test I used the same testset, but applied a random permutation of the byte alphabet to remove the effect of data filtering.

    ...

    And finally, tar'ed SFC + permuted alphabet.

    Notes:
    -------
    Phew, that was a lot. Well, the first two tests show great results. Mode "O" gets very slow on the permuted data, though.
    Thanks for testing, although I'm not sure what you were attempting to measure. If the text is permuted, optimum uses LZT for it instead of BWT. LZT is not much better on text than LZMA, because according to my findings an LZ77-like scheme is very difficult (at least for me) to make perform better on text without discarding matches in favor of literals. Optimum1/2 takes this into account, so no text gets LZT'ed, because the performance is terrible. I haven't yet made an analyzer which would detect text-like non-text for BWT. So your tests try to minimize everything I've been working on and what NZ is made for, that is, practical compression efficiency.

    It might be accidental, but NanoZip seems to be tweaked for SFC?
    I use real applications, like googleearth, games, etc., for testing.

    --edit-- google-earth, not maps...

    If you feel like it, it'd be really interesting to know more about your filters. If not, just ignore this.
    There aren't any magical filters. As I said: delta for general-purpose binary data, plus image and audio compressors (which go unused in the MC corpus). For text I use a small dictionary. I was interested in how to make dictionary replacement very fast, and I didn't pay much attention to the ratios or to how it should ideally be done. My blocksort runs faster on redundant data, so it actually slows down when the dictionary replacement takes place.

    --edit--
    ...also, I translate relative addresses to absolute ones in .exe files.
    Last edited by Sami; 6th July 2008 at 00:57.
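
    The relative-to-absolute address translation mentioned above is, in its generic form, the classic x86 "E8" call transform: rewriting each CALL rel32 target as an absolute offset makes repeated calls to the same function byte-identical, which compresses better. Here is a simplified C sketch of that general technique (assuming a little-endian buffer; this is not NanoZip's actual exe filter).
    Code:
    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* Forward transform: relative CALL targets -> absolute offsets. */
    static void e8_forward(unsigned char *buf, size_t len)
    {
        for (size_t i = 0; i + 5 <= len; i++) {
            if (buf[i] == 0xE8) {                          /* CALL rel32 opcode */
                int32_t rel;
                memcpy(&rel, buf + i + 1, 4);
                int32_t abs_addr = rel + (int32_t)(i + 5); /* offset of next insn */
                memcpy(buf + i + 1, &abs_addr, 4);
                i += 4;
            }
        }
    }

    /* Inverse transform: restore the original relative targets. */
    static void e8_inverse(unsigned char *buf, size_t len)
    {
        for (size_t i = 0; i + 5 <= len; i++) {
            if (buf[i] == 0xE8) {
                int32_t abs_addr;
                memcpy(&abs_addr, buf + i + 1, 4);
                int32_t rel = abs_addr - (int32_t)(i + 5);
                memcpy(buf + i + 1, &rel, 4);
                i += 4;
            }
        }
    }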

  #25 - Sami (Programmer, Finland)
    Quote Originally Posted by toffer View Post
    What do you mean here: "Even just LPAQ implemented using the internal NZ library should do exactly that."?
    I mean that if LPAQ were implemented using roughly the equivalent components from the internal NZ library, I would expect it to compress better and be slightly faster. But this is just a guess.

  #26 - toffer (Programmer, Erfurt, Germany)
    So, is your coding scheme equal to LPAQ?

  #27 - Sami (Programmer, Finland)
    Quote Originally Posted by toffer View Post
    So, is your coding scheme equal to LPAQ?
    In the sense that I have an LPAQ mixer. Also, the hashing and some of the higher-order models have PAQ-like non-stationary state. These are the essential things, so I think we can say the coding scheme is equal. Earlier versions of nz_cm used a large table as a mixer: I took a set of variables and used them as coordinates to look up model weights in the table.
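
    For readers who have not looked at LPAQ internals, the mixer referred to here combines the models' bit probabilities in the logistic domain with an adaptively weighted sum. Below is a minimal floating-point sketch of that general idea (LPAQ itself uses fixed point and context-selected weight sets; this is not NanoZip's or LPAQ's actual code).
    Code:
    #include <math.h>

    #define NMODELS 4

    static double weights[NMODELS];   /* one weight per model prediction */

    static double stretch(double p) { return log(p / (1.0 - p)); }
    static double squash(double x)  { return 1.0 / (1.0 + exp(-x)); }

    /* Mix the models' probabilities p[i] (that the next bit is 1) into a
       single prediction; st[] receives the stretched inputs for update(). */
    static double mix(const double p[NMODELS], double st[NMODELS])
    {
        double dot = 0.0;
        for (int i = 0; i < NMODELS; i++) {
            st[i] = stretch(p[i]);
            dot += weights[i] * st[i];
        }
        return squash(dot);
    }

    /* After coding the actual bit, move each weight toward the models that
       predicted it (a gradient step with learning rate lr). */
    static void update(const double st[NMODELS], double p_mix, int bit, double lr)
    {
        double err = (double)bit - p_mix;
        for (int i = 0; i < NMODELS; i++)
            weights[i] += lr * err * st[i];
    }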

  #28 - Christian (Programmer, Germany)
    I just did another quick test with ENWIK8:

    Code:
    bliz f 100000000     27.6s    21.171.896
    bliz c 100000000     32.1s    20.867.854
    -co -txt -m500m      11.2s    20.503.645
    -cO -txt -m500m      14.2s    20.306.505
    And ENWIK8 permuted:

    Code:
    -cO -txt -m500m     124.4s    28.620.262
    bliz f 100000000     27.6s    21.443.828
    -co -txt -m500m      23.9s    21.269.385
    bliz c 100000000     32.9s    21.119.762
    Looking at the times, mode -cO does not use BWT on the permuted data. But the results strongly imply that NanoZip uses some heavy text preprocessing. You've already posted about this before, though; the testing and writing just took too long.
    Well, NanoZip is hands down better than Bliz on normal data. It's a bad excuse, but Blizzard was written in one day.

    Quote Originally Posted by Sami View Post
    So your tests try to minimize everything I've been working on and what NZ is made for, that is, practical compression efficiency.
    I want to say sorry. Of course I'm not trying to minimize your work; that's why I always included both tests (normal and permuted). I just wanted to see what's really inside. And I'll now stop running tests on such data, because it's pointless for real-world testing. On normal data NanoZip performs excellently, as I've already written a couple of times.

  #29 - osmanturan (Programmer, Mersin, Turkiye)
    Quote Originally Posted by Sami
    Thanks for testing, although we don't have a timing reference. Also, some codecs are only good if decompression time is considered; for example, the nz_lzhd/s codecs are really tuned for maximal decompression speed (considering the ratio range).
    Forgive me, but it was really boring for me. Would you mind if I test and post the results later? At least some compression times alongside other compressors? If you are impatient, you can find timings for various other compressors on my laptop elsewhere in the forum.

    Also, why am I not the only person who thinks the CM part is a kind of PAQ clone?

  #30 - Sami (Programmer, Finland)
    Quote Originally Posted by Christian View Post
    I just did another quick test with ENWIK8:

    Code:
    bliz f 100000000     27.6s    21.171.896
    bliz c 100000000     32.1s    20.867.854
    -co -txt -m500m      11.2s    20.503.645
    -cO -txt -m500m      14.2s    20.306.505
    And ENWIK8 permuted:

    Code:
    -cO -txt -m500m     124.4s    28.620.262
    bliz f 100000000     27.6s    21.443.828
    -co -txt -m500m      23.9s    21.269.385
    bliz c 100000000     32.9s    21.119.762
    Looking at the times, mode -cO does not use BWT on the permuted data. But the results strongly imply that NanoZip uses some heavy text preprocessing. You've already posted about this before, though; the testing and writing just took too long.
    Well, that is false again. You cannot "find" the internals by permuting the text and making meaningless tests. Optimum is more than BWT on text (otherwise I'd call it BWT/LZT): if the text is not detected in the first second, it goes down a different path and time is wasted, it's checked for audio, checked for images, etc. If we only used the NZ BWT, it wouldn't matter how you permuted enwik8; it would always be roughly the same speed. As said, the filters only slow down the sorting. The "real" core of NZ is what you see. The NZ BWT is, as said, slightly faster than optimum1 with filters and BWT.

    I just wanted to see what's really inside. And I'll now stop running tests on such data, because it's pointless for real-world testing.
    It's unfortunate that these arguments surface time and again. As if we could take a compressor and figure out what is the core and what is not, especially when the comparison is done by generating artificial data which supposedly brings out the core. The tester generates artificial data until he finds the worst case of the particular compressor and then declares: now I have found the "real" core. This theory is bizarre.

    --edit--- typos
    Last edited by Sami; 6th July 2008 at 01:24.


