
Thread: NZ internals compared to Blizzard

  1. #1
    Programmer
    Join Date
    Jul 2008
    Location
    Finland
    Posts
    102
    Thanks
    0
    Thanked 1 Time in 1 Post

    NZ internals compared to Blizzard

    Yesterday Christian attempted to compare NZ internals to Blizzard using permuted text ("to remove the effect of data filtering"). He found that NZ slows down 2-5x on permuted text and concluded that NZ has good filters. I tried to explain that what he found was how the data analysis heuristics work in NZ, not what the internals are and definitely not what the "filters" do. He flatly refused to believe this. So here is the real test of NZ internals compared to Blizzard. I used an executable (MP = MegaProject) which is over two years old; it is slower and doesn't perform as well as NZ. It also had a fixed block size of 16 MB, so I ran Blizzard with 16 MB blocks as well. (I also included a stronger variant of the MP coder.)

    Code:
    enwik8:
    MP2006strong: 22790598  48.1 sec
    BLIZ_24c:     22869203  86.0 sec
    MP2006:       22969860  34.1 sec
    BLIZ_24f:     23191519  73.7 sec
    
    enwik8p (randomly permuted alphabet)
    MP2006strong: 23153869  48.6 sec
    BLIZ_24c:     23245550  87.5 sec
    MP2006:       23339698  34.0 sec
    BLIZ_24f:     23586139  74.8 sec
    Contrary to Christian's findings, there is no slowdown at all. Also, the "filters" in NZ are boring, as I have repeatedly said, and not magical at all. NZ's performance relies on strong internals and fast data analysis heuristics (which often fail, as Christian found out). Because the nz_* compressors are large and complex, fast analysis heuristics are very important. It's trivial to improve the heuristics so that they would not fail on inputs such as permuted text, but then we get a small slowdown on common data. The NZ heuristics are heavily tuned to real data, and artificial data has been totally ignored. I believe this is correct, and tuning with such data in mind would do no service to the overall performance of the compressor.
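
    To give an idea of what I mean by a fast analysis heuristic, here is a hypothetical sketch (not NZ's actual code): something as cheap as counting the bytes that fall into the usual ASCII text range is enough to route a block to a text-tuned path, and a permuted alphabet obviously defeats such a test while leaving the actual structure of the data untouched.
    Code:
    #include <stddef.h>

    /* Hypothetical sketch of a fast "does this look like text?" check.
       This is NOT NanoZip's code; it only illustrates why a permuted
       alphabet defeats this kind of test even though the data's
       structure is otherwise unchanged. */
    static int looks_like_text(const unsigned char *buf, size_t n)
    {
        size_t ascii = 0;
        for (size_t i = 0; i < n; i++) {
            unsigned char c = buf[i];
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                c == ' ' || c == '\n' || c == '.' || c == ',')
                ascii++;
        }
        /* pick the text-tuned path only if most bytes look like text */
        return n > 0 && ascii * 10 >= n * 8;   /* at least 80% */
    }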

  2. #2
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 151 Times in 18 Posts
    Quote Originally Posted by Sami
    He found that NZ slows down 2-5x on permuted text and concluded that NZ has good filters.
    I'm sorry. I don't want to prolong this whole thing, but I have to correct this statement. I did not conclude that the speed hit implied good filters. I said, "But the results strongly imply, that NanoZip uses some heavy text-preprocessing." Actually, I was referring to the different ratios of "-co".

    Quote Originally Posted by Sami
    I tried to explain that what he found was how the data analysis heuristics work in NZ, not what the internals are and definitely not what the "filters" do. He flatly refused to believe this.
    Well, I mainly rejected the statements you read into my posts. I never disputed that the permuted alphabet misguided the data analysis and therefore the preprocessors. That was exactly the intention. I tried to explain this with the CCM example. And if you test CCM, UHARC or your NZ on permuted audio, image or text data, you'll very often make the same observation - a worse ratio. So I'm sorry, but I don't agree when you say "that's definitely not what the filters do".

    Quote Originally Posted by Sami
    The NZ heuristics are heavily tuned to real data, and artificial data has been totally ignored. I believe this is correct, and tuning with such data in mind would do no service to the overall performance of the compressor.
    I agree. Btw., the same applies to CCM.

  3. #3
    Programmer
    Join Date
    Jul 2008
    Location
    Finland
    Posts
    102
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Christian View Post
    I'm sorry. I don't want to prolong this whole thing, but I have to correct this statement.
    We know you don't like the facts, but here is the first test, which shows what you intended by saying "to remove the effect of data filtering".

    Well, I mainly rejected the statements you read into my posts. I never disputed that the permuted alphabet misguided the data analysis and therefore the preprocessors. That was exactly the intention.
    And I have explained to you a number of times that permuting text will not reveal the "filters".

  4. #4
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Is it time to stop this endless discussion?

  5. #5
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 151 Times in 18 Posts
    Quote Originally Posted by Sami View Post
    We know you don't like the facts, but here is the first test, which shows what you intended by saying "to remove the effect of data filtering".
    And we know that you really like to make things up. I don't see what this test shows at all - e.g. Bliz has no data filter except E8/E9, which doesn't apply to ENWIK anyway. So what do you show here?

    Quote Originally Posted by Sami View Post
    And I have explained to you a number of times that permuting text will not reveal the "filters".
    Well, and I explained that 'shuffling' can remove the effect of data filters. Of course, I only know for sure that CCM's detection fails this way, but a worse ratio can be observed for UHARC, FreeArc and your NZ, too.

    If you say that a substantial loss in ratio on e.g. wave files after alphabet permutation is not due to the filters, then I have to believe you. As you have to believe me that for CCM, FreeArc and others this is the case.

  6. #6
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 151 Times in 18 Posts
    And one more test to demonstrate what I mean. It's a song from Jack Johnson, 44 kHz, 16-bit mono. It should be a pretty common song - or at least not a worst-case file, right?

    Code:
    Size		  File
    ------------------------------------------------
    21.911.130    jack johnsons - let it be snug.pm (alphabet permuted)
    21.911.130    jack johnsons - let it be snug.wav
    19.779.794    jj.pm.uha (alz-2)
    19.533.664    jj.7z     (normal)
    19.195.469    jj.pm.7z  (normal)
    17.750.713    jj.pm.nz  (NZ -co)
    17.390.607    jj.pm.ccm (default)
    14.346.742    jj.uha    (alz-2)
    14.208.188    jj.ccm    (default)
    11.694.369    jj.nz     (NZ -co)
    As one can easily observe, the permuted file's ratio is dramatically worse for all the media-aware compressors. NZ loses ~6 MB, UHARC ~5.4 MB, CCM ~3.2 MB and 7-zip only ~0.3 MB. I think it's easy to guess which compressors support special wave filters, and I think everyone else can make up their own mind, too.
    Btw., 7-zip doesn't care much, as it does not support media filters - the loss might be due to 7-zip's symbol coder, which uses binary decomposition.
    Finally, I hope this brings this whole discussion to a graceful end.
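
    To make the mechanism concrete, here is a generic delta filter of the kind typically used on wave data (just a sketch, not taken from UHARC, CCM or NZ). On real audio, neighbouring samples are numerically close, so the residuals are small and compress well; after a byte-alphabet permutation that closeness is gone, and the residuals are no smaller than the raw samples.
    Code:
    #include <stdint.h>
    #include <stddef.h>

    /* Generic delta filter for 16-bit PCM samples - a sketch of the
       kind of wave preprocessing under discussion, not any archiver's
       actual code.  It only helps when neighbouring samples are
       numerically close, which is exactly what a permutation destroys. */
    static void delta_encode(int16_t *s, size_t n)
    {
        int16_t prev = 0;
        for (size_t i = 0; i < n; i++) {
            int16_t cur = s[i];
            s[i] = (int16_t)(cur - prev);   /* store the residual */
            prev = cur;
        }
    }

    static void delta_decode(int16_t *s, size_t n)
    {
        int16_t prev = 0;
        for (size_t i = 0; i < n; i++) {
            prev = (int16_t)(prev + s[i]);  /* rebuild the sample */
            s[i] = prev;
        }
    }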

  7. #7
    Programmer
    Join Date
    Jul 2008
    Location
    Finland
    Posts
    102
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Christian View Post
    And we know that you really like to make things up.
    This was a new accusation. There is a reason you didn't say what it is that I made up: facts don't matter to you.

    I don't see what this test shows at all - e.g. Bliz has no data filter expect E8/E9, which doesn't apply to ENWIK anyway. So what do you show here?
    The test shows what you tried to find out - "I just wanted to see what's really inside" (of NZ). So the test shows that your test didn't measure that, that there is no slowdown in NZ with permuted alphabets, and that your previous test didn't measure the effect of filtering as you stated.

    Well, and I explained that 'shuffling' can remove the effect of data filters.
    Of course, I only know for sure that CCM's detection fails this way, but a worse ratio can be observed for UHARC, FreeArc and your NZ, too.
    Yes, and you just flatly refuse to admit that permuting the input will not remove the effects of a filter.

  8. #8
    Programmer
    Join Date
    Jul 2008
    Location
    Finland
    Posts
    102
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Christian View Post
    And one more test to demonstrate what I mean.

    ...

    I think it's easy to guess which compressors support special wave filters.
    Well, that's wrong again. NZ doesn't use any audio filter at all, but a wholly separate audio compressor, as I have explained earlier.

  9. #9
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 151 Times in 18 Posts
    Quote Originally Posted by Sami View Post
    This was a new accusation.
    Please have a look at your second post. There was something about me not liking facts. So you shouldn't throw rocks when you sit in a glass house.

    Quote Originally Posted by Sami View Post
    There is a reason you didn't say what it is that I made up: facts don't matter to you.
    E.g. your statement that I tried to correct in my first post. Or those strange theories you put in my mouth, like "that ANY compressor which differs from a simple statistical model with entropy coder, when being "analyzed", MUST BE first modified to find the "REAL" internals.", ...

    Quote Originally Posted by Sami View Post
    Yes, and you just flatly refuse to admit that permuting the input will not remove the effects of a filter.
    See my previous post. The effects can be observed as several MB of loss.

    Quote Originally Posted by Sami View Post
    Well, that's wrong again. NZ doesn't use any audio filter at all, but a wholly separate audio compressor, as I have explained earlier.
    No, it's not wrong. It shows that the test worked. We found out that NZ treats audio in a special way. Whether it's a filter or a separate compressor is just nitpicking.


    Finally, I won't reply anymore. It's like talking to a wall.

  10. #10
    Programmer
    Join Date
    Jul 2008
    Location
    Finland
    Posts
    102
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Christian View Post
    No, it's not wrong. It shows that the test worked. We found out that NZ treats audio in a special way.
    So now you have changed your position to "a special way". As I said, you are comparing two compressors in your test, the audio one and the optimum one, which doesn't involve an issue with filtering at all.

    Finally, I won't reply anymore. It's like talking to a wall.
    Let me just briefly summarize the findings. You tried to measure the difference between NZ and NZ+filters. I have shown that this cannot be done by permuting text/audio. Now you find that, because of this, you feel like you are talking to a wall. No wonder you feel that way.

  11. #11
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
    The biggest problem is that both Chris and Sami do not open their source.

  12. #12
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    870
    Thanks
    47
    Thanked 105 Times in 83 Posts
    Sorry for bothering you. I have no knowledge of programming.

    But clearly Sami is taking Christian's replies out of context.

    Whether it's a filter or not, it's still valid that permuted data is handled differently than non-permuted data. Therefore there is some kind of data recognition that "fails" and changes the way the data is compressed.

    Just because Christian did not write his replies with precautions against every possible misunderstanding of his words doesn't mean he is wrong.

    The "spirit" of his word is still the same. NZ does not handled permuted data the same way as non permuted data.


    Arguing by trying to deliberately misunderstand someone's words is not the way to have a discussion.
    And it is the first sign that people are just trying to be right rather than to find the truth.


    Sami, if you are going to have any kind of "reader" support, you need to go after the ball and not the man.

    ---
    Sidenote: I'm really impressed by NanoZip and its filters/special codecs, even beating out CCM and WinRAR, with much faster decompression time than RZM and CCM.
    Last edited by SvenBent; 6th July 2008 at 20:24.

  13. #13
    Programmer
    Join Date
    Jul 2008
    Location
    Finland
    Posts
    102
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by SvenBent View Post
    Sorry for bothering you. I have no knowledge of programming.

    But clearly Sami is taking Christian's replies out of context.
    Show me where.

    Whether it's a filter or not, it's still valid that permuted data is handled differently than non-permuted data.
    Why would there be disagreement about that? The disagreement is about how to interpret the results.

    Therefore there is some kind of data recognition that "fails" and changes the way the data is compressed.
    This is the core issue, as I've been explaining: permuted data hits a worst case in the NZ analysis heuristics.

    Arguing by trying to deliberately misunderstand someone's words is not the way to have a discussion.
    Show me where. Now we have only your word that I'm deliberately misunderstanding someone.

    Sami, if you are going to have any kind of "reader" support, you need to go after the ball and not the man.
    The record shows Chris was the one appealing to emotions and throwing cheap jabs. I was interested in the issue.

  14. #14
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    Use private messages; nobody is interested in reading this. The only result here is 15 kB of "calmness". Christian apologized multiple times and agreed that the results were somewhat inapplicable to real life, yet the discussion continues with quibbling in a "but he said" way... why? The only reason this topic is still compression-oriented is that it can be compressed to 9 bytes. Pointless.
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  15. #15
    Programmer
    Join Date
    Jul 2008
    Location
    Finland
    Posts
    102
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Black_Fox View Post
    Use private messages; nobody is interested in reading this. The only result here is 15 kB of "calmness". Christian apologized multiple times and agreed that the results were somewhat inapplicable to real life, yet the discussion continues with quibbling in a "but he said" way... why?
    Why would Chris need to apologize? I think you are confused. It appears only Chris and I know what we are arguing about.

  16. #16
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,239
    Thanks
    192
    Thanked 968 Times in 501 Posts
    As far as I understand, it's like this:

    1. We can use a universal model tuned to some specific data
    (eg. by processing the data with delta filter for better
    audio compression).

    2. And we can detect the audio data using some heuristic,
    and apply different compressors depending on that
    (although this includes switching the delta filter on/off too).

    The difference is that with [1] we can test the performance
    of the same algorithm in different conditions, and with [2]
    we can't do that because a completely different compressor
    could be used if permuted audio wasn't detected as audio.
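
    In code the difference looks roughly like this
    (all names are made up just to show the two structures,
    and the stub bodies do nothing useful):
    Code:
    #include <stddef.h>

    typedef enum { TYPE_OTHER, TYPE_AUDIO, TYPE_TEXT } DataType;

    /* stand-ins for the real components */
    static DataType detect_type(const unsigned char *p, size_t n) { (void)p; (void)n; return TYPE_OTHER; }
    static void delta_filter(unsigned char *p, size_t n)          { for (size_t i = n; i > 1; i--) p[i-1] -= p[i-2]; }
    static void universal_compress(const unsigned char *p, size_t n) { (void)p; (void)n; }
    static void audio_compress(const unsigned char *p, size_t n)     { (void)p; (void)n; }
    static void text_compress(const unsigned char *p, size_t n)      { (void)p; (void)n; }

    /* [1] one universal model; detection only toggles a preprocessing
       filter, so the same back-end runs whether detection succeeds or not */
    static void compress_style1(unsigned char *p, size_t n)
    {
        if (detect_type(p, n) == TYPE_AUDIO)
            delta_filter(p, n);
        universal_compress(p, n);
    }

    /* [2] detection dispatches to a completely different compressor, so
       a misdetection means a different algorithm ran, not the same
       algorithm on harder data */
    static void compress_style2(const unsigned char *p, size_t n)
    {
        switch (detect_type(p, n)) {
        case TYPE_AUDIO: audio_compress(p, n);     break;
        case TYPE_TEXT:  text_compress(p, n);      break;
        default:         universal_compress(p, n); break;
        }
    }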

    Now, it seems that Sami tries to prove that his archiver
    is designed like [2] and not [1], while Christian is
    saying that nanozip has explicit support for some data types.
    So both are right, and there's no reason to spam so much.

    Instead I'd like to see some technical discussion,
    eg. about a design of archive format with a blockwise(?)
    support for different compressors, or the best way
    of data type detection and segmentation.

  17. #17
    Programmer
    Join Date
    Jul 2008
    Location
    Finland
    Posts
    102
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Shelwien View Post
    As far as I understand, it's like this:

    Now, it seems that Sami tries to prove that his archiver
    is designed like [2] and not [1], while Christian is
    saying that nanozip has explicit support for some data types.
    So both are right, and there's no reason to spam so much.
    For me, that's accurate, with the addition that I've also said that NZ has such explicit support.

    Instead I'd like to see some technical discussion,
    eg. about a design of archive format with a blockwise(?)
    support for different compressors, or the best way
    of data type detection and segmentation.
    I've struggled with this a lot, and so far I have been unable to find anything worthwhile to mention. NZ has three different internal drivers for different compressors, each a compromise between memory, speed and functionality. Probably those are unnecessary, but I just wanted to implement everything as optimally as reasonably possible, thinking from the low level upwards, so that the driver is optimally designed for the components and not the other way around. So at least I had to know exactly what each compressor is and what it does/needs before being able to answer the questions you asked. But the problem is probably me, as NZ is kind of bloated in that sense.

  18. #18
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    870
    Thanks
    47
    Thanked 105 Times in 83 Posts
    Now we've got it settled.

    Go back and improve NanoZip.
    People here are waiting for the next alpha release...

  19. #19
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,239
    Thanks
    192
    Thanked 968 Times in 501 Posts
    > > eg. about a design of archive format with a blockwise(?)
    > > support for different compressors, or the best way
    > > of data type detection and segmentation.
    >
    > I've struggled with this a lot, and at least I have been
    > unable to find anything worthwhile to mention.

    Well, to be more specific
    1. Is NZ able to use different compression methods per
    block (like rar), or per file only? And can it keep
    all the models up to date in parallel, if it's blockwise,
    or does it have to reset the model at each compressor switch?
    2. Is the detection statistical, or does it use
    format signatures and the like?
    3. Do you use statistical data segmentation (like seg_file)
    or fixed blocks only?
    4. Do you have a long match submodel/preprocessor (like BZ's rep)?

  20. #20
    Programmer
    Join Date
    Jul 2008
    Location
    Finland
    Posts
    102
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Shelwien View Post
    Well, to be more specific
    1. Is NZ able to use different compression methods per
    block (like rar), or per file only? And can it keep
    all the models up to date in parallel, if it's blockwise,
    or does it have to reset the model at each compressor switch?
    Everything in general is blockwise, and whenever possible all the compressors are kept up to date. Some heuristic rules are used to figure out when to skip updates, etc.

    2. Is the detection statistical, or does it use
    format signatures and the like?
    Both. Format signatures are important for speed.
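
    As an illustration of why signatures help speed (just a generic example, not NZ's actual code), a RIFF/WAVE header can be recognized from a handful of bytes before any statistical scan is needed:
    Code:
    #include <string.h>
    #include <stddef.h>

    /* Generic format-signature check (not NZ's actual code): a RIFF/WAVE
       header is identified by a few byte comparisons, which is far
       cheaper than scanning the block statistically. */
    static int looks_like_wav(const unsigned char *buf, size_t n)
    {
        return n >= 12 &&
               memcmp(buf,     "RIFF", 4) == 0 &&
               memcmp(buf + 8, "WAVE", 4) == 0;
    }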

    3. Do you use statistical data segmentation (like seg_file)
    or fixed blocks only?
    The slower modes use segmentation, which I hope I can improve, because currently it's slow and not very good; probably Dmitry's seg_file is much better.

    4. Do you have a long match submodel/preprocessor (like BZ's rep)?
    LZT (the binary compressor for opt1-2) has an additional long match finder. The result is like rep integrated into lzma (presuming I understand what rep is doing), so we do not need a separate pass. I haven't had time to test it much, though, or to compare whether it is worth having it integrated or not.
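
    For illustration only, here is a minimal sketch of the kind of long match finder I mean (my reading of the idea, not NZ's or rep's actual code): hash a fixed-length window at each position into a table of earlier positions, and only accept matches that verify to a large minimum length.
    Code:
    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* Minimal sketch of a rep-style long match finder - my reading of
       the idea, not NZ's or rep's actual code.  The caller owns a table
       of HASH_SIZE entries initialized to SIZE_MAX. */
    #define MIN_LONG  64                   /* only long repeats matter here */
    #define HASH_BITS 20
    #define HASH_SIZE ((size_t)1 << HASH_BITS)

    static uint32_t hash8(const unsigned char *p)
    {
        uint64_t x;
        memcpy(&x, p, 8);
        return (uint32_t)((x * 0x9E3779B97F4A7C15ull) >> (64 - HASH_BITS));
    }

    /* Returns the length of a long match starting at position i, or 0.
       On success *match_pos is the earlier position it matches against. */
    static size_t find_long_match(const unsigned char *buf, size_t n,
                                  size_t i, size_t *table, size_t *match_pos)
    {
        if (i + 8 > n)
            return 0;
        uint32_t h = hash8(buf + i);
        size_t cand = table[h];
        table[h] = i;                      /* remember the current position */
        if (cand >= i)
            return 0;                      /* empty slot or stale entry */
        size_t len = 0;
        while (i + len < n && buf[cand + len] == buf[i + len])
            len++;
        if (len < MIN_LONG)
            return 0;
        *match_pos = cand;
        return len;
    }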

  21. #21
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    870
    Thanks
    47
    Thanked 105 Times in 83 Posts
    I have some files where optim1 (-co) gives slightly smaller files and compresses and decompresses faster than optim2 (-cO).

    Anything you would like to look into?

  22. #22
    Programmer
    Join Date
    Jul 2008
    Location
    Finland
    Posts
    102
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by SvenBent View Post
    I have some files where optim1 (-co) gives slightly smaller files and compresses and decompresses faster than optim2 (-cO).

    Anything you would like to look into?
    Thanks, but no need. I know when this happens and it's quite often. The opt1/2 modes are the most incomplete part of NZ and they will improve.

