
Thread: Audio Compression

  1. #1 - osmanturan (Programmer)

    Audio Compression

    I can't understand why some people insist on a delta transform as a filter, or on analog models as a submodel of a CM coder, for compressing audio or bitmaps. I think wavelet-based or related algorithms do a fairly good job. For example:

    Fazil Say-Uc Selvi.wav (58.967.036 bytes): 16-bit stereo, ripped directly from the original CD to WAV format. This means its samples are not already quantized, unlike an mp3->wav conversion.

    ACWAV 22.358.653 bytes (~4 seconds)
    PAQ8o8 (-7) 25.741.080 bytes (6744 seconds)
    WinRAR (Best) 26.353.902 bytes (~15 seconds)
    CMM4 v0.1f (76) 35.081.247 bytes (115 seconds)
    BIT (LWCX) 35.471.999 bytes (115 seconds)
    7-Zip (Ultra) 37.452.086 bytes (~31 seconds)
    BALZ (ex) 40.356.654 bytes (143 seconds)

    ACWAV compresses a WAV file with a weak arithmetic coder after an S+P transform. The S+P transform is essentially a Haar wavelet with simple predictive coding, so it can be applied to any analog data, e.g. bitmaps.
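
    For reference, a minimal sketch of the kind of integer Haar lifting step that S+P builds on (illustrative only, not ACWAV's actual code; the real S+P adds a small predictor on top of this):
    Code:
    // haar_lift.cpp - one level of integer Haar lifting: low-pass "averages" and
    // high-pass differences, fully reversible in integer arithmetic.
    #include <cstdio>
    #include <vector>

    // Forward: (a, b) -> (s, d) with d = b - a, s = a + (d >> 1) = floor((a+b)/2).
    static void haar_forward(const std::vector<int>& in,
                             std::vector<int>& low, std::vector<int>& high) {
        for (size_t i = 0; i + 1 < in.size(); i += 2) {
            int d = in[i + 1] - in[i];
            int s = in[i] + (d >> 1);
            low.push_back(s);
            high.push_back(d);
        }
    }

    // Inverse: a = s - (d >> 1), b = a + d. Exactly undoes the forward step.
    static void haar_inverse(const std::vector<int>& low, const std::vector<int>& high,
                             std::vector<int>& out) {
        for (size_t i = 0; i < low.size(); ++i) {
            int a = low[i] - (high[i] >> 1);
            int b = a + high[i];
            out.push_back(a);
            out.push_back(b);
        }
    }

    int main() {
        std::vector<int> pcm = {100, 104, 110, 111, 90, 80, -5, -6}; // made-up 16-bit samples
        std::vector<int> low, high, rec;
        haar_forward(pcm, low, high);
        haar_inverse(low, high, rec);
        for (size_t i = 0; i < pcm.size(); i += 2)
            std::printf("(%d,%d) -> low %d, high %d -> (%d,%d)\n",
                        pcm[i], pcm[i + 1], low[i / 2], high[i / 2], rec[i], rec[i + 1]);
        return 0;
    }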

    BIT's LWCX mode is an order-0-4,6 CM coder based on a neural-network mixer and an SSE layer (512 MB of memory was used for context hashes).

    PAQ8o8's processing time was painfully long.

    Also note that on my laptop (Core 2 Duo 2.2 GHz, 2 GB RAM) the file-writing speed is around 20-25 MB/s, so ACWAV's actual compression time is closer to 1 second!

    @encode:
    Why don't you do some speed optimization on your LZ-based compressor? As you can see from the timings, some CM coders outperform your compressor in both time and final size. You may point to decompression speed, but if we consider total time (compression + transmission + decompression), a CM coder still compares well with your BALZ. I think you should definitely do some speed optimization.

    Also, I noticed that your compressor only shines on highly redundant files such as FP.log, while my CM coder (BIT's LWCX mode) does better on semi-incompressible files. Maybe this can be a starting point for your optimizations.

  2. #2
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts
    • LZ with optimized parsing can be slower than CM
    • When testing compressors, please post the version numbers - different versions perform differently
    • All possible optimizations are mostly done

  3. #3
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    > LZ with an optimized parsing can be slower than CM
    What about 7-zip records?

    > Testing compressors, please post the version number - different versions
    > perform differently
    Versions:
    BALZ 1.12
    7-Zip 4.57
    WinRAR 3.71

    Here is another test. I'll post more tests tonight.
    PIMPLE 1.43 Beta (Extreme): 24.927.457 bytes (~240 seconds)

    > All possible optimizations are mostly done
    There is always room for further improvement.

  4. #4
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts
    Quote Originally Posted by osmanturan:
    > LZ with an optimized parsing can be slower than CM
    What about 7-zip records?
    Each algorithm has a worst-case behavior. For BALZ the worst case is incompressible or badly compressible data, like analog audio... For example, LZPM is OK in this situation, but it is really poor at fp.log compression...

  5. #5
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I agree. But most files today are already compressed - JPEG, PDF, DOCX, etc. Maybe you could add a special codec for highly redundant files alongside your main codec.

  6. #6
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts
    BTW, the new BALZ v1.13 will be much more stable on already compressed/analog data thanks to a newly added special model!

  7. #7
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    You are comparing apples to oranges... LZ-based compressors try to find patterns, which hardly occur in analog data. The same holds for general-purpose CM compressors, which normally use octet-aligned order-N models.

    This analog data is organized in 16-bit records which might be interleaved... So these general algorithms will hardly handle the high correlation of the significant bits of nearby samples and the cross-correlation between the two stereo channels.

    The reason people add these filters is that they improve compression on data which doesn't follow these assumptions, while requiring no further modification of the algorithm. Just take PNG as an example: different predictors followed by Deflate.

    A special model taking the data's nature into account will _always_ be superior to such "fixes".

    I think a good starting point would be LPC in conjunction with an error model based on contexts built from the mean and variance of the nearby transformed samples (actually error values).
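
    A rough sketch of this idea, with an assumed order-2 predictor and arbitrary mean/variance buckets (illustrative choices only):
    Code:
    // lpc_error_context.cpp - fixed linear predictor plus a context id derived
    // from the local mean/variance of recent prediction errors; a coder could
    // use the id to select one of several adaptive error models.
    #include <cmath>
    #include <cstdio>
    #include <deque>
    #include <vector>

    // Order-2 predictor: linear extrapolation 2*x[n-1] - x[n-2].
    static int predict(const std::vector<int>& x, size_t n) {
        if (n < 2) return n ? x[n - 1] : 0;
        return 2 * x[n - 1] - x[n - 2];
    }

    // Map mean/variance of the last few errors to a small context id.
    static int error_context(const std::deque<int>& recent) {
        if (recent.empty()) return 0;
        double mean = 0.0, var = 0.0;
        for (int e : recent) mean += e;
        mean /= recent.size();
        for (int e : recent) var += (e - mean) * (e - mean);
        var /= recent.size();
        int mbin = mean < -8 ? 0 : mean < 8 ? 1 : 2;                  // 3 mean buckets
        int vbin = var < 16 ? 0 : var < 256 ? 1 : var < 4096 ? 2 : 3; // 4 variance buckets
        return mbin * 4 + vbin;                                       // 12 contexts total
    }

    int main() {
        std::vector<int> x;
        for (int i = 0; i < 64; ++i) x.push_back(int(1000 * std::sin(0.2 * i)));

        std::deque<int> recent;               // sliding window of past errors
        for (size_t n = 0; n < x.size(); ++n) {
            int err = x[n] - predict(x, n);   // residual the coder would encode
            int ctx = error_context(recent);  // context chosen *before* coding err
            std::printf("n=%2zu err=%5d ctx=%d\n", n, err, ctx);
            recent.push_back(err);
            if (recent.size() > 8) recent.pop_front();
        }
        return 0;
    }
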
    Last edited by toffer; 9th June 2008 at 15:25.

  8. #8
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    > You are comparing apples to oranges... LZ based compressors try to find
    > patterns, which hardly occur in analog data. The same holds for general
    > purpose CM compressors, which normally use octet aligned order N models.
    You're right to some extent, but not entirely, in my view. Actually, I'm not comparing LZ-based compressors with CM on audio data; that note was addressed only to encode, about his compressor. I've compressed a number of other files with his compressor, but I noticed it only shines on highly redundant data. AFAIK he implemented a simple CM back-end for a ROLZ scheme, so I didn't expect this kind of behaviour. That's all.

    I added WinRAR because it has an audio filter based on a delta transform.
    I added 7-Zip because it has optimal parsing.
    I added PAQ8o8 because it has an analog model.
    I added CMM4 because it's a well-tuned CM without an analog model (AFAIK, of course).
    I added BIT's LWCX mode because it's a plain context mixer: no submodels and not well tuned.

    My main goal is to show that a simple Haar transform can beat a delta transform plus your favorite compressor, because it performs a subband decomposition, which improves compression. Notice that it is also very fast. Another point: you can replace your delta transform with a Haar transform, but its output still needs to be compressed (like BWT, it's only a transform).
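
    For anyone who wants to probe this on their own samples, here is a small order-0 entropy check comparing raw samples, delta residuals, and a one-level Haar split. It is only a sketch with a synthetic placeholder signal; which side wins depends on the data and on how the low band is handled further (a real wavelet coder recurses on it).
    Code:
    // subband_entropy.cpp - order-0 entropy of raw samples vs. delta residuals
    // vs. a one-level integer Haar split, on a placeholder synthetic signal.
    #include <cmath>
    #include <cstdio>
    #include <map>
    #include <vector>

    // Empirical order-0 entropy in bits/symbol of an integer sequence.
    static double entropy(const std::vector<int>& v) {
        std::map<int, int> hist;
        for (int x : v) ++hist[x];
        double h = 0.0, n = double(v.size());
        for (const auto& kv : hist) {
            double p = kv.second / n;
            h -= p * std::log2(p);
        }
        return h;
    }

    int main() {
        // Placeholder "audio": a slow sine plus a faster component, 16-bit-ish range.
        std::vector<int> x;
        for (int i = 0; i < 1 << 16; ++i)
            x.push_back(int(20000.0 * std::sin(0.01 * i) + 50.0 * std::sin(1.3 * i)));

        std::vector<int> delta, low, high;
        for (size_t i = 1; i < x.size(); ++i) delta.push_back(x[i] - x[i - 1]);
        for (size_t i = 0; i + 1 < x.size(); i += 2) {   // integer Haar, one level
            int d = x[i + 1] - x[i];
            low.push_back(x[i] + (d >> 1));
            high.push_back(d);
        }

        std::printf("raw   : %.3f bits/sample\n", entropy(x));
        std::printf("delta : %.3f bits/sample\n", entropy(delta));
        // Weighted average over the two half-length subbands.
        std::printf("haar  : %.3f bits/sample (low %.3f, high %.3f)\n",
                    0.5 * (entropy(low) + entropy(high)), entropy(low), entropy(high));
        return 0;
    }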

    > A special model taking the data's nature into account will _always_ be
    > superior to such "fixes".
    I completely agree. That's the reason why I opened this thread.

    > I think a good starting point would be LPC in conjunction with an error
    > model based on contexts which are made of mean and variance of the
    > nearby, transformed samples (actually error values).
    Maybe, yes. AFAIK, for lossy speech compression LPC's results are superior, and it can be made lossless by coding the error terms. But I'm not sure which is better: a wavelet-related approach or LPC?

  9. #9
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    > BTW, new BALZ v1.13 will be much stable on already compressed/analog
    > data due to a special model added!
    I'm waiting with fingers crossed.

  10. #10
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    @all
    Which compressors should I test? I especially want to test compressors that have an audio and/or delta transform, to show the benefits of subband decomposition. Which well-known ones fit this situation?

  11. #11
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    A Haar transform is so fast because it doesn't need any multiplications, only additions and subtractions.

    My favorite would be a linear combination of nearby samples and different error models mixed together in a CM fashion - note that the quality of the decorrelation is not the only thing that counts. The linear prediction can easily be tuned to reduce its RMS error with online optimization.

    I would prefer this since it can easily be extended to image data; you only need to change the samples' source.

    Another nice feature is that you can include samples from another channel, so you can take into account the cross-correlation of, e.g., different color planes or audio channels.
    I don't see how you would do this by simply using a time-frequency transform, which processes a single channel at a time.
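
    A minimal sketch of such an online-tuned predictor, assuming an NLMS update, an arbitrary filter order, and a toy stereo signal (illustrative choices only):
    Code:
    // nlms_stereo.cpp - linear predictor over interleaved stereo using past
    // samples of the current channel plus the co-located sample of the other
    // channel, with weights updated by normalized LMS to reduce RMS error.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
        const int ORDER = 4;          // past samples from the same channel
        const double MU = 0.5;        // NLMS step size
        // Toy stereo: right channel is a scaled left channel plus a slow wobble,
        // so there is real cross-channel correlation to exploit.
        std::vector<double> left, right;
        for (int i = 0; i < 2000; ++i) {
            double s = 8000.0 * std::sin(0.05 * i) + 2000.0 * std::sin(0.31 * i);
            left.push_back(s);
            right.push_back(0.8 * s + 500.0 * std::sin(0.07 * i));
        }

        std::vector<double> w(ORDER + 1, 0.0);    // last weight: other-channel tap
        double sum_sq = 0.0;
        for (size_t n = ORDER; n < right.size(); ++n) {
            // Inputs: ORDER past right-channel samples + current left-channel sample
            // (usable because the left channel is decoded first in this sketch).
            std::vector<double> in(ORDER + 1);
            for (int k = 0; k < ORDER; ++k) in[k] = right[n - 1 - k];
            in[ORDER] = left[n];

            double pred = 0.0;
            for (int k = 0; k <= ORDER; ++k) pred += w[k] * in[k];
            double err = right[n] - pred;         // residual an entropy coder would see
            sum_sq += err * err;

            double norm = 1e-6;                   // NLMS normalization
            for (double v : in) norm += v * v;
            for (int k = 0; k <= ORDER; ++k) w[k] += MU * err * in[k] / norm;
        }
        std::printf("RMS prediction error: %.1f\n", std::sqrt(sum_sq / (right.size() - ORDER)));
        return 0;
    }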

  12. #12
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    870
    Thanks
    47
    Thanked 105 Times in 83 Posts
    Where can I get ACWAV?

  13. #13
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    > where to get acwav ?
    http://www.cipr.rpi.edu/~said/FastAC.zip

    > A haar transform is so fast, since it doesn't need any multiplications,
    > only add/sub.
    That's why I love the Haar transform.

    > My favorite would be a linear combination of nearby samples and different
    > error models mixed together in a CM fashion - note that not only the
    > quality of the decorrelation counts. The linear prediction can easily be
    > tuned to reduce its RMS with online optimization.
    AFAIK, PAQ8o8 already does something similar, so you can check the compressed file size before writing your own implementation.

    > I would prefer this, since it can easily be extended to image data, you only
    > need to change the sample's source.
    These kinds of transforms (e.g. Fourier, DCT, wavelet) can easily be extended to image or even volume data; you only need to perform the transform along each dimension, e.g. columns, then rows.

    > And another nice feature is, that you can include samples from another
    > channel, so you can take the cross correlation of, e.g. different color
    > planes or audio channels into account.
    For audio this can be an improvement, but not for image data, because different color planes don't have much in common at all.

    > I don't know how you want to do this by simply using a tf-transform, which
    > relies on a single channel to be processed.
    After the transform we have low-pass and high-pass components. From my point of view, these outputs can be handled the way 7-Zip handles x86 data: the BCJ2 transform outputs several streams for a single x86 input, and each of them is compressed separately.

  14. #14
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    PAQ doesn't do what I described.

    > For audio this can be an improvement. But not for image data. Because, different color planes doesn't have similar things at all.
    This isn't true. Have you ever looked at different color planes?
    An adaptive, linear mix can learn the direction of the correlation (what happens to the prediction error for the R plane if the G plane's error rises?).
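
    A toy sketch of that adaptive mix, assuming a simple "west" predictor, 1-D scanlines, and an LMS-style weight update (all illustrative, not a real image model):
    Code:
    // plane_error_mix.cpp - when coding the R plane after the G plane, add the
    // G plane's prediction error at the same position to the R predictor with
    // an adaptively learned weight, so the model discovers the correlation.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
        // Toy scanline: G is smooth, R follows G plus an offset and a ramp, so
        // the planes' prediction errors are strongly correlated.
        std::vector<double> G, R;
        for (int i = 0; i < 1000; ++i) {
            double g = 100.0 + 60.0 * std::sin(0.02 * i) + ((i * 37) % 7 - 3);
            G.push_back(g);
            R.push_back(0.9 * g + 20.0 + 0.01 * i);
        }

        double w = 0.0;                  // learned weight on the G plane's error
        const double MU = 0.01;          // LMS step size (assumed)
        double base_sq = 0.0, mixed_sq = 0.0;
        for (size_t i = 1; i < R.size(); ++i) {
            double g_err = G[i] - G[i - 1];          // G residual vs. "west" predictor
            double r_base = R[i - 1];                // plain "west" prediction for R
            double r_pred = r_base + w * g_err;      // prediction helped by G's error
            double e_base = R[i] - r_base;
            double e_mix = R[i] - r_pred;
            base_sq += e_base * e_base;
            mixed_sq += e_mix * e_mix;
            w += MU * e_mix * g_err / (1.0 + g_err * g_err);  // normalized update
        }
        std::printf("RMS without G error: %.2f\n", std::sqrt(base_sq / (R.size() - 1)));
        std::printf("RMS with    G error: %.2f\n", std::sqrt(mixed_sq / (R.size() - 1)));
        return 0;
    }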

    > I don't know how you want to do this by simply using a tf-transform, which relies on a single channel to be processed.
    You misunderstood me - how do you want to take such inter-channel correlation into account using only a transform?
    Last edited by toffer; 9th June 2008 at 22:52.

  15. #15
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    > This isn't true. Have you ever looked at different color planes?
    > An adaptive, linear mix can learn the direction of correlation (what happens
    > to the prediction error for the R plan, if the G plane's error raises?).
    Now I see. I was thinking of CMYK planes when I read your post and didn't consider RGB planes. If we think of each RGB plane as a greyscale image, your point becomes much clearer. Thanks.

    > You misunderstood me - How do you want to take such inter-channel
    > correlation into account using only a transform?
    If I'm not misunderstanding you again: I simply don't plan to add anything that takes care of the inter-channel relationships.

  16. #16
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Ok, you can do it like that - but dropping a known correlation will reduce the prediction/compression performance.

  17. #17
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Aha... I found a better way (!). Could you take a look at it and tell me what you think?

    http://www.mm.anadolu.edu.tr/~gerek/...erek_cetin.pdf

  18. #18
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Well, it is somewhat similar to the method I described, though I don't separate any channels. And they don't mention anything about inter-channel correlation (a source having more than one channel).
    There's also something you should keep in mind - most of these papers don't consider slightly more expensive modelling and coding of the prediction error. For offline processing, why not use CM with special contexts along with AC instead of a simple order-0 AC?

  19. #19
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    > For offline processing why not use CM with special contexts along with AC,
    > instead of a simple order 0 AC?
    This is actually what I want to do. Also, note that FastAC (the arithmetic coder + model behind ACWAV) is not strong: it has a delayed model which only updates the probability after several bits have passed. This is different from Shelwien's delayed counters.

  20. #20
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,239
    Thanks
    192
    Thanked 968 Times in 501 Posts
    > My main goal is to show that simple haar transform can beat delta transform+your favorite compressor.

    Then test the lossless audio compressors first, instead of some random "universal" ones.
    (But you'd have to ensure that they're really lossless by decoding and comparing.)
    I'd recommend starting with Taylor's RKAU and WinRK.
    StuffIt probably has something along that line too.
    And of course there are things like Monkey's Audio.

    > Because, it does a subband decomposition which improve compressing.
    > Notice that, it is very fast.
    > Another thing you can replace your delta transform with an haar transform.

    Wavelet transforms (or FFT) are delta transforms in a way, just with a weird order.

    > Because, it needs compression (actually it's output like BWT, it's only a transform).

    Yeah, so check out the performance of BWT vs CM compressors, e.g. on maximumcompression.
    It's really similar to CM vs transforms for audio data, you're right.
    But that doesn't serve your "goal" at all.
    Also, FFT has some secondary reasons to be used for audio compression (FFT-based
    quantization has already been applied to most of the available audio data, and I don't
    mean only mp3 - how do you think studios mix their productions?),
    while wavelets are just static linear transformations, so they can never beat even a
    simple adaptive LPC... at least in lossless compression, where this can be objectively measured.

    Then again, you're right in the sense that a specialized audio compressor, even a
    Haar-based one, would be able to beat most universal compressors (except Taylor's,
    probably), but that doesn't mean much.

    Well, to sum this up:
    1. LPC is a "delta transform"... the prediction of any model can be subtracted from
    the next audio sample.
    2. A sequential audio CM can use any correlations; it can even compute some spectral
    transforms (not necessarily a single one) and use their outputs or extrapolations as
    a context. But it's much less convenient to start from spectral data, because you'd
    first have to reverse the initial transform to get more context, and then it's really
    complicated to map the predictions of a sequential model (or of a model for another
    spectral transform) onto the probabilities of values in your chosen spectral transform.
    3. The situation is worst in lossy audio compression - there's already a _tradition_
    that audio processing is performed in FFT terms (which filters out the wavelet-based
    codecs too), so mapping the spectral predictions back onto sound samples is the only
    choice, and as I said, that's totally complicated to compute, so nobody does it.
    Last edited by Shelwien; 10th June 2008 at 01:15.

  21. #21
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Quote Originally Posted by osmanturan:
    Aaa...I found a better way (!). Could you look at it and tell me what do you think?

    http://www.mm.anadolu.edu.tr/~gerek/...erek_cetin.pdf
    Interesting. They transmit a hierarchy of images of increasing size, each time transmitting the prediction error from the previous image. But why use enumerative coding instead of arithmetic??? I looked it up (one author is the same): http://citeseer.ist.psu.edu/288910.html

    The problem with enumerative coding is that it only works over blocks with uniform statistics, and it then wastes about 1 bit per block (1/2 bit to transmit the number of 1 bits in the block, and 1/2 bit to transmit the permutation), and blocks have to be small for good performance. Their results on page 6 (the pages are in reverse order) show encoding about 10% over the Shannon limit. (Their arithmetic coding results are bogus; they should be only a few bits over.)
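
    A quick numeric sketch of that per-block accounting (block sizes and the bit probability below are arbitrary choices): it checks the identity n*H(p) = H(K) + E[log2 C(n,K)], i.e. enumerative coding is optimal before rounding code lengths to whole bits, and then measures how much rounding the permutation index costs per block.
    Code:
    // enum_overhead.cpp - per-block overhead of enumerative coding on an
    // i.i.d. binary block of length n with P(1)=p. The count code loses a
    // similar fraction of a bit on top of the rounding loss printed here.
    #include <cmath>
    #include <cstdio>

    static double log2_binom(int n, int k) {   // log2 C(n,k) via lgamma
        return (std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0))
               / std::log(2.0);
    }

    int main() {
        const double p = 0.2;                  // assumed bit probability
        for (int n : {16, 64, 256}) {
            double Hk = 0.0, perm = 0.0, perm_ceil = 0.0;
            for (int k = 0; k <= n; ++k) {
                // P(K = k) for a Binomial(n, p) count of ones
                double logpk = log2_binom(n, k) + k * std::log2(p) + (n - k) * std::log2(1 - p);
                double pk = std::pow(2.0, logpk);
                Hk -= pk * logpk;                              // H(K)
                perm += pk * log2_binom(n, k);                 // E[log2 C(n,K)]
                perm_ceil += pk * std::ceil(log2_binom(n, k)); // whole-bit index
            }
            double nH = n * (-p * std::log2(p) - (1 - p) * std::log2(1 - p));
            std::printf("n=%3d  nH=%8.3f  H(K)+E[log2 C]=%8.3f  rounding loss=%.3f bits/block\n",
                        n, nH, Hk + perm, perm_ceil - perm);
        }
        return 0;
    }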

    I suppose their model is equivalent to a non-hierarchical model transmitting the prediction error using a 2-D adaptive filter with low-pass-filtered inputs. But that would not work with enumerative coding, because you need context to model the error properly.

    Also, their method is for gray-scale images, so you could not use the redundancy across color planes. For audio you would have a similar problem modeling stereo.

  22. #22
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    > Then test the lossless audio compressors first instead of some
    > random "universal" ones. (But you'd have to ensure that its really lossless
    > by decoding and comparing). I'd recommend to start with Taylor's rkau and
    > WinRK. Also stuffit probably has something along that line. And of course
    > things like "monkey's audio".
    Yes, you are right. I'll test WinRK and StuffIt as soon as possible.

    > Wavelet transforms (or FFT) are delta transforms in a way, just with a
    > weird order.
    But they improve compression.

    Thanks for the other points. It's time to think them over.

  23. #23
    Member
    Join Date
    May 2008
    Location
    England
    Posts
    325
    Thanks
    18
    Thanked 6 Times in 5 Posts
    You could test out SBC; it has always done amazingly well on WAV (and BMP) files for me.

    Example:
    08 Ashes and Wine.wav 45,951,068 bytes 16-Bit Stereo
    (taken from the stunning debut album One Cell in the Sea by A Fine Frenzy)

    SBC (-m3 -b63) 26,931,899 bytes (~24.3s)
    acwav 30,238,013 bytes (~4.5s)
    WinRAR (Best) 31,459,198 bytes (~16s) - default Advanced Compression options
    RAR(-m5 -mdg) 31,459,283 bytes (~41.4s) Why is this so much slower?
    7z(Ultra) 41,533,731 bytes (~48s)

    Times were taken using Timer; for WinRAR/7z I just watched the time in the progress window.

    On a slightly related note, SBC gets scribble.wav down to 12,283 bytes from 9,249,480 bytes! I think there was another program that got it a bit smaller, but I forget which.
    Last edited by Intrinsic; 10th June 2008 at 12:35.

  24. #24
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    @matt
    Normally I'm in Istanbul at this time of year, but not this summer. If I were there, I could learn a lot about this algorithm from the authors. Right now I'm 980 km away.
    Last edited by osmanturan; 10th June 2008 at 12:17.

  25. #25
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Interesting! I'll test SBC along with WinRK and StuffIt right now!

  26. #26
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    WinRK 3.0 build 3 beta (PWCM): 20.091.853 bytes (133 seconds)
    Stuffit Deluxe 11.0.2.55 (Maximum Settings) 37.398.648 bytes (~8 seconds)

    I couldn't access the GeoCities page for SBC because of filtering.

  27. #27
    Member
    Join Date
    May 2008
    Location
    England
    Posts
    325
    Thanks
    18
    Thanked 6 Times in 5 Posts
    SBC for you, http://www.zenadsl5706.zen.co.uk/SBC...%20Rev%202.rar

    The latest is Rev 3, but I think that just added a 64-bit build? The main site is down now; it seems he's switched hosts but has forgotten to put his site back up.

  28. #28
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    FLAC 1.2.1 (-: 21.086.928 bytes (~10 seconds)
    Monkey's Audio 4.01 beta 2 (insane: -c5000): 20.237.168 bytes (~43 seconds)
    SBC 0.970 Rev2 (-m3 -b63): 20.574.533 bytes (~24 seconds)

    @Intrinsic:
    Thanks for the link!

  29. #29
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    When we consider total time (compression + transmission + decompression), the winner is ACWAV for me. And I haven't even mentioned ACWAV's simplicity yet!

    Note:
    - Monkey's Audio takes care of the correlation between channels.
    - FLAC is based on LPC.
    - SBC is based on BWT with filters.

  30. #30
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,239
    Thanks
    192
    Thanked 968 Times in 501 Posts
    RKAU 1.07 http://www.free-codecs.com/download_...hp?d=329&s=214
    This is surely something similar to the WinRK audio model, but maybe there are more options.
