
Thread: A new transform

  1. #1 - Member (Israel)

    A new transform

    Hi,

    I have developed a new transform that computes, precisely, the frequency components of equally spaced samples. It derives the frequencies from the data, unlike the DCT, which uses a predefined set of frequencies.
    It is particularly relevant to audio compression, lossy or lossless.

    A brief summary, in a doc file, is in this link: http://www.mediafire.com/view/?6eqzy6ld7ax7saz

    Those who want the full details and/or wish to evaluate a test program can contact me using the email address in the doc file.

    Regards,
    Yaakov Gringeler

  2. #2 - Shelwien (Administrator, Kharkov, Ukraine)
    Once upon a time I tried to expand audio data into a series of sin(w*x+d), using LMS
    optimization by w. FFT basically does the same thing for a fixed set of frequencies.
    It worked, and was able to detect the parameters of a single sine wave, which looked cool.
    But then I tried to implement compression with it, and it turned out that real files are not made from
    such simple functions - there was some compression, but the results were worse than rar's.

    Basically, the audio signal is not mixed from such fixed functions, and there's noise and inter-channel
    correlation, so AFAIK even a nice series approximation doesn't help much for lossless compression.

    As an example, here's a lossless MDCT-based spectral coder.
    http://nishi.dreamhosters.com/u/fast08sh4.rar
    (it stores neither the header nor the precise wav length, so it's not 100% lossless)

    Hopefully your results can prove that I was wrong somewhere? :)

  3. #3 - Member (Israel)
    Quote Originally Posted by Shelwien View Post
    Hopefully your results can prove that I was wrong somewhere?

    The model you used differs from my model, the ESF (Exponential Sum
    Function). An ESF can have any number of exponential bases, which are derived from the
    data and do not have to be frequencies. Therefore it also differs from FFT/MDCT.

    The suitability of an ESF to model audio can be assessed in the following way.

    There is a method used in audio called linear prediction. Basically, it means predicting a
    sample using a linear combination of previous samples. This method is standard in speech compression and, from what I can tell, also common in the lossless compression of music.
    If you use the correct number of samples, linear prediction becomes linear determination:
    the sample is exactly a linear combination of previous samples. For that you need as many
    linear coefficients as there are samples being predicted/determined.

    The mathematics behind my transform proves that the linear determination coefficients are
    equivalent to the set of exponential bases of an ESF.
    Therefore, the modeling power of an ESF is the same as that of linear prediction.

    Linear prediction, and thus the ESF, has clearly been assessed as suitable for modeling
    speech, and AFAIK the same goes for the lossless compression of audio, including music.

    I do not know of any use of linear prediction in the lossy compression of music. I guess that is
    because you are very limited when all you have are linear coefficients: you cannot remove
    frequencies outside the range of human hearing, frequencies with small coefficients, and
    so on.
    You can easily do all of that with an ESF.

    Assuming linear prediction has indeed been used successfully for the lossless compression of
    music, it follows that an ESF is suitable for the lossy compression of music.

  4. #4 - Shelwien (Administrator)
    > Therefore it also differs from FFT/MDCT.

    But it's still a linear combination.
    Here's a single piano key - the easiest thing to approximate
    with spectral methods:
    http://nishi.dreamhosters.com/u/sdr0017a.png

    But there's also an envelope - how would you deal with it?

    You started talking about "a new transform", so I tried
    to explain why I think that it won't be of much help anyway.

    However using it to predict new samples is completely different.
    In that case it seems reasonable that you can get better results
    with a more complex approximation function.

    What if you took a .wav, subtracted your method's predictions from it,
    and compressed both files (original and residuals) with some lossless coder (OptimFROG?).

    > it means that an ESF is suitable for lossy compression of music.

    There are two common methods of audio coding: either a prediction
    is subtracted from each sample and the residuals are encoded, or
    a frame is transformed into coefficients and the coefficients are encoded.

    Of course, you can do both with your method, but at the moment I don't
    see why it would be better than FFT for coefficient encoding.
    Likewise, I don't see how sequential prediction can be combined
    with lossy quantization - maybe blockwise quantization first,
    then sequential prediction, but the same could be done
    with FFT+LPC or something, and I doubt the results would be any good.

    Anyway, do you have any numbers?
    For now, the only thing that I understand is that its complexity
    is much higher than FFT's O(N*log(N)).

  5. #5 - Member (Israel)
    First, my method is not a linear prediction method.
    It converts samples into an ESF, which is a sum of sinusoids that can grow or
    decay exponentially, with the frequencies derived from the data.
    What I pointed out was the mathematical equivalence between linear
    prediction/determination coefficients and the set of exponential bases of an ESF: one can
    be derived from the other.

    > But there's also an envelope - how would you deal with it?
    I'm not sure what you mean by envelope.

    > Of course, you can do both with your method, but atm I don't
    > see why it would be better than FFT in case with coef encoding.

    Because it provides better modeling of the data, and does so by computing the spectral
    components of the signal.

    I regard an audio signal as being composed of a limited, relatively small, number of
    spectral components.

    For such a signal, the FFT is an approximation - and it remains an approximation even if the signal is
    composed of sinusoids. Improving the approximation requires increasing the size of the
    transform, which requires more samples and more computation.
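    The point about approximation is easy to demonstrate with a toy (a naive O(N^2) DFT, not the poster's transform): a single sinusoid whose frequency falls between DFT bins leaks energy across every bin, even though as an exponential sum it is described exactly by just two bases.

```python
import cmath, math

N = 32
w = 2 * math.pi * 5.3 / N            # frequency between bins 5 and 6
x = [math.cos(w * n) for n in range(N)]

# Naive DFT magnitudes of the frame.
spectrum = [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N))) for k in range(N)]

# Spectral leakage: far more than the two "true" components are non-negligible.
significant = sum(1 for m in spectrum if m > 0.01 * max(spectrum))
assert significant > 2
```

    Making the leakage smaller with the DFT means a longer transform; a data-derived-frequency representation avoids the bin mismatch entirely, at least for signals that really are sums of a few components.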

    > Anyway, do you have any numbers?

    No, because the transform was not originally developed for compression.
    A main purpose of the transform is to better model data, and audio data in particular,
    with compression being an obvious application. The trigger for the development was working
    on audio resampling.

    You may also note that the transform is a method of spectrum analysis.

    > For now, the only thing that I understand is that its complexity
    > is much higher than FFT's O(N*log(N)).

    I am afraid you misunderstood that.
    The size N in the complexity of the FFT and the size N in the complexity of my transform are
    different things. In the case of the FFT, the complexity is determined by the
    transform size, which is a measure of accuracy: a higher value means more accuracy.

    In the case of my transform, the complexity is determined by the number of spectral components
    in the spectrum of the signal.

    Accuracy, the determining factor in the FFT's complexity, is a non-issue in my transform.

  6. #6 - Shelwien (Administrator)
    > First, my method is not a linear prediction method.

    I said that it's a linear combination.

    >> But there's also an envelope - how would you deal with it?
    > I'm not sure what you mean by envelope.

    http://en.wikipedia.org/wiki/Envelope_(waves)

    I guess support for an exponential envelope is better than nothing,
    but unfortunately envelopes are normally not exponential.

    >> atm I don't see why it would be better than FFT in case with coef encoding.

    > Because it provides better modeling of the data, and does so by
    > computing the spectral components of the signal.

    I agree that it is better for some kinds of data, but actual audio data
    (music etc.) consists of noise and many different kinds of functions.
    So a precise approximation with ESF coefs would likely still be more
    redundant than the usual lossless methods, and a lossy approximation
    might be inferior to FFT methods due to psychoacoustics:

    "The hair cells in the organ of Corti are tuned to certain sound
    frequencies by way of their location in the cochlea"
    ( http://en.wikipedia.org/wiki/Cochlea )

    Basically, human sound receptors react to specific frequencies
    (though not necessarily FFT ones), so even a precise recovery
    of the function used to generate the input signal isn't necessarily
    the best solution for lossy audio compression.
    And an ESF isn't really a perfect approximation of the actual generator
    function anyway.

    >> Anyway, do you have any numbers?
    > No, because the transform was not originally developed for compression.

    Ok, please send your test program to shelwien@gmail.com and I'll try
    to check whether it's better than LPC for lossless compression or not.

    > A main purpose of the transform is to better model data, and audio
    > data in particular, with compression being an obvious application.

    I agree that it should be a useful approximation method in general,
    but I'm not so sure about compression.
    That is, it may be good as an LPC replacement for lossless audio
    compression, but then the question is whether it's fast enough for that.

    > The trigger for the development was working on audio resampling.

    Yeah, in this case I see how it can be useful.
    Though I'd try to capture more of possible patterns -
    http://en.wikipedia.org/wiki/Non-sinusoidal_waveform

    > In the case of FFT, the complexity is determined by the transform
    > size, and is a measure of the accuracy. A higher value means more
    > accuracy.

    There're also sparse FFT implementations - http://groups.csail.mit.edu/netmit/sFFT/
    (I guess mainly useful for iFFT with lots of coefs quantized to zero).

    > In the case of my transform, complexity is determined by the number
    > of spectral components in the spectrum of the signal.

    Yes, but I don't see why you think that the number of components
    would be limited for musical audio files.

  7. #7 - Member (Israel)
    > redundant than usual lossless methods, and a lossy approximation
    > might be inferior to FFT methods due to psychoacoustics:

    Psychoacoustics will work better with my method because it is more accurate.


    > And ESF isn't really the perfect approximation of actual generator
    > function.

    Of course not. It's a better approximation, not a perfect one.


    > That is, it may be good as a LPC replacement for lossless audio
    > compression, but then the question is whether its fast enough for that.

    Speed is approximately the same as LPC.

