Results 1 to 12 of 12

Thread: OpenBWT-2.0.0 development

  1. #1
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    98
    Thanks
    36
    Thanked 8 Times in 8 Posts

    OpenBWT-2.0.0 development

    Hi, I am new to this forum and a novice to compression in general. But I need a BWT related compression tool, and was pointed to OpenBWT-2.0.0 package.

    My questions:

    1) Any new developments in this package since 2010? What is the relation between DivSufSort and OpenBWT 2.0? Is DivSufSort a superset of OpenBWT 2.0?

    2) In the OpenBWT 2.0, there is a directory called sst. What is sst? I guess the source codes contained there are for post processing methods after BWT. Are there any simple example codes to use these methods? And also are there published experimental results of using these methods, such as compression ratio, memory use and processing time using these methods together with RLE and entropy coder?

    Many thanks for any help.

  2. #2
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    165
    Thanks
    31
    Thanked 64 Times in 40 Posts
    1) The relation is nothing. It's based on Induced Suffix Sorting.
    2) sst is second stage transform. v1.3 has example codes.
    Last edited by xezz; 21st October 2017 at 09:58.

  3. #3
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,982
    Thanks
    298
    Thanked 1,309 Times in 745 Posts
    1) divsufsort was updated more recently: https://github.com/y-256/libdivsufsort
    I guess divsufsort is a normal library, while openbwt is more of a development/benchmark kit.

    2) SST is "second stage transform", like MTF. There's also sst_test.c for their comparison.
    Most of these methods are not actually used anywhere.

    3) You can also look at
    https://github.com/michaelmaniscalco/msufsort
    https://github.com/IlyaGrebnov/libbsc

  4. #4
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    98
    Thanks
    36
    Thanked 8 Times in 8 Posts
    Thank you so much for the info

  5. #5
    Member
    Join Date
    Mar 2015
    Location
    Germany
    Posts
    57
    Thanks
    34
    Thanked 38 Times in 15 Posts
    I made some improvements on libdivsufsort a while ago which speeds up the second stage:

    https://github.com/akamiru/libdivsufsort

  6. #6
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    98
    Thanks
    36
    Thanked 8 Times in 8 Posts
    Thx, will try out.

    Quote Originally Posted by Christoph Diegelmann View Post
    I made some improvements on libdivsufsort a while ago which speeds up the second stage:

    https://github.com/akamiru/libdivsufsort

  7. #7
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    98
    Thanks
    36
    Thanked 8 Times in 8 Posts
    Any technical writings to explain how to use these SSTs used as examples in OpenBWT-1.5 ?


    Quote Originally Posted by Shelwien View Post
    1) divsufsort was updated more recently: https://github.com/y-256/libdivsufsort
    I guess divsufsort is a normal library, while openbwt is more of a development/benchmark kit.

    2) SST is "second stage transform", like MTF. There's also sst_test.c for their comparison.
    Most of these methods are not actually used anywhere.

    3) You can also look at
    https://github.com/michaelmaniscalco/msufsort
    https://github.com/IlyaGrebnov/libbsc

  8. #8
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,982
    Thanks
    298
    Thanked 1,309 Times in 745 Posts

  9. Thanks:

    smjohn1 (6th November 2017)

  10. #9
    Programmer michael maniscalco's Avatar
    Join Date
    Apr 2007
    Location
    Boston, Massachusetts, USA
    Posts
    141
    Thanks
    26
    Thanked 94 Times in 31 Posts
    To expand upon Eugene's answer, 'second stage tranforms' are post BWT transforms intended to make the data more compressible after the BWT. That is, they are used like this:
    BWT -> SST -> entropy coder. However, these secondary transforms are generally not as strong as direct encoding of the BWT itself and most modern BWT compressors avoid using any secondary transform.

    Some people use a pre-BWT transform as well to reduce the long matches encountered during the BWT but this is completely unnecessary as modern suffix array algorithms no longer have problems with long substring matches. These 'pre-BWT' transforms are usually LZP and are intended not to compress the pre-BWT data but to encode any really long substring matches.

    Examples of excellent modern compressors which encode the BWT directly:

    BCM - encodes the BWT data directly using context mixing (Muravyov)
    BWTMix - another context mixing BWT coder (Shelwein)
    M99 - encodes the actions needed to sort the symbols of the BWT using bit packing (Maniscalco)
    M03 - encodes the BWT data based on the contexts of the pre-BWT data (Maniscalco)

    Examples of modern BWT compressors using post BWT transform:

    BSC - uses QLFC (Grebnov)
    - probably several other's that aren't coming to mind at the moment)

    The key take away is that you don't really need post BWT transforms to effectively and efficiently encode the BWT.

    - Michael

  11. Thanks (2):

    encode (6th November 2017),smjohn1 (6th November 2017)

  12. #10
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    98
    Thanks
    36
    Thanked 8 Times in 8 Posts
    thanks for the info.

    Anyone tried BWT (with or without SST) + ANS/FSE as FSE is a good and fast entropy encoder?

  13. #11
    Member
    Join Date
    Nov 2014
    Location
    California
    Posts
    158
    Thanks
    51
    Thanked 44 Times in 33 Posts
    Quote Originally Posted by smjohn1 View Post
    thanks for the info.

    Anyone tried BWT (with or without SST) + ANS/FSE as FSE is a good and fast entropy encoder?
    Take a look here: https://github.com/flanglet/kanzi

    Compression happens in 2 steps: transform, then entropy and you can specify both. For instance you can select BWT for stage 1 and ANS0 for stage 2. Or you can specify a level the pre-defines stage 1 and stage 2. EG. level=2 uses BWT, RANK (as SST), Zero RunLength, then ANS0.

    The help option contains the list of implemented transforms and entropy coders.

  14. Thanks (2):

    Bulat Ziganshin (14th February 2018),Shelwien (7th November 2017)

  15. #12
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    98
    Thanks
    36
    Thanked 8 Times in 8 Posts
    Thanks!

    Quote Originally Posted by hexagone View Post
    Take a look here: https://github.com/flanglet/kanzi

    Compression happens in 2 steps: transform, then entropy and you can specify both. For instance you can select BWT for stage 1 and ANS0 for stage 2. Or you can specify a level the pre-defines stage 1 and stage 2. EG. level=2 uses BWT, RANK (as SST), Zero RunLength, then ANS0.

    The help option contains the list of implemented transforms and entropy coders.

Similar Threads

  1. New CM compressor in development
    By Mat Chartier in forum Data Compression
    Replies: 37
    Last Post: 28th June 2013, 07:16
  2. Demixer - new tree-based bitwise CM codec is in development
    By Piotr Tarsa in forum Data Compression
    Replies: 34
    Last Post: 17th March 2013, 20:33

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •