Results 1 to 17 of 17

Thread: BWTS help requested.

  1. #1
    Member
    Join Date
    Aug 2016
    Location
    Turlock California
    Posts
    8
    Thanks
    1
    Thanked 0 Times in 0 Posts

    BWTS help requested.

    I am experimenting with BWTS.cpp ( BWT Scott-ified).
I love iterative encoding/decoding, and BWTS is clean with no overhead. However, I am merging Scott's code into my latest experiment and need some clarity on BLOCK_SIZE and the "buffer" dimension. What does BLOCK_SIZE mean? Is it the maximum amount of data it sorts? Can it be larger than 1000000 bytes? Can buffer[] be the size of the source file?

I did see some search results suggesting newer BWTS efforts use a larger block size, but I am getting a segmentation fault at run time after changing BLOCK_SIZE to 4000000 and matching the buffer[] size to the file size, so I am not sure I understand the relationship between buffer[] and BLOCK_SIZE.

    I am trying to convert BWTS.cpp into a function, but perhaps there is something better now that is already a function?
    That would be great! I admit I lack an advanced understanding of BWTS.

    Any advice?

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,306 Times in 742 Posts

  3. #3
    Member
    Join Date
    Aug 2016
    Location
    Turlock California
    Posts
    8
    Thanks
    1
    Thanked 0 Times in 0 Posts
    I am a humble experimenter, so I ask what those links signify.
    There is a lack of comment for such a humble experimenter as I.

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,306 Times in 742 Posts
    It's a compact implementation of a BWTS forward/inverse transform utility which can process whole files "larger than 1000000 bytes"; e.g., it's tested with enwik8.

    Also, here's an x64 executable for it: use "bwts c input output" for the forward transform and "bwts d input output" for the inverse transform.
    It should be able to handle files up to 4G.
    Attached Files

  5. #5
    Member
    Join Date
    Aug 2016
    Location
    Turlock California
    Posts
    8
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Ah,...

    You are reading me correctly.

    Of course I wanted a plug-and-play solution, but life is never that, is it?
    Thank you.

  6. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,306 Times in 742 Posts
    You can also look at http://compressionratings.com/i_bwtmix.html for a complete BWT/BWTS-based compressor.

  7. #7
    Member
    Join Date
    Aug 2016
    Location
    Turlock California
    Posts
    8
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Code-wise, I want a function-level BWTS() call.

  8. #8
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,306 Times in 742 Posts
    That's why I posted that bwts.cpp - it has this function; it's just called obwt_bwts(), because it's taken from the openbwt library.

  9. #9
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    780
    Thanked 687 Times in 372 Posts

  10. #10
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,497
    Thanks
    26
    Thanked 132 Times in 102 Posts
    I love iterative encoding/decoding, and BWTS is clean with no overhead. However, I am merging Scott's code into my latest experiment and need some clarity on BLOCK_SIZE and the "buffer" dimension. What does BLOCK_SIZE mean? Is it the maximum amount of data it sorts? Can it be larger than 1000000 bytes? Can buffer[] be the size of the source file?
    BWT (and its variants) are block-based algorithms, meaning that you split the file into independent blocks and transform them separately. For stationary sources, the bigger the block size the better, but you need to take into account that too large a block size will result in memory allocation failure.

    Assuming your file is 500 bytes long and your block size is 200 bytes, you get 3 blocks with the following sizes: 200 bytes, 200 bytes, 100 bytes. The last block is smaller than the previous ones - this is normal in BWT, as the set block size is in fact a block size limit. It's best to have only the last block be of variable size - this way you don't need to encode the individual block sizes, only the block size limit (in case it's not fixed).

  11. #11
    Member
    Join Date
    Aug 2016
    Location
    Turlock California
    Posts
    8
    Thanks
    1
    Thanked 0 Times in 0 Posts
    You guys are wonderful.
    Indeed I see those functions clearly now. I am now working that code into my experiment.

    I was tired and frazzled trying to get things to work, so it was best I rested.

    Okay I have that code working.

    One C++ question: is the new operator, in the case of new byte[n] (where byte is a typedef for unsigned char), the same as calloc(), meaning it simply returns a zeroed-out block of memory?
    I have not learned C++ yet, so I massaged that C++ code to run under C.

    Last edited by Ernst; 21st August 2016 at 04:08.

  12. #12
    Member
    Join Date
    Aug 2016
    Location
    Turlock California
    Posts
    8
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    BWT (and its variants) are block-based algorithms, meaning that you split the file into independent blocks and transform them separately. For stationary sources, the bigger the block size the better, but you need to take into account that too large a block size will result in memory allocation failure.

    Assuming your file is 500 bytes long and your block size is 200 bytes, you get 3 blocks with the following sizes: 200 bytes, 200 bytes, 100 bytes. The last block is smaller than the previous ones - this is normal in BWT, as the set block size is in fact a block size limit. It's best to have only the last block be of variable size - this way you don't need to encode the individual block sizes, only the block size limit (in case it's not fixed).
    Cool.

    I was wondering whether a block size equal to the file size would do better in the sort, or whether it doesn't make much of a difference in reality?
    I work mostly on binary transforms, so it's best to ask.

  13. #13
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,306 Times in 742 Posts
    1. In this case, new is the equivalent of malloc, not calloc. See http://www.cplusplus.com/reference/new/operator%20new/
    2. On x64 with swap enabled, there's no real limit to the block size.
    3. Larger blocks mean better compression, but the transform is slower and uses more memory
    (the bwts.cpp implementation seems to use 6*blocksize for the forward transform and 7*blocksize for the inverse one).

  14. #14
    Member
    Join Date
    Aug 2016
    Location
    Turlock California
    Posts
    8
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    1. In this case, new is the equivalent of malloc, not calloc. See http://www.cplusplus.com/reference/new/operator%20new/
    2. On x64 with swap enabled, there's no real limit to the block size.
    3. Larger blocks mean better compression, but the transform is slower and uses more memory
    (the bwts.cpp implementation seems to use 6*blocksize for the forward transform and 7*blocksize for the inverse one).

    Thanks. In the case of new[], I would think calloc won't hurt. So malloc is uninitialized, just as I'd expect of an int Array[] in C.
    I was successful in using the code you recommended. The experimental transform is running. It's a bijective process. So far, so good.

  15. #15
    Programmer michael maniscalco's Avatar
    Join Date
    Apr 2007
    Location
    Boston, Massachusetts, USA
    Posts
    141
    Thanks
    26
    Thanked 94 Times in 31 Posts
    I was wondering why David has not yet chimed in on this thread. He was never one to miss out on discussing his work. So I looked on a hunch and sadly it appears that he had passed away earlier this year. I just thought I would mention it since he was a long time contributor to the group and, as far as I can tell, his passing went unnoticed here.

    http://news-ridgecrest.com/news/story.pl?id=0000005125

  16. Thanks:

    nburns (8th January 2017)

  17. #16
    Member
    Join Date
    May 2008
    Location
    France
    Posts
    83
    Thanks
    548
    Thanked 27 Times in 19 Posts


    Quote Originally Posted by michael maniscalco View Post
    I was wondering why David has not yet chimed in on this thread. He was never one to miss out on discussing his work. So I looked on a hunch and sadly it appears that he had passed away earlier this year. I just thought I would mention it since he was a long time contributor to the group and, as far as I can tell, his passing went unnoticed here.

    http://news-ridgecrest.com/news/story.pl?id=0000005125
    Sad news.

  18. #17
    Member
    Join Date
    Aug 2016
    Location
    Turlock California
    Posts
    8
    Thanks
    1
    Thanked 0 Times in 0 Posts
    R.I.P. David Scott.

Similar Threads

  1. mk-bwts
    By nburns in forum Download Area
    Replies: 20
    Last Post: 15th November 2014, 10:20
  2. BWTS paper
    By willvarfar in forum Data Compression
    Replies: 2
    Last Post: 19th January 2012, 13:06
  3. Question about BWTS
    By Piotr Tarsa in forum Data Compression
    Replies: 4
    Last Post: 1st May 2011, 22:01
  4. USING BWTS in BWTmix
    By biject.bwts in forum Data Compression
    Replies: 8
    Last Post: 9th March 2010, 02:00
  5. BWTS explanation?
    By TopQuark in forum Data Compression
    Replies: 5
    Last Post: 8th April 2009, 22:26
