Results 1 to 4 of 4

Thread: Alphabet Reordering for BWT

  1. #1
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,372
    Thanks
    213
    Thanked 1,020 Times in 541 Posts

    Alphabet Reordering for BWT

    Here I've got some nice improvement from reordering the alphabet before BWT.
    208107 on book1 with incomplete optimization.

    http://ctxmodel.net/files/BWT_reorder_v1.rar

    First, I tried to use some heuristic approach to compute the alphabet
    permutation, but that failed miserably. (http://ctxmodel.net/files/BWT_reorder_v0.rar)

    And then I remembered that specific alphabet not only affects the BWT,
    but bitwise CM as well (postcoder), and just ran a usual optimizer with
    2kbit permutation table and postcoder parameters, and results are
    rather nice.

    Of course its highly impractical to optimize the reordering for each block
    or file, as it takes multiple hours, but there's not much sense in that
    either probably, as it should be possible to precalculate a few
    permutation tables beforehand, and then select one based on eg.
    o1 frequency table analysis.

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,372
    Thanks
    213
    Thanked 1,020 Times in 541 Posts
    http://ctxmodel.net/files/BWT_reorder_v2.rar

    Reoptimized for enwik6

    Code:
           v1+v1  v1+v2  v2+v1  v2+v2
    book1  210357 210679 210357 210679 // no reordering + v1/v2 rc
    enwik6 255127 254599 255127 254599 
    book1  208107 208443 208657 208978 // v1/v2 reordering + v1/v2 rc
    enwik6 255193 254619 251617 251075
    
    v1 was tuned for book1
    v2 was tuned for enwik6 (1000000 bytes of enwik)

  3. #3
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    681
    Thanks
    38
    Thanked 168 Times in 84 Posts
    Nice stuff. Shelwien, thanks !
    Just made a small test with Reorder + BBB on enwik8
    BBB's command line is cfm100 and Reorder is from v2

    Code:
              BBB        20 847 290
    Reorder + BBB        20 775 052
    Nice results here. And another test for nkoef.dbf database from WCMETA program.
    Original size is 151 585 377 bytes. BBB used with cm145

    Code:
              BBB        1 964 340
    Reorder + BBB        1 971 925
    Bad here. And one more test on AllWords.txt file (26 958 014 bytes) with BBB cfm32

    Code:
              BBB        11 583 720
    Reorder + BBB        11 571 629

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,372
    Thanks
    213
    Thanked 1,020 Times in 541 Posts
    Tried optimizing the permutation specifically for bbb,
    but apparently there's not much difference: 20772553.
    Attached Files Attached Files

Similar Threads

  1. NanoZip - a new archiver, using bwt, lz, cm, etc...
    By Sami in forum Data Compression
    Replies: 280
    Last Post: 29th November 2015, 11:46
  2. BWT with compressed input data
    By Shelwien in forum Data Compression
    Replies: 3
    Last Post: 29th May 2009, 16:16
  3. Modern BWT speed
    By GerryB in forum Data Compression
    Replies: 4
    Last Post: 5th May 2009, 18:28
  4. BWT - how to use?
    By m^2 in forum Data Compression
    Replies: 29
    Last Post: 6th November 2008, 03:01
  5. BWT/ST compression
    By encode in forum Data Compression
    Replies: 60
    Last Post: 25th June 2008, 08:21

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •