Results 1 to 21 of 21

Thread: BWTS STATUS OF PAPER

  1. #1
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Post BWTS STATUS OF PAPER

    I have much faster code will post it later but here is the draft so far
    for the paper. I suck at writting at least the other guy understands how to write and how the method works.

    I have a copy for a while at my site if you
    wish to take a look

    http://bijective.dogma.net/00yyy.pdf

  2. #2
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    What about bijective Schindler Transform? I'm not into ST so I don't know how it works in detail. I only know it's a limited order BWT.

    With 64-bit processors Schindler Transform on order 7 or 8 will be very fast, and order 7 or 8 Schindler Transform isn't much worse than full order BWT.

    AFAIR Francesco's Rings uses Schindler Transform. Can someone tell me which order Rings use?

  3. #3
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Cool

    Quote Originally Posted by Piotr Tarsa View Post
    What about bijective Schindler Transform? I'm not into ST so I don't know how it works in detail. I only know it's a limited order BWT.

    With 64-bit processors Schindler Transform on order 7 or 8 will be very fast, and order 7 or 8 Schindler Transform isn't much worse than full order BWT.

    AFAIR Francesco's Rings uses Schindler Transform. Can someone tell me which order Rings use?
    First of all the word "bijective" most would argue that the BWT is always bijective. Of course that's the problem its not bijective to the set of all files so the word is used rather loosely. I am not a member of anything most papers are not freely accessible to me. But from the ones I have found when you walk through the text you find most don't really have a transform that is bijective to the set of all files. I don't have code or know enough about the Schindler Transform to tell you if its really bijective.

  4. #4
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    In trying to find info on the Schindler Transform I came across

    ftp://ftp.math.hkbu.edu.hk/pub/techreport/math436.pdf

    In it they go through the example of mississippi with
    the standard $ attached. Where it sorted in a way like BWT$

    This fact alone makes it not bijective because of the added
    symbol. I could not find any information on what you call
    the "bijective Schindler Transform" it you have any let
    me know.

    Most people when talking about BWT no longer do a real
    BWT they do the BWT$ routine. Which is slightly different
    yet that is what is done. The real BWT does a little more
    time to compute not much but more. For some reason
    BWT$ is what most seem to call BWT but it is not the
    same

  5. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    This ST aka Schindler Transform aka Sort Transform
    is actually a partial BWT, and thus is not chained to EOF in any way.
    The idea is that sorting the contexts only by first N symbols
    might be faster than full BWT, and still reversible.
    For a simple example see ST2.inc in http://ctxmodel.net/files/ST2rc/st2rc_v4.rar

  6. #6
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Sorry, David, I've written my post inaccurately (my English isn't very good). I meant that I would know if you are working on bijective Schindler Transform.

  7. #7
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Smile

    Quote Originally Posted by Shelwien View Post
    This ST aka Schindler Transform aka Sort Transform
    is actually a partial BWT, and thus is not chained to EOF in any way.
    The idea is that sorting the contexts only by first N symbols
    might be faster than full BWT, and still reversible.
    For a simple example see ST2.inc in http://ctxmodel.net/files/ST2rc/st2rc_v4.rar
    I down loaded this it appears to be bijective. However it will take some time to wrap my mind around this. Its not clear exactly what it does. I take this is for a second order. Does his method allow other orders. if some file is say 100 bytes and you attempt to run a order 200 does that mean after order 100 the output file is the same. Take the example in the paper Gil came up with what would an order 100 ST give as an ouput? Would it match my output? Or does the ST method only work for small orders. If its like BWT then at some point you might think it would match the output of the bijective string transform?

    The output of the transform we are writing about is to match a true BWT not the BWT$ that is commonly used. It would be interesting to know more about the ST transform. Thanks for the code example.

  8. #8
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    Sorry, David, I've written my post inaccurately (my English isn't very good). I meant that I would know if you are working on bijective Schindler Transform.
    That's OK my English sucks and sadly it the only language I can use when I attempt to communicate with humans. I did not use the ST as a basis of any of the work. Though in some ways we both are attempting to get at same solution. That is an improvement of the true BWT. I feel BWT$ is such a mod that is in such common use people think its actually BWT. The ST sounds interesting. I just wonder what a maxed out order bijective ST does since the example for a second order presented above seems to be bijective. I would need to know more about the ST. I suspect Shelwien could answer that question he seems to know alot about it.

  9. #9
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Smile

    Quote Originally Posted by Shelwien View Post
    This ST aka Schindler Transform aka Sort Transform
    is actually a partial BWT, and thus is not chained to EOF in any way.
    The idea is that sorting the contexts only by first N symbols
    might be faster than full BWT, and still reversible.
    For a simple example see ST2.inc in http://ctxmodel.net/files/ST2rc/st2rc_v4.rar
    I took a look at your code sadly I did not get it to work under DJGPP
    but I notice that it writes out 2 more bytes in the output array than
    what is in the input array. So though your version does not use the
    EOF as the paper I saw it does add extra data can we say index?
    So it is not bijective since the ouput longer than the input or did
    I miss something??

  10. #10
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    Yeah, and I actually made some effort to make it bijective and
    partially succeeded (it could correctly encode/decode english texts),
    but then there were bugs on binaries, and I dropped the idea.
    That is, its possible but very tricky - increment a counter there,
    decrement here, and then a similar but different thing at decoding.

    But the algorithm itself is very simple - we count the frequencies
    of symbol digrams (order-2 contexts), and allocate the blocks
    of corresponding size in the output for storing the symbols in these contexts.

    And on decoding, first we reconstruct the digrams by sorting all
    the symbols and pairing up with ST2 symbols. And after counting
    digrams we'd know where to find the specific context histories.

    But that's there a bijectivity problem appears, as we easily can
    decode the data if we'd always know the context, but what about
    the context for first two symbols of data?

    There I couldn't think of a way to use two last symbols as a context
    for first ones (like with BWT string rotations), so decided to just use
    any two symbols, eg. \x00. And subsequently got entangled in all
    these increments/decrements which appeared because I used up
    some symbols which were not present in the data or just had a
    different frequency.

    But its certainly possible for someone with sufficient patience

  11. #11
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Post

    I got the example to work when I dropped my favorite and
    went to MinGW. I used the example in his Patient


    your code took
    abracadabra

    and it went to
    rabdbacrraaaa

    on reverse to
    abracadabra

    BWTS transforms it to
    ardrcaaaabb

  12. #12
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Smile

    Quote Originally Posted by biject.bwts View Post
    I got the example to work when I dropped my favorite and
    went to MinGW. I used the example in his Patient


    your code took
    abracadabra

    and it went to
    rabdbacrraaaa

    on reverse to
    abracadabra

    BWTS transforms it to
    ardrcaaaabb

    should have added BWT without index maps it to
    rdarcaaaabb

    note for this example only the difference between
    BWTS and BWT is the where the first a is located
    but you have the added benefit of no index being
    needed

    as for BWT$ well that all depends on the weight of the EOS symbol.
    so I will not post it even though that is what most are really using

  13. #13
    Member
    Join Date
    Nov 2008
    Location
    Germany
    Posts
    15
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Paper on bijective variants of the BWT

    I just found this link on comp.compression:

    http://www.informatik.uni-stuttgart....s/kuf09psc.pdf

    It's a paper about the bijective sort transform. The presentation seems to be rather cryptic, but it contains a running example which I will go through -- maybe I will finally understand the (inverse of the) ST...

    @Shelwien: maybe it has the missing "bijectivity part" for your implementation.

    One general question: does the ST with a reasonable order (say 5 or give better or worse compression results (depending on the different encoders)? For example, what about the BWTmix approach on ST outputs?

    --Alibert

  14. #14
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    ST4 vs BWT is like PPM order4 vs PPM*.
    So its worse in most cases.
    And inverse ST is not a problem, its basically the same as forward ST.
    Which is slower than inverse BWT of course.

  15. #15
    Member
    Join Date
    Nov 2008
    Location
    Germany
    Posts
    15
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    ST4 vs BWT is like PPM order4 vs PPM*.
    I do not fully agree; of course the order of BWT is higher, but on the downside it scrambles the information within the contexts, whereas the order of the ST is small but for example a pattern 0101010101 typically means that within some context x (which length is the order of the ST) the bits 0 and 1 indeed are alternating in the input. So to me, the questions which one is better very heavily relies on the backend you are using for compression.

    Quote Originally Posted by Shelwien View Post
    And inverse ST is not a problem, its basically the same as forward ST.
    Could you explain in more detail, please? An example? Maybe the one in paper; since I already tried to understand that one? In the paper above some strange graph is constructed and I do understand how the algorithm works from there but I do not fully understand how to construct that graph and why the whole procedure works. The overall construction for the usual ST seems to be more complicated than necessary (I can only suppose that the reason for this is the bijective ST, which might be more complicated).

    --Alibert

  16. #16
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by Alibert View Post
    I do not fully agree; of course the order of BWT is higher, but on the downside it scrambles the information within the contexts, whereas the order of the ST is small but for example a pattern 0101010101 typically means that within some context x (which length is the order of the ST) the bits 0 and 1 indeed are alternating in the input. So to me, the questions which one is better very heavily relies on the backend you are using for compression.
    --Alibert
    I can tell you that BWTS 0101010101 goes to 1111100000
    I suspect that the bijective ST gets the same anwser for a very
    small order after which all higher orders of the bijective ST would
    match the bijective BWT.

    Actually compression is a zero sum game. There is no magic compressor
    of course one could construct files that look good or bad with either. The
    trick is finding what is useful for some given set or type of file.

    I suspect the bijective ST is like the bijective BWT in that if you keep
    applying the transform enough times you come back to the starting file.
    This fact alone means that the BWT can not make all files compress better
    some it will help some it will hurt. But I suspect that the full order bijective
    BWT would generally be more useful for most files people construct but for
    certain special files the limited order bijective ST will be better.

  17. #17
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    >> ST4 vs BWT is like PPM order4 vs PPM*.
    >
    > I do not fully agree; of course the order of BWT is
    > higher, but on the downside it scrambles the information
    > within the contexts,

    Well, the specific permutation of symbols within a context
    doesn't matter much for stationary sources (like texts).
    And higher orders usually have higher match rates.

    > So to me, the questions which one is better very heavily
    > relies on the backend you are using for compression.

    Unfortunately the dependence on data type is much stronger.
    And there's basically no "natural" data where the specific order of
    symbol occurences in a context really matters, so BWT is
    always not much worse than ST - even with nonstationary data.

    >> And inverse ST is not a problem, its basically the same as forward ST.
    >
    > Could you explain in more detail, please? An example?

    Here's an example of ST2:
    http://ctxmodel.net/files/ST2rc/st2rc_v4.rar

    Basically you just concatenate the symbols occuring in
    the context ctx into a string s[ctx] and then output these strings
    in a lexicographic context order - that's forward ST.

    And for the inverse ST, you first have to rebuild the context
    index, which is possible because ST1 fragment sizes (number
    of symbol occurences) can be calculated from transformed data,
    then digram occurences can be counted by associating a single
    context symbol with transformed data symbol, then we'd know
    how to associate digram contexts with transform symbols and
    thus be able to count trigrams etc, etc - the same as for BWT
    actually.

    And then, I didn't really think about how to implement
    a "proper" bijective ST with last bytes of data used
    as context for first bytes (cyclic string shifts).
    But its also possible to use some fixed bytes as a context
    for first symbols, however that involves some tricky workarounds
    to compensate for this "virtual" context in various numbers
    of occurences.

    > The overall construction for the usual ST seems to be more
    > complicated than necessary

    Forward ST is less complicated than BWT, as its the same
    radix sort which can be used for BWT, but with reduced
    number of steps.
    But inverse ST requires repeating whole sorting process,
    while with BWT we just assume that a i-th occurence of a symbol
    in BWT corresponds to i-th occurence of the same symbol in
    lexicographically sorted o1 contexts.

  18. #18
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by Shelwien View Post
    >> ST4 vs BWT is like PPM order4 vs PPM*.

    Forward ST is less complicated than BWT, as its the same
    radix sort which can be used for BWT, but with reduced
    number of steps.
    But inverse ST requires repeating whole sorting process,
    while with BWT we just assume that a i-th occurence of a symbol
    in BWT corresponds to i-th occurence of the same symbol in
    lexicographically sorted o1 contexts.
    I am not sure that forward ST is less complicated than BWT
    is there a fast method for the ST compared to say the SKEW
    suffix sort for normal BWT which is linear in time and space?

    The skew sort is a natural for BWT$ which is what people who
    so they are using BWT are actually using most of the time.
    The skew sort can be modifed to make a true BWT and yes
    stay linear in time and space.

  19. #19
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    Isn't ST always linear because of its limited order?

    Though I don't really see much sense in discussing ST,
    as even BWT is barely useful.

  20. #20
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by Shelwien View Post
    Isn't ST always linear because of its limited order?

    Though I don't really see much sense in discussing ST,
    as even BWT is barely useful.
    I don't even think sorting a list of n numbers is always linear since
    there are so many ways to do it. But of course you could pick a
    linear or even n^2 way to sort a simple list of numbers.

    And yes ST and BWT are barely useful. So what they are sill always
    going to fill certain compression niches. I frankly see more use
    in encryption for the Bijective BWT especially the binary versions
    than I do in compression. But its the differing views of people that
    makes the world worth living in. There is no magic compression
    that is useful for everything but it is fun trying new methods

  21. #21
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by Alibert View Post
    I just found this link on comp.compression:

    http://www.informatik.uni-stuttgart....s/kuf09psc.pdf

    It's a paper about the bijective sort transform. The presentation seems to be rather cryptic, but it contains a running example which I will go through -- maybe I will finally understand the (inverse of the) ST...

    @Shelwien: maybe it has the missing "bijectivity part" for your implementation.

    One general question: does the ST with a reasonable order (say 5 or give better or worse compression results (depending on the different encoders)? For example, what about the BWTmix approach on ST outputs?

    --Alibert
    The paper above made it to the conference in Prague.

    The paper I and Gil was working has been rejected.
    I really am not surprised that it was rejected. For
    some reason I was thinking about the myth of global
    warming the CO2 scare that the elite in my country
    are pushing. The Chinese are not as dumb as my
    government. I also remember the Czech President
    puttin down Al Gore. So with those thoughts in mind
    I wrote this
    To: ... the soda place
    Subject: Re: [SODA10] Rejected paper #107 "A Bijective String Sorting Transform"
    Date: Fri, 4 Sep 2009 10:33:39 -0700

    Show Full Headers Back To [SENT]

    That's ok the conference in the Czech Republic
    recognized the usefulness of a Bijective version of the BWT
    I really did not expect it to be recognized in this country
    where research is more a function of being politically correct
    than having real value. Just look at the phony science backing
    the myth of global warming. What can you expect from a country
    that thinks CO2 needs to be carefully controlled as to dangerous
    for the environment.

    Thank You
    David Scott
    Yes the paper needs polishing so what. I think Manfred changed
    his several times during the back and forth of getting it published.
    I am not sure what my plans are at this point or Gils plans.

    Bye the way are the Russians caught up in this phony CO2 cabon
    thing?

Similar Threads

  1. BWTS GENERAL COMPRESS in MinGW exe's
    By biject.bwts in forum Data Compression
    Replies: 18
    Last Post: 12th October 2010, 21:27
  2. USING BWTS in BWTmix
    By biject.bwts in forum Data Compression
    Replies: 8
    Last Post: 9th March 2010, 03:00
  3. BWTS explanation?
    By TopQuark in forum Data Compression
    Replies: 5
    Last Post: 8th April 2009, 23:26
  4. NEW FULL BWTS COMPRESSOR
    By biject.bwts in forum Data Compression
    Replies: 2
    Last Post: 20th January 2009, 21:10
  5. Original paper on LZMW
    By encode in forum Forum Archive
    Replies: 3
    Last Post: 9th February 2008, 14:34

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •