Results 1 to 21 of 21

Thread: BWTS STATUS OF PAPER

Hybrid View

Previous Post Previous Post   Next Post Next Post
  1. #1
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Post BWTS STATUS OF PAPER

    I have much faster code will post it later but here is the draft so far
    for the paper. I suck at writting at least the other guy understands how to write and how the method works.

    I have a copy for a while at my site if you
    wish to take a look

    http://bijective.dogma.net/00yyy.pdf

  2. #2
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,505
    Thanks
    26
    Thanked 136 Times in 104 Posts
    What about bijective Schindler Transform? I'm not into ST so I don't know how it works in detail. I only know it's a limited order BWT.

    With 64-bit processors Schindler Transform on order 7 or 8 will be very fast, and order 7 or 8 Schindler Transform isn't much worse than full order BWT.

    AFAIR Francesco's Rings uses Schindler Transform. Can someone tell me which order Rings use?

  3. #3
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Cool

    Quote Originally Posted by Piotr Tarsa View Post
    What about bijective Schindler Transform? I'm not into ST so I don't know how it works in detail. I only know it's a limited order BWT.

    With 64-bit processors Schindler Transform on order 7 or 8 will be very fast, and order 7 or 8 Schindler Transform isn't much worse than full order BWT.

    AFAIR Francesco's Rings uses Schindler Transform. Can someone tell me which order Rings use?
    First of all the word "bijective" most would argue that the BWT is always bijective. Of course that's the problem its not bijective to the set of all files so the word is used rather loosely. I am not a member of anything most papers are not freely accessible to me. But from the ones I have found when you walk through the text you find most don't really have a transform that is bijective to the set of all files. I don't have code or know enough about the Schindler Transform to tell you if its really bijective.

  4. #4
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    In trying to find info on the Schindler Transform I came across

    ftp://ftp.math.hkbu.edu.hk/pub/techreport/math436.pdf

    In it they go through the example of mississippi with
    the standard $ attached. Where it sorted in a way like BWT$

    This fact alone makes it not bijective because of the added
    symbol. I could not find any information on what you call
    the "bijective Schindler Transform" it you have any let
    me know.

    Most people when talking about BWT no longer do a real
    BWT they do the BWT$ routine. Which is slightly different
    yet that is what is done. The real BWT does a little more
    time to compute not much but more. For some reason
    BWT$ is what most seem to call BWT but it is not the
    same

  5. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,139
    Thanks
    321
    Thanked 1,402 Times in 804 Posts
    This ST aka Schindler Transform aka Sort Transform
    is actually a partial BWT, and thus is not chained to EOF in any way.
    The idea is that sorting the contexts only by first N symbols
    might be faster than full BWT, and still reversible.
    For a simple example see ST2.inc in http://ctxmodel.net/files/ST2rc/st2rc_v4.rar

  6. #6
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,505
    Thanks
    26
    Thanked 136 Times in 104 Posts
    Sorry, David, I've written my post inaccurately (my English isn't very good). I meant that I would know if you are working on bijective Schindler Transform.

  7. #7
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Smile

    Quote Originally Posted by Shelwien View Post
    This ST aka Schindler Transform aka Sort Transform
    is actually a partial BWT, and thus is not chained to EOF in any way.
    The idea is that sorting the contexts only by first N symbols
    might be faster than full BWT, and still reversible.
    For a simple example see ST2.inc in http://ctxmodel.net/files/ST2rc/st2rc_v4.rar
    I down loaded this it appears to be bijective. However it will take some time to wrap my mind around this. Its not clear exactly what it does. I take this is for a second order. Does his method allow other orders. if some file is say 100 bytes and you attempt to run a order 200 does that mean after order 100 the output file is the same. Take the example in the paper Gil came up with what would an order 100 ST give as an ouput? Would it match my output? Or does the ST method only work for small orders. If its like BWT then at some point you might think it would match the output of the bijective string transform?

    The output of the transform we are writing about is to match a true BWT not the BWT$ that is commonly used. It would be interesting to know more about the ST transform. Thanks for the code example.

  8. #8
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Smile

    Quote Originally Posted by Shelwien View Post
    This ST aka Schindler Transform aka Sort Transform
    is actually a partial BWT, and thus is not chained to EOF in any way.
    The idea is that sorting the contexts only by first N symbols
    might be faster than full BWT, and still reversible.
    For a simple example see ST2.inc in http://ctxmodel.net/files/ST2rc/st2rc_v4.rar
    I took a look at your code sadly I did not get it to work under DJGPP
    but I notice that it writes out 2 more bytes in the output array than
    what is in the input array. So though your version does not use the
    EOF as the paper I saw it does add extra data can we say index?
    So it is not bijective since the ouput longer than the input or did
    I miss something??

  9. #9
    Member
    Join Date
    Nov 2008
    Location
    Germany
    Posts
    15
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Paper on bijective variants of the BWT

    I just found this link on comp.compression:

    http://www.informatik.uni-stuttgart....s/kuf09psc.pdf

    It's a paper about the bijective sort transform. The presentation seems to be rather cryptic, but it contains a running example which I will go through -- maybe I will finally understand the (inverse of the) ST...

    @Shelwien: maybe it has the missing "bijectivity part" for your implementation.

    One general question: does the ST with a reasonable order (say 5 or give better or worse compression results (depending on the different encoders)? For example, what about the BWTmix approach on ST outputs?

    --Alibert

  10. #10
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,139
    Thanks
    321
    Thanked 1,402 Times in 804 Posts
    ST4 vs BWT is like PPM order4 vs PPM*.
    So its worse in most cases.
    And inverse ST is not a problem, its basically the same as forward ST.
    Which is slower than inverse BWT of course.

  11. #11
    Member
    Join Date
    Nov 2008
    Location
    Germany
    Posts
    15
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    ST4 vs BWT is like PPM order4 vs PPM*.
    I do not fully agree; of course the order of BWT is higher, but on the downside it scrambles the information within the contexts, whereas the order of the ST is small but for example a pattern 0101010101 typically means that within some context x (which length is the order of the ST) the bits 0 and 1 indeed are alternating in the input. So to me, the questions which one is better very heavily relies on the backend you are using for compression.

    Quote Originally Posted by Shelwien View Post
    And inverse ST is not a problem, its basically the same as forward ST.
    Could you explain in more detail, please? An example? Maybe the one in paper; since I already tried to understand that one? In the paper above some strange graph is constructed and I do understand how the algorithm works from there but I do not fully understand how to construct that graph and why the whole procedure works. The overall construction for the usual ST seems to be more complicated than necessary (I can only suppose that the reason for this is the bijective ST, which might be more complicated).

    --Alibert

  12. #12
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by Alibert View Post
    I do not fully agree; of course the order of BWT is higher, but on the downside it scrambles the information within the contexts, whereas the order of the ST is small but for example a pattern 0101010101 typically means that within some context x (which length is the order of the ST) the bits 0 and 1 indeed are alternating in the input. So to me, the questions which one is better very heavily relies on the backend you are using for compression.
    --Alibert
    I can tell you that BWTS 0101010101 goes to 1111100000
    I suspect that the bijective ST gets the same anwser for a very
    small order after which all higher orders of the bijective ST would
    match the bijective BWT.

    Actually compression is a zero sum game. There is no magic compressor
    of course one could construct files that look good or bad with either. The
    trick is finding what is useful for some given set or type of file.

    I suspect the bijective ST is like the bijective BWT in that if you keep
    applying the transform enough times you come back to the starting file.
    This fact alone means that the BWT can not make all files compress better
    some it will help some it will hurt. But I suspect that the full order bijective
    BWT would generally be more useful for most files people construct but for
    certain special files the limited order bijective ST will be better.

  13. #13
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,139
    Thanks
    321
    Thanked 1,402 Times in 804 Posts
    >> ST4 vs BWT is like PPM order4 vs PPM*.
    >
    > I do not fully agree; of course the order of BWT is
    > higher, but on the downside it scrambles the information
    > within the contexts,

    Well, the specific permutation of symbols within a context
    doesn't matter much for stationary sources (like texts).
    And higher orders usually have higher match rates.

    > So to me, the questions which one is better very heavily
    > relies on the backend you are using for compression.

    Unfortunately the dependence on data type is much stronger.
    And there's basically no "natural" data where the specific order of
    symbol occurences in a context really matters, so BWT is
    always not much worse than ST - even with nonstationary data.

    >> And inverse ST is not a problem, its basically the same as forward ST.
    >
    > Could you explain in more detail, please? An example?

    Here's an example of ST2:
    http://ctxmodel.net/files/ST2rc/st2rc_v4.rar

    Basically you just concatenate the symbols occuring in
    the context ctx into a string s[ctx] and then output these strings
    in a lexicographic context order - that's forward ST.

    And for the inverse ST, you first have to rebuild the context
    index, which is possible because ST1 fragment sizes (number
    of symbol occurences) can be calculated from transformed data,
    then digram occurences can be counted by associating a single
    context symbol with transformed data symbol, then we'd know
    how to associate digram contexts with transform symbols and
    thus be able to count trigrams etc, etc - the same as for BWT
    actually.

    And then, I didn't really think about how to implement
    a "proper" bijective ST with last bytes of data used
    as context for first bytes (cyclic string shifts).
    But its also possible to use some fixed bytes as a context
    for first symbols, however that involves some tricky workarounds
    to compensate for this "virtual" context in various numbers
    of occurences.

    > The overall construction for the usual ST seems to be more
    > complicated than necessary

    Forward ST is less complicated than BWT, as its the same
    radix sort which can be used for BWT, but with reduced
    number of steps.
    But inverse ST requires repeating whole sorting process,
    while with BWT we just assume that a i-th occurence of a symbol
    in BWT corresponds to i-th occurence of the same symbol in
    lexicographically sorted o1 contexts.

  14. #14
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by Alibert View Post
    I just found this link on comp.compression:

    http://www.informatik.uni-stuttgart....s/kuf09psc.pdf

    It's a paper about the bijective sort transform. The presentation seems to be rather cryptic, but it contains a running example which I will go through -- maybe I will finally understand the (inverse of the) ST...

    @Shelwien: maybe it has the missing "bijectivity part" for your implementation.

    One general question: does the ST with a reasonable order (say 5 or give better or worse compression results (depending on the different encoders)? For example, what about the BWTmix approach on ST outputs?

    --Alibert
    The paper above made it to the conference in Prague.

    The paper I and Gil was working has been rejected.
    I really am not surprised that it was rejected. For
    some reason I was thinking about the myth of global
    warming the CO2 scare that the elite in my country
    are pushing. The Chinese are not as dumb as my
    government. I also remember the Czech President
    puttin down Al Gore. So with those thoughts in mind
    I wrote this
    To: ... the soda place
    Subject: Re: [SODA10] Rejected paper #107 "A Bijective String Sorting Transform"
    Date: Fri, 4 Sep 2009 10:33:39 -0700

    Show Full Headers Back To [SENT]

    That's ok the conference in the Czech Republic
    recognized the usefulness of a Bijective version of the BWT
    I really did not expect it to be recognized in this country
    where research is more a function of being politically correct
    than having real value. Just look at the phony science backing
    the myth of global warming. What can you expect from a country
    that thinks CO2 needs to be carefully controlled as to dangerous
    for the environment.

    Thank You
    David Scott
    Yes the paper needs polishing so what. I think Manfred changed
    his several times during the back and forth of getting it published.
    I am not sure what my plans are at this point or Gils plans.

    Bye the way are the Russians caught up in this phony CO2 cabon
    thing?

Similar Threads

  1. BWTS GENERAL COMPRESS in MinGW exe's
    By biject.bwts in forum Data Compression
    Replies: 18
    Last Post: 12th October 2010, 21:27
  2. USING BWTS in BWTmix
    By biject.bwts in forum Data Compression
    Replies: 8
    Last Post: 9th March 2010, 03:00
  3. BWTS explanation?
    By TopQuark in forum Data Compression
    Replies: 5
    Last Post: 8th April 2009, 23:26
  4. NEW FULL BWTS COMPRESSOR
    By biject.bwts in forum Data Compression
    Replies: 2
    Last Post: 20th January 2009, 21:10
  5. Original paper on LZMW
    By encode in forum Forum Archive
    Replies: 3
    Last Post: 9th February 2008, 14:34

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •