Page 2 of 2 FirstFirst 12
Results 31 to 35 of 35

Thread: SCOTT TRANSFORM

  1. #31
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    171
    Thanks
    20
    Thanked 61 Times in 30 Posts

    Unhappy

    Quote Originally Posted by biject.bwts View Post
    The fact is I haven't played with Yuta's BWT code. However I did spend a lot of time looking at his code in BWTS.I which I suspect may be similar to what he did for BWT but I am not sure. Here is what he said to me one time check out G. Nong, S. Zhang and W. H. Chan, "Linear Suffix Array Construction by Almost Pure Induced-Sorting". I did find something on the net about that but like most papers I am not sure if it had much useful info for me. To me Yuta's code itself is the key. I have thought about changing it so one could get a true BWT for output instead of the Suffix sorted array which to me is not pure and seems to be what people are calling BWT nowadays.

    Actually you just break the array to lyndon words and rotate the whole array in question till its one big lyndon word or single lyndon word repeated several time. The index is the number of characters of the array that you rotated and the BWT is the BWTS of the rotated array. So it would not be hard to do if any one actually cared about using the true BWT for the data. But just as it seems it not worth the time to calculate the BWTS it just as likely its not worth the time to calculate the real BWT when people seem happy with using the sorted suffix array thing.

    I think most do not like the way I explain things and it would be foolish for me to explain in words without a black board or something. Since what I would say would be not exactly correct. Maybe its my lack of vocabulary or lack of words. I liked old C and machine code. I like Yuta's style he does not waste space with long names he just gets down and codes thing the way they should. You can can add plenty of writes to see what the values are. But is code is not wasteful and much easier to follow then C++ code.
    The algorithm described "Linear Suffix Array Construction by Almost Pure Induced-Sorting" is patented in China (http://zhuanli.baidu.com/pages/sipo/...8b1d1a7_0.html) by Nong Ge, the author. So there may be some legal problem in Yuta's code.
    Anyway I think it is the patent is sometimes foolish, but if we can use an other suffix sorting algorithm, why not use it instead of the patented one?

  2. #32
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,494
    Thanks
    26
    Thanked 131 Times in 101 Posts
    David:
    Thanks for the reference. I've found Nong's website.

    Gribok:
    I know that divsufsort uses ITS (improved two-stage) to reduce the numbers of suffices to be explicitly sorted, but it seems that this gives somewhat irregular gaps between induced positions, ie it would be hard to determine original position from transformed position. Is the ISA only used for dealing with tandem repeats?

    Rich:
    What about other algorithms from Nang? Or other countries (ie. is his patent valid outside of China)? It could be possible to compile two versions of program, one for China and one for the rest of the world.

    By the way, Sami Runsas has developed his own SACA, competitive with divsufsort. I'm not sure if it operates in 4N or 5N space (his description is ambiguous for me) or if it's patented (no info, but it's closed source). Maybe someone can ask him about that. Info is here: http://nanozip.net/research.html
    Last edited by Piotr Tarsa; 12th August 2011 at 10:09.

  3. #33
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    171
    Thanks
    20
    Thanked 61 Times in 30 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    Rich:
    What about other algorithms from Nang? Or other countries (ie. is his patent valid outside of China)? It could be possible to compile two versions of program, one for China and one for the rest of the world.

    By the way, Sami Runsas has developed his own SACA, competitive with divsufsort. I'm not sure if it operates in 4N or 5N space (his description is ambiguous for me) or if it's patented (no info, but it's closed source). Maybe someone can ask him about that. Info is here: http://nanozip.net/research.html
    I don't know much about the patent but Yuta's code (openbwt-1.5) does use Nong's patented algorithm. There several other linear suffix sorting algorithms, but I am not sure if they can be modified to perform on a list of lyndon words?
    There is my completed O(nlogn) implement: http://pastebin.mozilla.org/1297719

  4. #34
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by RichSelian View Post
    I don't know much about the patent but Yuta's code (openbwt-1.5) does use Nong's patented algorithm. There several other linear suffix sorting algorithms, but I am not sure if they can be modified to perform on a list of lyndon words?
    There is my completed O(nlogn) implement: http://pastebin.mozilla.org/1297719
    I am not so sure that Yuta's code is hurt by the patients. I would like to hear that from Yuta himself. I think he must be close to these people since they reference him in some of their papers. I think he improved on there ideas so maybe he could patent the improvement. Also I don't think the original authors are even aware of BWTS or even if they think a bijective pure BWT is possible. I didn't know China even had patients or what it means to the rest of the world. Yuta in many of his coding states thing like this from his BWTS.i file

    /*
    * bwts.i
    * Copyright (c) 2008-2010 Yuta Mori. All Rights Reserved.
    *
    * Permission is hereby granted, free of charge, to any person
    * obtaining a copy of this software and associated documentation
    * files (the "Software"), to deal in the Software without
    * restriction, including without limitation the rights to use,
    * copy, modify, merge, publish, distribute, sublicense, and/or sell
    * copies of the Software, and to permit persons to whom the
    * Software is furnished to do so, subject to the following
    * conditions:
    *
    * The above copyright notice and this permission notice shall be
    * included in all copies or substantial portions of the Software.
    *
    * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
    * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
    * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
    * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
    * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
    * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
    * OTHER DEALINGS IN THE SOFTWARE.
    */

    I don't wrote commercial compression code. I am only interested in the theory of compression. So every thing I write is in the interested of theory. Yuta freqently
    puts this in some of this code.


    OpenBWT - v1.4

    == Introduction ==
    OpenBWT is a small open source library for the Burrows-Wheeler transformation.
    It currently provides a linear-time BWT/BWTS, inverse BWT/BWTS and several
    second stage transform algorithms for BWT-based compression.

    == License ==
    OpenBWT is licensed under the MIT license. See the file License.txt for details.

    When OpenBWT first came out one guy on this site who is from MIT was bothered that it contained stuff copyrighted stuff that was original from Yuta. The the poster of OpenBWT stated he was Yuta. So at least it seems like its for real. But who knows. One thing for sure is Yuta first one to code a linear time and space BWTS. It may cause other experts to take a look.
    algorithm

  5. #35
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 795 Times in 488 Posts
    I don't know if the Chinese patent describes divsufsort, but the application date is Jan. 27, 2011, which is well after divsufsort was published.

Page 2 of 2 FirstFirst 12

Similar Threads

  1. Schindler Transform (STX)
    By CUDALIKE in forum Data Compression
    Replies: 15
    Last Post: 28th November 2011, 23:40
  2. a very simple transform for english.dic
    By willvarfar in forum Data Compression
    Replies: 8
    Last Post: 1st March 2010, 14:44

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •