Results 1 to 7 of 7

Thread: Ph.D Research topic

  1. #1
    Member bello's Avatar
    Join Date
    Dec 2014
    Location
    Nigeria
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Ph.D Research topic

    Hi,

    Presently enrolled in Ph.D computer science, trying to design and develop an efficient framework for data compression on the basis BWT technique.
    Please can someone comment on the suitability of the topic as a Ph.D work.
    Thanking you in anticipation of your response.

  2. #2
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    What are you doing that's new and original? A dissertation is equivalent to about 5 research papers.

  3. #3
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by bello View Post
    Hi,

    Presently enrolled in Ph.D computer science, trying to design and develop an efficient framework for data compression on the basis BWT technique.
    Please can someone comment on the suitability of the topic as a Ph.D work.
    Thanking you in anticipation of your response.
    Why not be more creative. Instead of using old BWT try to use BWTS instead?

  4. #4
    Member bello's Avatar
    Join Date
    Dec 2014
    Location
    Nigeria
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks for the suggestions, i will look into the BWTS

  5. #5
    Member just a worm's Avatar
    Join Date
    Aug 2013
    Location
    planet "earth"
    Posts
    96
    Thanks
    29
    Thanked 6 Times in 5 Posts
    BWTS is REALLY nice if it's about encrypting data because of the block size you can knock out processors with a high core count trying to crack an encryption in parallel. BWTS will cause a bottleneck with the memory accesses.

    But when it's about data compression Mr. Scott (user: biject.bwts, autor of BWTS) has probably the best overview in which area progress is probable. The current state is, that BWTS is not widely accepted as a good alternative to a regular BWT for compression purposes. I think there is still work to do regarding the speed. At least from what I know BWTS is quite a bit slower than a regular BWT. Maybe there are also chances to reduce the memory consumption below the memory consumption of a regular BWT. If I understood the algorithm correctly then the lyndon-words increase the element size which is normaly 1 byte per element for a regular BWT. My experience is also, that the existing documentation is scarce. I am not so up-to-date regarding BWTS.
    Last edited by just a worm; 12th July 2015 at 23:29.

  6. #6
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    I wrote a brief description of BTWS at http://mattmahoney.net/dc/dce.html#Section_559 (and lots more on BWT and other compression algorithms).

    The main difference between BWTS and regular BWT is that BWTS is bijective, which means every possible string is a valid encoding. There is no pointer to the first element. This is an interesting property, but it only improves compression by 4 bytes per block (for blocks up to 4 GB). Splitting the BWT into Lyndon words (cycles with an implied first element) also can have some effect on compression, but experimentally it appears to be negligible.

  7. #7
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by Matt Mahoney View Post
    I wrote a brief description of BTWS at http://mattmahoney.net/dc/dce.html#Section_559 (and lots more on BWT and other compression algorithms).

    The main difference between BWTS and regular BWT is that BWTS is bijective, which means every possible string is a valid encoding. There is no pointer to the first element. This is an interesting property, but it only improves compression by 4 bytes per block (for blocks up to 4 GB). Splitting the BWT into Lyndon words (cycles with an implied first element) also can have some effect on compression, but experimentally it appears to be negligible.
    I think that I caused part of the confusion as to how well BWTS works. I got to review a paper one time of IEEE and the paper had tables of test results. I was never allowed to actually run any real tests. So when I tested BWTS versus BWT I took a paper listed below where the guy does a BWT on "one" "zero" bits of some common files. I was never able to get the code used. But I falsely assumed that since you have a stream only made of runs of ones and zeros. That it was easy. He described his method and it seemed complicated so I only did a simple minded method and used my BWTS on the same files followed by my simple compressor of the one zero runs. If I had his code I would have used his. The results looked very good. Other than those file I used really short files like "BANANAS" which will easily compress below 7 BYTES. So I thought it was great.

    However since then when I wrote test programs I notices that if I used test files. You could somewhat tune the following stages differently depending on which version of BWT ( and there is more than one way so you actually get different results.) or if your using BWTS when you are doing the tuning. I still believe that even thought the gains as stated by Matt tend to be small I would not say they are negligible. But for many files you gain a few bytes for others you lose a few bytes. It depends on that the word "negligible" means. But in general the paper I worked on is overly optimistic about the gains.

    I an not good at understanding what other people write. But the BWTS is not the result of just splitting a file up into Lyndon Words and then doing a BWT on each segment and joining them together string by string. That is a misconception. When you split the file up into smaller parts. Yes you sort them all together as you do with a regular BWT type of transform if you get several Lyndon Words the resultant final file is the result of the Lyndon Words broken apart and scattered through out the final file.

    I think this is paper that we used for test
    http://www.mast.queensu.ca/~web/Papers/nagy-phd06.pdf

    I think this was paper of Gil and me. Not the final form it had errors
    but the concept was not accepted
    http://bijective.dogma.net/00yyy.pdf

    The last paper I think is a clean updated paper that was in Czech string
    conference
    http://arxiv.org/abs/0908.0239

    Yuta who wrote versions of BWTS on this site has fastest second versions of BWTS
    there was 2 others that wrote fast versions on this site but my memory is going fast
    can't find them now but this are here.

    And last yes Mark Nelson gave me the push needed to be the first to code this. And yes
    it was because Americans not free anymore to code what the hell they want and it in my
    view along with several modifications its useful in encryption.

Similar Threads

  1. Is UA war topic removed?
    By Sportman in forum The Off-Topic Lounge
    Replies: 16
    Last Post: 19th June 2019, 15:52
  2. Replies: 2
    Last Post: 28th December 2017, 11:35
  3. Replies: 2
    Last Post: 4th August 2011, 17:25
  4. Replies: 6
    Last Post: 19th August 2010, 23:59
  5. A name for the off topic part of the forum?
    By Fallon in forum The Off-Topic Lounge
    Replies: 3
    Last Post: 26th May 2009, 17:05

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •