Results 1 to 5 of 5

Thread: BIJECTIVE BWTS-DC-FIB

  1. #1
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Thumbs up BIJECTIVE BWTS-DC-FIB

    I finally got off my ass and put together a fully bijective from any file to any file
    BWTS DC thing.
    I would like to thank Yuta since we talked alot and fixed his DC to make it shorter
    for some cases. So that it would lead to this code. I have the original stuff Yuta
    sent me in the file. He really does fast clean work in changing DC.
    I would also like to thank Shelwien who help motivate me.
    ALso Matt for his encouragement.
    There are many other people that helped so far.
    The guy who uses Fibonacci on comp.compression I told him years ago I would
    write a bijective Fibonacii thing OK i did.

    Look it may have errors I tested in forward direction the calgary18 got 882,031
    which considering I have not added an arb255 style arithmetic coder yet I
    think is pretty good. I still got the file "BANANAS" to compress smaller.

    I tested the reverse on short files. The problem is that I wanted to use 0 as
    a table exit code. This meant '1111' in the input close to front would cause
    symbol table to end when testing even short files. So I used the 256th
    symbol bijectively as the stop in table. The same problem occurs when using
    0 to terminate the n-2 character runs it occurs randomly to often so used 21.

    Please test it on short files when testing for full bijectiveity.
    by short I mean 100 bytes or less. It can blow up to a gig even
    in a dozen bytes. I don't count that as an error. Look I can go
    a lot farther but this is a start. Please look at it.
    Attached Files Attached Files

  2. #2
    Member
    Join Date
    May 2009
    Location
    CA USA
    Posts
    22
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Offtopic a bit, but I always wanted to ask.

    Since the bijective BWT transform is 1:1 with no expansion or starting pointer, are there any interesting properties by applying the bij-BWT twice? Or un-bij-BWTing a text file? Or applying bijBWT a very large number of times. (In theory if you apply bij-BWT enough times sequentially you should get your original file back! Of course this may take a googolplex of iterations..)

    None of those crazy applications would be useful for searching or compression or anything, but it'd be curious if there were some other pattern or structure revealed by such abuse.

  3. #3
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Arrow

    Quote Originally Posted by GerryB View Post
    Offtopic a bit, but I always wanted to ask.

    Since the bijective BWT transform is 1:1 with no expansion or starting pointer, are there any interesting properties by applying the bij-BWT twice? Or un-bij-BWTing a text file? Or applying bijBWT a very large number of times. (In theory if you apply bij-BWT enough times sequentially you should get your original file back! Of course this may take a googolplex of iterations..)

    None of those crazy applications would be useful for searching or compression or anything, but it'd be curious if there were some other pattern or structure revealed by such abuse.
    When I think about bijections there are several types. One is a bijection from a closed finite set to another closed finite set. Such a bijection in this class is the standard BWTS it works on a starting file and then ouputs a permutation of the starting set. If you do this standard BWTS on any file then if you do it enough times you always get the original file back.

    Another kind of bijection to the set of all files from all files is one where there is an inifinte number of items. One with a little care could reduce that to some finite set but I generally leave it open ended.
    Let me give an example of this second type. Think of the set of all binary strings of one and zeros. In this case the bit one or zero could be thought of as a bit. This set can be bijective mapped to the set of 8 bit byte files, I do that all the time but not here. Look at the files in an ordered way
    0
    1
    00
    01
    10
    11
    000
    0001

    and so on forever
    I could map this set to the set of number starting with 2 by putting a 1
    in front and reading the resultant number example.
    0 becomes 10 which is 2
    1 becomes 11 which is 3
    00 becomes 100 which is 4
    01 becomes 101 which is 5

    notice this is a familar pattern where every file is mapped bijectively to set of numbers
    starting with 2.

    I can define a bijection. where if the file is represented by an odd number its mapped to
    the next lower file that uses the next lower odd number except if the file is represented by
    3 it maps to 2. And define the mapping of files represented by even numbers as being
    replaced by the file that has next larger even number. You could make this a loop
    by fixing the max length of string. In this case when there is no file represented by
    the next even number map that file to the one with the highest odd number. But
    I am allowing all files so not a loop.

    This bijection is not a loop. When you allow the set of all files you can have loops you
    can have things that get smaller and then get bigger. You can even define thing that
    may have what likes like a dampened ocsellation (spell check failed me here) that gets
    smaller then larger. You can have subsets in the files that form loops and so on.

    When you repeatedly compress much can happen but in general repeated even
    one time for most real life files causes expansion.

    My main hobby is compression and encryption. I almost had a chance to go to Israel
    and talk to some of the big boys. Gil was going to introduce me. It had free tickets to
    fly to Prague and then to Israel but my wife worried the trip would kill me so I did not
    go. I actually think she thought they would mistake my for a terrorist since I act so
    differently from a normal person and I know only english. Twice in my life I was almost
    killed by police. Once in Japan going though a checkpoint long story. And once in
    Ridgecrest California where a woman cop stopped me for no reason and I asked an
    innocent question about the glock she was carrying. She immediately pulled it out
    and was shaking. I was not sure if I should rush her or disarm her. I mentally decided
    that if she shot I would do my best to see her dead along with me. She really had
    no business being a cop. I froze and she called for back up. Fortunately I knew many
    of the cops that showed up we played poker together. They took her aside still
    shaking and told me she just over reacted and that she was transfered in from anther
    place and never had met people who act like me so I am different.

    Anyway I had discussions years ago about bijective compression as a first
    pass before encryption. There was a German guy very interested in it.
    Anyway to make a long story short. The BWTS-DC-FIB in its present form
    I would not use as the stage immediately before something
    like AES. First of all the code is not complete there may some
    errors. Second of all though the classic Unitcity Distance may
    be larger. There are weakness that one could distinguish most
    compressed files from a true random file. But these things can
    be improved upon.

  4. #4
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Talking

    If you want to test this code please when doing compression try to do
    either the 3 things in order or just test the middle one dctemp4.

    Note dctemp4 e inputfile outputfile
    input file is any file output file is a mulitply of 8bytes where each 8 bytes represents a postitve
    number from zero to what ever. The code is bijective from bytes to the 8byte grouping.

    However realistically if you have a random file and do dectemp4 d inputfile outputfile
    it can easily blowup in just 8 bytes. I want to test this thing in the d direction for small
    numbers just to see if the concept is correct. A later version will make this less likely
    I just want to see if the bijective DC concept is correct.

    For the reverse direction if you get dctemp4 d to fail where the sum of each of the 64bit
    numbers totals less than a million then you have found an error. In either the DC itself
    or my interface to it. Like wise if the sum is less than a million and your
    do dctemp4 d filea fileb followed by dectemp4 fileb filec and
    filea not equal to filec then there is either an error in DC itseld not being bijective
    or in my connecting to it. Once again even for this filea is made of 64 bit numbers
    each of which is 0 to a large number the total value is less than a million and the
    number of numbers is less than a thousand. I expect that I still have a few errors
    I hope if that is true that one of you find the error and send me the file causing
    the error and I will try to fix it before I do more with this code. Note
    I have not yet at this stage added an arithmetic compressor it will compress
    better when I do that. However even now it wil compress enwik9
    t0 185,709,858 bytes. Using a machine with 6GB memory and the 64
    bit exes.

    Thank You
    David A. Scott

  5. #5
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    here is the replacement for dctemp5.exe

    I will post a cleaner one soon
    it is just for testing.

    it was complied with no options



    X:\>fc /L /N /W dctemp6.c dctemp5.c
    Comparing files dctemp6.c and DCTEMP5.C
    ***** dctemp6.c
    59: DCV[m] = (n - 1) - (j + 1);
    61: if( m == 1) DCV[0]--;
    63: return m;
    ***** DCTEMP5.C
    59: DCV[m] = (n - 1) - (j + 1);
    61: return m;
    *****

    ***** dctemp6.c
    102: }
    103: static int ds1 = 0;
    104: static int bread ( int *c , long long *t, FILE *fp )
    ***** DCTEMP5.C
    100: }
    101: static int ds1 = 3;
    102: static int bread ( int *c , long long *t, FILE *fp )
    *****

    ***** dctemp6.c
    117: *t = bn / (k + eonsflg);
    118: ds1 = (x == EOF) && ( 0 == ( (bn +1) % (k + eonsflg)));
    120: *c = adjidr(BR, bn % (k-- + eonsflg ));
    ***** DCTEMP5.C
    115: *t = bn / (k + eonsflg);
    116: ds1 = (bn +1) % (k + eonsflg);
    118: *c = adjidr(BR, bn % (k-- + eonsflg ));
    *****

    ***** dctemp6.c
    170: if ( m == 0 ) return;
    171: // if(m == 1 ) DCV[0] -= 1;
    172: if ( 256 != a ) {
    173: fprintf(stderr," eons DVC[0] = %u ",DCV[0]);
    174: c = adjid(B, 256);
    175: bwrite( c ,(long long) DCV[0], fp ); /* write eons with next DCV val
    ue */
    176: fprintf(stderr," rest of DCV follows %u \n ",m-1);
    ***** DCTEMP5.C
    168: if ( m == 0 ) return;
    169: if ( 256 != a ) {
    170: fprintf(stderr," eons DVC[0] = %u ",DCV[0]-1);
    171: c = adjid(B, 256);
    172: bwrite( c ,(long long) DCV[0] - 1, fp ); /* write eons with next DCV
    value */
    173: fprintf(stderr," rest of DCV follows %u \n ",m-1);
    *****

    ***** dctemp6.c
    177: } else {
    178: fprintf(stderr," NO eons all 256 symbols used DVC[0] = %u \n",DCV[0]
    );
    179: bwrite(-2,(long long)DCV[0],fp);
    180: fprintf(stderr," no eons DVC[0] = %u ",DCV[0]);
    ***** DCTEMP5.C
    174: } else {
    175: fprintf(stderr," NO eons all 256 symbols used DVC[0] = %u \n",DCV[0]
    -1);
    176: bwrite(-2,(long long)DCV[0] - 1,fp);
    177: fprintf(stderr," no eons DVC[0] = %u ",DCV[0]);
    *****

    ***** dctemp6.c
    237: /* specail case no eonst ends early */
    238: if (ds1) t++;
    239: // t++;
    ***** DCTEMP5.C
    234: /* specail case no eonst ends early */
    235: if (ds1 == 0) t++;
    236: // t++;
    *****
    Attached Files Attached Files

Similar Threads

  1. BWTS GENERAL COMPRESS in MinGW exe's
    By biject.bwts in forum Data Compression
    Replies: 18
    Last Post: 12th October 2010, 21:27
  2. USING BWTS in BWTmix
    By biject.bwts in forum Data Compression
    Replies: 8
    Last Post: 9th March 2010, 03:00
  3. BWTS STATUS OF PAPER
    By biject.bwts in forum Data Compression
    Replies: 20
    Last Post: 4th September 2009, 22:10
  4. BWTS explanation?
    By TopQuark in forum Data Compression
    Replies: 5
    Last Post: 8th April 2009, 23:26
  5. NEW FULL BWTS COMPRESSOR
    By biject.bwts in forum Data Compression
    Replies: 2
    Last Post: 20th January 2009, 21:10

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •