Results 1 to 2 of 2

Thread: Writing a bijective DC and general bijective approach

  1. #1
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Thumbs up Writing a bijective DC and general bijective approach

    I wrote a bijective DC a long time ago. But most of what I wrote is gone. Here is a more pure approach to writing bijective code.

    I have covered many times of how to go from a Huffman like series to a fully bijective file here is a summary with an example
    A 00
    B 01
    C 10
    D 11

    You basically write the bits for each token and add ZEROS as needed. If the set of tokens ends with an ALL ZERO TOKEN or
    an ALL ZERO TOKEN Followed by one or more ONE TOKENS you add another 1 bits and then several zero bits.

    From this you start taken 8 bits at a time and write byte out. Ignore the trailing extra zero bits where it creates a tail of ZERO BYTES.
    Know if it ends in TAIL of an all ZERO BYTE followed by a one or more bytes of 100000000 drop the last byte and your done

    example
    00 10 01 11 goes to a single byte 00100111
    10 01 11 00 goes to a two bytes 10011100 10000000
    00 00 00 00 goes to a singe byte 00000000
    00 00 00 00 10 00 00 00 goes to two bytes 00000000 10000000
    00 00 00 00 10 goes to two bytes 00000000 10100000

    now if you have Huffman like tables but have code where the ZERO token is never last. You lay out all the bits as above and
    never need a the ONE TOKENS ADD to the beginning set however you still need to take it in to account when you write the
    byte data out.

    examples
    00 10 01 11 same as above to 00100111
    10 01 11 goes to 10011100 note on input no ZERO TOKEN allowed for ending.
    Not run of all ZEROS that would be a Null File or Null string on output
    00 00 00 00 10 a single byte out 00000000

    Back to DC the famous string on the internet aaabccca the answer given by some is 30
    I say its just plain 3 no 0

    The above example does not cover the start of DC is assumes you already have the start of first
    a b c it also does not cover what happen when the last run is longer than one character.

    Here is a way to do it all including the starting and multiple character runs at end. I will add
    a ":" when virtual part done. The mapping to list of number is such that you never have a zero
    at end and that you never have more than M-2 zeroes in the output.

    looking at aaabccca lets assume that only the characters abcd allowed.
    write abcd:aaabccca

    as you sweep through modified file. you start at the first "a" go to the end of that run count to you see
    next a if no more a write a temporary zero if number of zeros witter so far less than M-2
    now take the count not zero and subtract the number of types of character seen so far other than a
    in this case 4 - 3 so you write out a "1"
    then do the same for b in this case its a "2"
    then do the same for the c in this case a "2"
    then do the same for the d in this case a "0" allowing a max of one more
    know I am going to write a colon to show the allowed taken care of ":"
    now at first part of string go to end of run count to a 5 two symbol types so subtract 2 write a "3"
    now on the next b you have no b so prepare a "0" which is last zero if needed
    now at the c go to end of run and and no count to an c so next chech how many character is
    in last run. There is one but it was before the last zero so no output for the c your done

    1220:3 is the whole output

    lets change the string by addind another a so we have aaabcccaa

    now when will geto to the b we write the "0" since we need to write a "1" for the last c since we need an extra a

    1220:301

    next add two extra cc and end so you have aaabcccaacc
    now when you get to the c after the only b and the fact all ZEROS written you the number of a's write a "2" note
    that it was one before since those a's were pointed to before last zero written and now not end.
    know since c pointed to after last zero written need true count "2"

    so you get 1220:3022

    suppose only abc allowed going back to first case
    cba:aaabccca you get
    c goes to "5"
    b goes to "4"
    a need to start at end of run even though in virtual part its a "3"

    so you get 543:

    Now the tricky part if you have not used all the zero tokens

    the bit patterns for N are
    0 0
    1 100
    2 101
    3 11000
    4 11010
    5 11001
    6 11011

    once or if M-2 zeroes appear no need for zero

    1 1
    2 010
    3 011
    4 00100
    5 00110
    6 00101
    7 00111

    the key is pick a pattern so zero is only zeros until m-2 done
    then pick a universal for after when zero not need make sure not zero token exists after m-2 zeroes

    then convert to the byte file example
    abcd:aaabcccaacc

    1220:3022
    100 101 101 0 : 011 0 010 010 not after second zero change pattern for numbers
    10010110 10011001 00100000 3 bytes

    of course you a a special bijective arithmetic coder for last pass,


    " if errors will reread and change text by editing I am to tired now"

  2. #2
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    I wrote the above because I wanted an easier to do DC. It really needs to be followed by a bijective arithmetic coder to get better compression.

    One You can choose almost any universal encoder for the two tables one you use before the M-2 zeroes occur and one after. But many may not
    like the fact if you have an extremely large possible character set as you go through the table in the beginning you write out a lot of zeros. If this
    is a bother you can group together zeros the beginning like this

    1 1 if the number stats with a 1
    01 010 meaning for above table 0 not used symbol and 1 part of jump for next symbol
    001 011
    0001 00100
    00001 00110
    000001 00101
    0000001 00111 and so in notice each one there is no all zero string which at end ignored.

    notice since only M-2 zero symbols if 000...001 and end of the pool of possible symbols the extra zeros would be part of next number

    example suppose ...010 end of number 0 0 0 : new table 0001000... you replace 0 0 0 0001 with 00111

Similar Threads

  1. BIJECTIVE DC
    By biject.bwts in forum Data Compression
    Replies: 2
    Last Post: 11th May 2015, 23:43
  2. Writing GUI in PowerShell
    By Bulat Ziganshin in forum The Off-Topic Lounge
    Replies: 4
    Last Post: 11th August 2013, 21:04
  3. BIJECTIVE BWTS-DC-FIB
    By biject.bwts in forum Data Compression
    Replies: 4
    Last Post: 13th February 2012, 01:36
  4. bijective fpaq0p_sh
    By biject.bwts in forum Data Compression
    Replies: 20
    Last Post: 16th November 2011, 03:48
  5. Replies: 0
    Last Post: 30th August 2011, 16:47

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •