# Thread: Huffman tree, Autumn leaves and Winter cut-back

1. ## Huffman tree, Autumn leaves and Winter cut-back

This is a sample of the ALWC (Autumn Leaves Winter Cut-back) algorithm I came up last night, it's a kind of false dynamic Huffman, not sure its really helpful since both the encoder and the decoder need the real number of occurrences of each letter beforehand which leads to larger headers for limited gains.

Basically it starts by building a standard Huffman prefix-free code, after each symbol encoding/decoding it updates its occurrence count and keeps the symbol list sorted until the symbol count reaches zero — the Autumn Leave — resulting in a small prefix-free code change (two of the longest codes are fused together to form a one bit shorter code) — the Winter Cut-back —.

Sample used "Les beaux kiwis que voici !"
Symbol count between square brackets.

Code:
```[5]  _   00
[4]  i   010
[3]  e   011
[2]  s   1000
[2]  u   1001
[1]  x   1010
[1]  !   10110
[1]  L   10111
[1]  a   11000
[1]  b   11001
[1]  c   11010
[1]  k   11011
[1]  o   11100
[1]  q   11101
[1]  v   11110
[1]  w   11111

L (10111)

[5]  _   00        [5]  _   00
[4]  i   010       [4]  i   010
[3]  e   011       [3]  e   011
[2]  s   1000      [2]  s   1000
[2]  u   1001      [2]  u   1001
[1]  x   1010      [1]  x   1010
[1]  !   10110     [1]  !   1011
[1]  a   10111     [1]  a   11000
[1]  b   11000     [1]  b   11001
[1]  c   11001     [1]  c   11010
[1]  k   11010     [1]  k   11011
[1]  o   11011     [1]  o   11100
[1]  q   11100     [1]  q   11101
[1]  v   11101     [1]  v   11110
[1]  w   11110     [1]  w   11111
11111

e (011)

[5]  _   00
[4]  i   010
[2]  e   011
[2]  s   1000
[2]  u   1001
[1]  x   1010
[1]  !   1011
[1]  a   11000
[1]  b   11001
[1]  c   11010
[1]  k   11011
[1]  o   11100
[1]  q   11101
[1]  v   11110
[1]  w   11111

s (1000)

[5]  _   00
[4]  i   010
[2]  e   011
[2]  u   1000
[1]  s   1001
[1]  x   1010
[1]  !   1011
[1]  a   11000
[1]  b   11001
[1]  c   11010
[1]  k   11011
[1]  o   11100
[1]  q   11101
[1]  v   11110
[1]  w   11111

_ (00)

[4]  _   00
[4]  i   010
[2]  e   011
[2]  u   1000
[1]  s   1001
[1]  x   1010
[1]  !   1011
[1]  a   11000
[1]  b   11001
[1]  c   11010
[1]  k   11011
[1]  o   11100
[1]  q   11101
[1]  v   11110
[1]  w   11111

b (11001)

[4]  _   00        [4]  _   00
[4]  i   010       [4]  i   010
[2]  e   011       [2]  e   011
[2]  u   1000      [2]  u   1000
[1]  s   1001      [1]  s   1001
[1]  x   1010      [1]  x   1010
[1]  !   1011      [1]  !   1011
[1]  a   11000     [1]  a   1100
[1]  c   11001     [1]  c   11010
[1]  k   11010     [1]  k   11011
[1]  o   11011     [1]  o   11100
[1]  q   11100     [1]  q   11101
[1]  v   11101     [1]  v   11110
[1]  w   11110     [1]  w   11111
11111

e (011)

[4]  _   00
[4]  i   010
[2]  u   011
[1]  e   1000
[1]  s   1001
[1]  x   1010
[1]  !   1011
[1]  a   1100
[1]  c   11010
[1]  k   11011
[1]  o   11100
[1]  q   11101
[1]  v   11110
[1]  w   11111

a (1100)

[4]  _   00        [4]  _   00
[4]  i   010       [4]  i   010
[2]  u   011       [2]  u   011
[1]  e   1000      [1]  e   1000
[1]  s   1001      [1]  s   1001
[1]  x   1010      [1]  x   1010
[1]  !   1011      [1]  !   1011
[1]  c   1100      [1]  c   1100
[1]  k   11010     [1]  k   1101
[1]  o   11011     [1]  o   11100
[1]  q   11100     [1]  q   11101
[1]  v   11101     [1]  v   11110
[1]  w   11110     [1]  w   11111
11111

u (011)

[4]  _   00
[4]  i   010
[1]  u   011
[1]  e   1000
[1]  s   1001
[1]  x   1010
[1]  !   1011
[1]  c   1100
[1]  k   1101
[1]  o   11100
[1]  q   11101
[1]  v   11110
[1]  w   11111

x (1010)

[4]  _   00        [4]  _   00
[4]  i   010       [4]  i   010
[1]  u   011       [1]  u   011
[1]  e   1000      [1]  e   1000
[1]  s   1001      [1]  s   1001
[1]  !   1010      [1]  !   1010
[1]  c   1011      [1]  c   1011
[1]  k   1100      [1]  k   1100
[1]  o   1101      [1]  o   1101
[1]  q   11100     [1]  q   1110
[1]  v   11101     [1]  v   11110
[1]  w   11110     [1]  w   11111
11111

_ (00)

[4]  i   00
[3]  _   010
[1]  u   011
[1]  e   1000
[1]  s   1001
[1]  !   1010
[1]  c   1011
[1]  k   1100
[1]  o   1101
[1]  q   1110
[1]  v   11110
[1]  w   11111

k (1100)

[4]  i   00        [4]  i   00
[3]  _   010       [3]  _   010
[1]  u   011       [1]  u   011
[1]  e   1000      [1]  e   1000
[1]  s   1001      [1]  s   1001
[1]  !   1010      [1]  !   1010
[1]  c   1011      [1]  c   1011
[1]  o   1100      [1]  o   1100
[1]  q   1101      [1]  q   1101
[1]  v   1110      [1]  v   1110
[1]  w   11110     [1]  w   1111
11111

i (00)

[3]  i   00
[3]  _   010
[1]  u   011
[1]  e   1000
[1]  s   1001
[1]  !   1010
[1]  c   1011
[1]  o   1100
[1]  q   1101
[1]  v   1110
[1]  w   1111

w (1111)

[3]  i   00        [3]  i   00
[3]  _   010       [3]  _   010
[1]  u   011       [1]  u   011
[1]  e   1000      [1]  e   100
[1]  s   1001      [1]  s   1010
[1]  !   1010      [1]  !   1011
[1]  c   1011      [1]  c   1100
[1]  o   1100      [1]  o   1101
[1]  q   1101      [1]  q   1110
[1]  v   1110      [1]  v   1111
1111

i (00)

[3]  _   00
[2]  i   010
[1]  u   011
[1]  e   100
[1]  s   1010
[1]  !   1011
[1]  c   1100
[1]  o   1101
[1]  q   1110
[1]  v   1111

s (1010)

[3]  _   00        [3]  _   00
[2]  i   010       [2]  i   010
[1]  u   011       [1]  u   011
[1]  e   100       [1]  e   100
[1]  !   1010      [1]  !   101
[1]  c   1011      [1]  c   1100
[1]  o   1100      [1]  o   1101
[1]  q   1101      [1]  q   1110
[1]  v   1110      [1]  v   1111
1111

_ (00)

[2]  _   00
[2]  i   010
[1]  u   011
[1]  e   100
[1]  !   101
[1]  c   1100
[1]  o   1101
[1]  q   1110
[1]  v   1111

q (1110)

[2]  _   00        [2]  _   00
[2]  i   010       [2]  i   010
[1]  u   011       [1]  u   011
[1]  e   100       [1]  e   100
[1]  !   101       [1]  !   101
[1]  c   1100      [1]  c   110
[1]  o   1101      [1]  o   1110
[1]  v   1110      [1]  v   1111
1111

u (011)

[2]  _   00        [2]  _   00
[2]  i   010       [2]  i   010
[1]  e   011       [1]  e   011
[1]  !   100       [1]  !   100
[1]  c   101       [1]  c   101
[1]  o   110       [1]  o   110
[1]  v   1110      [1]  v   111
1111

e (011)

[2]  _   00        [2]  _   00
[2]  i   010       [2]  i   01
[1]  !   011       [1]  !   100
[1]  c   100       [1]  c   101
[1]  o   101       [1]  o   110
[1]  v   110       [1]  v   111
111

_ (00)

[2]  i   00
[1]  _   01
[1]  !   100
[1]  c   101
[1]  o   110
[1]  v   111

v (111)

[2]  i   00        [2]  i   00
[1]  _   01        [1]  _   01
[1]  !   100       [1]  !   10
[1]  c   101       [1]  c   110
[1]  o   110       [1]  o   111
111

o (111)

[2]  i   00        [2]  i   00
[1]  _   01        [1]  _   01
[1]  !   10        [1]  !   10
[1]  c   110       [1]  c   11
111

i (00)

[1]  i   00
[1]  _   01
[1]  !   10
[1]  c   11

c (11)

[1]  i   00        [1]  i   0
[1]  _   01        [1]  _   10
[1]  !   10        [1]  !   11
11

i (0)

[1]  _   0         [1]  _   0
[1]  !   10        [1]  !   1
11

_ (0)

[1]  !   1
0

! ()```

2. There are a lot of old papers that compare static arithmetic coder where one has all the counts in a header and than used the reduced stats for each symbol when it occurs. The paper shows that its basically the same as a simple adaptive arithmetic compressor where each symbol starts with weight of one and you increment up.

So its unlikely this would be any better on average than a simple bijective adaptive huffman compressor.

3. Originally Posted by caveman
This is a sample of the ALWC (Autumn Leaves Winter Cut-back) algorithm I came up last night, it's a kind of false dynamic Huffman, not sure its really helpful since both the encoder and the decoder need the real number of occurrences of each letter beforehand which leads to larger headers for limited gains.

Basically it starts by building a standard Huffman prefix-free code, after each symbol encoding/decoding it updates its occurrence count and keeps the symbol list sorted until the symbol count reaches zero — the Autumn Leave — resulting in a small prefix-free code change (two of the longest codes are fused together to form a one bit shorter code) — the Winter Cut-back —.

Sample used "Les beaux kiwis que voici !"
Symbol count between square brackets.

Code:
```[5]  _   00
[4]  i   010
[3]  e   011
[2]  s   1000
[2]  u   1001
[1]  x   1010
[1]  !   10110
[1]  L   10111
[1]  a   11000
[1]  b   11001
[1]  c   11010
[1]  k   11011
[1]  o   11100
[1]  q   11101
[1]  v   11110
[1]  w   11111

L (10111)

[5]  _   00        [5]  _   00
[4]  i   010       [4]  i   010
[3]  e   011       [3]  e   011
[2]  s   1000      [2]  s   1000
[2]  u   1001      [2]  u   1001
[1]  x   1010      [1]  x   1010
[1]  !   10110     [1]  !   1011
[1]  a   10111     [1]  a   11000
[1]  b   11000     [1]  b   11001
[1]  c   11001     [1]  c   11010
[1]  k   11010     [1]  k   11011
[1]  o   11011     [1]  o   11100
[1]  q   11100     [1]  q   11101
[1]  v   11101     [1]  v   11110
[1]  w   11110     [1]  w   11111
11111

e (011)

....```

Here a sample from a bijective 2 pass adaptive Huffman compared to yours the static table takes up
the same amount of space. Same number of variables and the counts add up to the same value so table
should take same amount of code. Actually the Header of mine can be smaller up general but that't
not the point I am trying to make. I am assuming that you have a list of used symbols and there count
Yours orders from highest to lowest. I ordered from first seen to next seen. The numbers add to same
value in each case. Then again depending on counts and symbols used your table be smaller than mine.
I am only looking at this example you picked.

However you count down. I count up and get a free spot for where they start
instead of counts in first table store relative jumps

Code:
```[1]  L
[1]  e
[1]  s
[1]  _
[1]  b
[2]  a
[1]  u
[1]  x
[2]  k
[1]  i
[1]  w
[4]  q
[4]  v
[1]  o
[2]  c
[3]  !```
[/QUOTE]

Now the data part of stream

the "Les_b" is encoded with NULL (no bits)

no for the current Huffman table the next character "e" is encoded with at most 2 bits
since only 5 symbols used so far and each has a weight of 1

the "aux" encoded with Null (no bits) since position known

the "_" is encoded next since with at most 4 bits depending on how you build Huffman table

the "kiw" again encoded with Null (no bits) since known position and so on

So far have "Les_beaux_kiw" only used at most 6 bits so far and no symbol that only exist once
need be encoded at all. The worst case encoding occurs for symbols that occur only twice
and the max they could be encoded with would be 5 bits.

I am sure this is shorter than your example.

4. Originally Posted by biject.bwts
Here a sample from a bijective 2 pass adaptive Huffman compared to yours the static table takes up
the same amount of space. Same number of variables and the counts add up to the same value so table
should take same amount of code. Actually the Header of mine can be smaller up general but that't
not the point I am trying to make. I am assuming that you have a list of used symbols and there count
Yours orders from highest to lowest. I ordered from first seen to next seen. The numbers add to same
value in each case. Then again depending on counts and symbols used your table be smaller than mine.
I am only looking at this example you picked.

However you count down. I count up and get a free spot for where they start
instead of counts in first table store relative jumps

Code:
```[1]  L
[1]  e
[1]  s
[1]  _
[1]  b
[2]  a
[1]  u
[1]  x
[2]  k
[1]  i
[1]  w
[4]  q
[4]  v
[1]  o
[2]  c
[3]  !```
I had to read it at least six times to start to figure out how it works.
I suppose that something like this gives your first table:
Code:
```Les_beaux_kiwis_que_voici_!
11111 211 211   4   41 2  3```
Now the data part of stream

the "Les_b" is encoded with NULL (no bits)

no for the current Huffman table the next character "e" is encoded with at most 2 bits
since only 5 symbols used so far and each has a weight of 1
should be at least 2 and at most 3, or I'm totally lost?!

the "aux" encoded with Null (no bits) since position known

the "_" is encoded next since with at most 4 bits depending on how you build Huffman table
minimum variance would ensure 3 bits

Now correct me if I'm wrong, you have to build or at least update the Huffman tree (Vitter algorithm?) when symbol counts change, whereas I only build it once and make a small change to the prefixfree codes when a symbol count has reached zero (which is of course way less precise, I wanted to stay away from manipulating the tree but still being able to virtually chop it down at relatively low CPU cost). Your proposal is very clever and more efficient but also seems slower since you have to progressively grow the tree... it is also definitely less poetic.

5. No the values I stored are relative jumps. You stored counts The headers have the symbols used I used offset where you used counts.

the "Les_b" is encoded for free. Since the table that is stored with the code only contains relative offsets. IT IS NOT A STANDARD HUFFMAN TABLE
though the same size as yours.1

the first time you have to write data when the next e appears since position unknown

at this point if using Huffman you have 5 variables they all have weight of 1
Code:
```[1] L     10       I go from  10000.....  to 000...  for Huffman vales  because of method I use to convert from bit string or byte file bijectively
[1] e     01
[1] s     11
[1] _     001
[1] b     000     so the first unknown character e is encoded with  01 two bits

You know have compressed "Les_be"

now e has weight of two    moved to top

[2] e  10
[1] L  01
[1] s  11
[1] _  001
[1] b  000

Because of stored table you don't write code for
next two letters there placement know from table so you still have the 01 two bits
with the data "Les_beaux"  encoded
but know the Huffman table used to compress that exists only in the computer doing the coding
need to add the "aux" and encode for the next symbol used in the file.

I don't really think of its as Vitter since Vitter always has cell for the unknown symbol but call it what you want.

[2] e  100
[1] L  010
[1] s  110
[1] _  001
[1] b  101
[1] a  011
[1] u  111
[1] x  000

now code the next _ as 001

update table so _  has count of 2
and  the "kiw"  is free
add the kiw to running Huffman table

[2] _  100
[2] e  010
[1] L  110
[1] s  001
[1] b  101
[1] a  0110
[1] u  1110
[1] x  0001
[1] k  0111
[1] i  1111
[1] w  0000

encode the i with 1111

you have compressed
up to "Les_beaux_kiwi"

update and continue```
This table with jumps is like what I did with my bijective DC so it does work.
Maybe will write this program sure I have done it before.

6. Originally Posted by caveman
Now correct me if I'm wrong, you have to build or at least update the Huffman tree (Vitter algorithm?) when symbol counts change, whereas I only build it once and make a small change to the prefixfree codes when a symbol count has reached zero (which is of course way less precise, I wanted to stay away from manipulating the tree but still being able to virtually chop it down at relatively low CPU cost). Your proposal is very clever and more efficient but also seems slower since you have to progressively grow the tree... it is also definitely less poetic.
Well I think bijective compression less wasteful more poetic but slower. I also think as quantum computers more common there will need to be in use encryption programs that
have bijective compression with likely added noise in such a way that it is impossible to break since for quantum computers to solve a problem there has to be a single answer. If
there are many keys that work to give something the quantum computer is of little use.

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•