1. I would like to know what people think of this new compressor.

http://www.cs.fit.edu/~mmahoney/comp...text.html#2082

hook v0.2 is a free, open source (GPL) command line file compressor by Nania Francesco Antonio, Jan. 8, 2007. It uses DMC: a state machine in which each state represents a bitwise context. Each state has 2 outgoing transitions corresponding to next bits 0 and 1, and a count n0 or n1 associated with each transition. Bit y (0 or 1) is compressed by arithmetic coding with probability ny/(n0+n1) (where ny is n0 or n1 according to y), and then ny is incremented.
States are cloned (copied) whenever the incoming and outgoing counts exceed certain limits. This has the effect of creating a new context extended by 1 bit. In the example below, the state representing context 110 is cloned by creating a new state 0110 because the incoming 0 transition count (ny for y=0) from state 11 exceeded a limit. This transition is moved to point to the new state. Other incoming transitions (not shown) remain pointing to the original state. The outgoing transitions are copied. The counts of the original state are distributed to the new state in proportion to the moved transition's contribution to those counts, which is w = ny/(n0+n1).

Before cloning:

               n0 ----> 1100
              /
  11 --ny--> 110
  (y=0)       \
               n1 ----> 1101

After cloning 110 to 0110:

               n0*(1-w) ----> 1100
              /
             110
              \
               n1*(1-w) ----> 1101

               n0*w ----> 1100
              /
  11 --ny--> 0110
  (y=0)       \
               n1*w ----> 1101
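
The counting and cloning rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not hook's actual code: the `State` class and the `predict`, `update`, and `clone` names are hypothetical, the arithmetic coder is left out, and capping w at 1 is an added guard that the description does not mention.

```python
class State:
    """One DMC state: two outgoing counts and two outgoing transitions."""

    def __init__(self, n0, n1, next0=None, next1=None):
        self.n = [n0, n1]            # counts n0 and n1
        self.next = [next0, next1]   # successor states for bits 0 and 1


def predict(state, y):
    """Probability assigned to bit y: ny / (n0 + n1)."""
    return state.n[y] / (state.n[0] + state.n[1])


def update(state, y):
    """After coding bit y, increment its count."""
    state.n[y] += 1


def clone(src, y):
    """Clone the state reached from src on bit y and redirect that transition.

    The new state receives the fraction w = ny / (n0 + n1) of the old
    state's counts, where ny is the count on the moved transition and
    n0, n1 are the old state's outgoing counts. The old state keeps the
    remaining (1 - w) share. Outgoing transitions are copied.
    """
    old = src.next[y]
    w = src.n[y] / (old.n[0] + old.n[1])
    w = min(w, 1.0)  # assumption: guard against w > 1, not from the description
    new = State(old.n[0] * w, old.n[1] * w, old.next[0], old.next[1])
    old.n[0] *= 1.0 - w
    old.n[1] *= 1.0 - w
    src.next[y] = new
    return new
```

For example, if state 11 has ny = 4 on its 0 transition and state 110 has counts n0 = 6 and n1 = 2, then w = 0.5, and after `clone` both 110 and the new state 0110 hold counts 3 and 1.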

In hook v0.1, the counts are 32 bit floating point numbers initialized to 0.1. The initial state machine has 256*255 states representing bytewise order 1 contexts with uniform statistics. When memory is exhausted, the model is discarded and the state machine is reinitialized. A new state is cloned when ny > limit and n0+n1-ny > length, where limit and length are parameters. The optimal parameters for enwik8 and enwik9 are "7 2 6", where 7 selects the maximum of 1 GB memory (64M states at 16 bytes each, minimum is 8 MB memory), 2 is the limit (range 1 to 7), and 6 selects a length of 32 (possible values are 1, 2, 3, 4, 8, 16, 32, 64). Larger lengths are better for large files because they conserve memory at the expense of compression.
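
The cloning test and the memory figures quoted above can be checked with a short Python sketch. The function and constant names here are hypothetical, and it assumes that "1 GB" means 2^30 bytes and "64M" means 64 * 2^20 states; how hook maps its option digits to these values internally is not shown.

```python
STATE_BYTES = 16             # per-state size quoted in the description
INITIAL_STATES = 256 * 255   # bytewise order 1 starting machine


def max_states(memory_bytes):
    """Number of states that fit in the given memory budget."""
    return memory_bytes // STATE_BYTES


def should_clone(ny, n0, n1, limit, length):
    """Clone test: ny > limit and n0 + n1 - ny > length.

    ny is the count on the incoming transition being followed;
    n0 and n1 are the outgoing counts of the state it points to.
    """
    return ny > limit and (n0 + n1 - ny) > length
```

With limit 2 and length 32, a transition with ny = 3 into a state whose counts sum to 40 qualifies for cloning (3 > 2 and 37 > 32), while the same transition into a state whose counts sum to 20 does not.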

2. Hook 0.2 scores 16,304,866 bytes in my testset.

3. Originally Posted by Black_Fox
Hook 0.2 scores 16,304,866 bytes in my testset.
Is anyone impressed by the current performance of this new compressor?

4. Originally Posted by LovePimple
Are you impressed by the current compression power of this new compressor?
By the current power, no, I'm not... but it's new and will hopefully get better over time. With the exception of ocamyd, it's the only compressor using DMC, and compared with ocamyd it's pretty fast.

5. Well, HOOK is good for texts, but useless overall...

6. There are SFC and Calgary/Canterbury Corpus results for hook 0.3 in the maximumcompression.com guestbook.

SFC:
13,935,923 bytes - v0.3
14,057,313 bytes - v0.2

7. Yeah, I've seen the v0.3 results posted at Maximum Compression too, but where is the v0.3 download?

I've looked for it at the "Large Text Compression Benchmark" site, where v0.2 was posted, but I don't see it.

Does Hook have its own site?

8. There is only hook.zip to download; it was updated and now also contains v0.3.

9. I wonder when v0.3a will appear in that archive...

10. Oops, I fixed the hook.zip archive now. It contains all 3 versions.

http://cs.fit.edu/~mmahoney/compression/text.html#2019

11. Thanks Matt!

12. Thank you

13. Hook is a demonstration of the potential of DMC, nothing more. If I manage to find a way to squeeze the most out of it, a decent "compressor" could come of it too!

14. HOOK V. 0.4 ADVANCED DMC COMPRESSOR
SFC TEST [MAXIMUM COMPRESSION]: OPTION
WORLD95.TXT__= __530.548 Bytes 4 0 2
FP.LOG_______= __701.772 Bytes 6 0 3
ENGLISH.DIC__= 1.381.455 Bytes 3 7 5
ACRORD32.EXE_= 1.786.628 Bytes 3 1 2
MSO97.DLL____= 2.114.041 Bytes 3 1 4
RAFALE.BMP___= __814.478 Bytes 3 1 6
A10.JPG______= __833.095 Bytes 0 8 7
VCFIU.HLP____= __792.614 Bytes 3 0 2
OHS.DOC______= __925.118 Bytes 4 1 2
FLASHMX.PDF__= 3.838.602 Bytes 3 8 6
_____________________________________
TOTAL = 13.718.351 Bytes
Canterb. corpus (ISO)=638.472 Bytes 2 1 4
Calgary corpus (ISO)=911.151 Bytes 2 1 5

15. My testset performance results:
16,033,622 - v0.4
(16,211,402 - v0.3a)

16. EMILCONT I've tested your HOOK 0.4 : Big loss in PDF File... please verify the filters

17. Originally Posted by Anonymous
EMILCONT Ive tested your HOOK 0.4
Author of Hook 0.4 and author of Emilcont are TWO different people!

18. HOOK V. 0.5 ADVANCED DMC+LZP COMPRESSOR
SFC TEST [MAXIMUM COMPRESSION]: [memsize][limit][length][enable lz] {[lz step]}
WORLD95.TXT__= __530.556 Bytes 128000000 1 3 0
FP.LOG_______= __600.582 Bytes 512000000 2 6 1 25
ENGLISH.DIC__= __763.578 Bytes 128000000 9 31 1 2
ACRORD32.EXE_= 1.783.271 Bytes 64000000 1 2 0
MSO97.DLL____= 2.116.277 Bytes 64000000 2 12 0
RAFALE.BMP___= __814.235 Bytes 64000000 2 30 0
A10.JPG______= __833.099 Bytes 64000000 10 64 0
VCFIU.HLP____= __809.120 Bytes 64000000 1 1 0
OHS.DOC______= __926.471 Bytes 128000000 1 2 0
FLASHMX.PDF__= 3.838.491 Bytes 64000000 8 32 0
_____________________________________
TOTAL = 13.015.680 Bytes
Canterb. corpus (ISO)=647.868 Bytes 64000000 2 12 0
Calgary corpus (ISO)=911.200 Bytes 64000000 2 9 0
author of hook, fpaq0s6, fpaq0s5, fpaq2, fpaq3d, etc.

19. Version 0.5 contained a bug that prevented compressing files larger than 128 MB, which I have corrected in this version. Sorry! Here are the results:
HOOK V. 0.5b ADVANCED DMC+LZP COMPRESSOR
SFC TEST [MAXIMUM COMPRESSION]: [memsize][limit][length][enable lz] {[lz step]}
WORLD95.TXT__= __530.561 Bytes 128000000 1 3 0
FP.LOG_______= __590.777 Bytes 512000000 2 6 1 25
ENGLISH.DIC__= __812.447 Bytes 128000000 9 31 1 2
ACRORD32.EXE_= 1.765.497 Bytes 64000000 2 3 0
MSO97.DLL____= 2.116.405 Bytes 64000000 2 12 0
RAFALE.BMP___= __814.220 Bytes 64000000 2 30 0
A10.JPG______= __833.105 Bytes 64000000 9 70 0
VCFIU.HLP____= __810.545 Bytes 64000000 1 1 0
OHS.DOC______= __926.247 Bytes 128000000 1 4 0
FLASHMX.PDF__= 3.838.429 Bytes 64000000 8 32 0
TOTAL = 13.038.233 Bytes
Canterb. corpus (ISO)=647.777 Bytes 64000000 2 13 0
Calgary corpus (ISO)=911.073 Bytes 64000000 2 9 0

20. Thanks!

21. > Author of Hook 0.4 and author of Emilcont are TWO different people!
BF, your remark is understandable, but it probably was emilcont posting (he mentioned he tested and his testfiles are nonpublic).

22. I want to state that I am not the author of emilcont. I am Nania Francesco Antonio; his name is Berto Destasio! I do not see how anyone could say otherwise!

23. Originally Posted by Nania Francesco A.
I want to state that I am not the author of emilcont. I am Nania Francesco Antonio; his name is Berto Destasio! I do not see how anyone could say otherwise!
He thinks that the person above "Anonymous" that posted this message:
Originally Posted by Anonymous
EMILCONT Ive tested your HOOK 0.4 : Big loss in PDF File... please verify the filters
(Posted: 20 Jan 2007 21:40) could have been EMILCONT himself.

24. According to the IP address, it was someone from Italy. Berto Destasio is also from Italy. So, 99.98% it was Berto! However, dear users, please specify at least your nickname!

25. First, I'm really impressed by how fast Hook is improving!

However, where can I get Hook 0.5?

26. Originally Posted by encode
However, where can I get Hook 0.5?
IMHO: I think it would be a good idea if the author were to set up his own web site. We could then download the updates as soon as they are released.

27. Originally Posted by Anonymous
BF, your remark is understandable, but it probably was emilcont posting
Seems true... sorry.

Originally Posted by LovePimple
it would be a good idea if the author were to set up his own web site
Or the author could send it to more people (presuming that he e-mails it to Matt Mahoney anyway).

28. I have sent version 0.5b to the legendary Matt Mahoney, and soon it will be online on his site!

29. Originally Posted by Nania Francesco A.
I have sent version 0.5b to the legendary Matt Mahoney, and soon it will be online on his site!

30. Update:

In hook v0.2, the counts are 32 bit floating point numbers initialized to 0.1. The initial state machine has 256*255 states representing bytewise order 1 contexts with uniform statistics. When memory is exhausted, the model is discarded and the state machine is reinitialized. A new state is cloned when ny > limit and n0+n1-ny > length, where limit and length are parameters. The optimal parameters for enwik8 and enwik9 are "c 7 2 6", where c means compress, 7 selects the maximum of 1 GB memory (64M states at 16 bytes each, minimum is 8 MB memory), 2 is the limit (range 1 to 7), and 6 selects a length of 32 (possible values are 1, 2, 3, 4, 8, 16, 32, 64). Larger lengths are better for large files because they conserve memory at the expense of compression.

hook v0.3 (Jan. 11, 2007) allows up to 1.8 GB memory (first option = 9) and uses double precision predictions in the 32 bit arithmetic coder.

hook v0.3a (Jan. 12, 2007) initializes the counts to 0.125 (instead of 0.1) and uses 24 bit precision in the arithmetic coder (instead of 32 bit).

hook v0.4 (Jan. 15, 2007) initializes counts to 0.1. Argument 2 selects length 3 (not 2).

hook v0.5b (Jan. 22, 2007) adds an LZP preprocessor. If the next byte to be coded is the same as the byte that occurred in the last matching 3 byte context, then this is indicated by coding a flag bit in an order 3 model (32 MB memory), and a match length coded by DMC with a fixed size of 128 MB. If there is no match, then the literal byte is coded by another variable sized DMC model. The parameters "c 1600000000 2 64 1 6" select compression (c), 1.6 GB for the DMC literal model (1600000000), a limit of 2 (minimum count for the cloned state), length of 64 (minimum remaining count for the state to be cloned), LZP selected (1), and a minimum match length of 6.
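
To make the LZP step more concrete, here is a rough Python sketch of the match/literal split only. It is not hook's implementation: the function names, the dictionary used as the context table, and the rule that table entries are updated once per token are all assumptions made for illustration. In hook, the flag, the match length, and the literals would then be coded by the order 3 model and the two DMC models, which this sketch leaves out.

```python
def lzp_tokens(data, min_len=6, order=3):
    """Split data into ('lit', byte) and ('match', length) tokens.

    The position that followed the most recent occurrence of the current
    `order`-byte context is the single match candidate. A match is used
    only if it runs for at least `min_len` bytes.
    """
    last = {}    # context bytes -> position that followed its last occurrence
    tokens = []
    i = 0
    while i < len(data):
        p = None
        if i >= order:
            ctx = data[i - order:i]
            p = last.get(ctx)
            last[ctx] = i
        length = 0
        if p is not None:
            while i + length < len(data) and data[p + length] == data[i + length]:
                length += 1
        if length >= min_len:
            tokens.append(("match", length))
            i += length
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lzp_restore(tokens, order=3):
    """Inverse of lzp_tokens: rebuild the same table while decoding."""
    last = {}
    out = bytearray()
    for kind, value in tokens:
        i = len(out)
        p = None
        if i >= order:
            ctx = bytes(out[i - order:i])
            p = last.get(ctx)
            last[ctx] = i
        if kind == "match":
            for k in range(value):
                out.append(out[p + k])  # byte by byte, so overlaps work
        else:
            out.append(value)
    return bytes(out)
```

The decoder never needs the match position to be transmitted, because it rebuilds the same context table from the bytes it has already decoded. That is what allows the match to be signalled with just a flag and a length.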
