
Thread: Hook

  1. #1
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    I would like to know what people think of this new compressor.

    http://www.cs.fit.edu/~mmahoney/comp...text.html#2082

    Download Link

    hook v0.2 is a free, open source (GPL) command line file compressor by Nania Francesco Antonio, Jan. 8, 2007. It uses DMC: a state machine in which each state represents a bitwise context. Each state has 2 outgoing transitions corresponding to next bits 0 and 1, and a count n0 or n1 associated with each transition. Bit y (0 or 1) is compressed by arithmetic coding with probability ny/(n0+n1) (where ny is n0 or n1 according to y), and then ny is incremented.
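The prediction and update rule described above can be sketched in a few lines. This is only an illustration, not hook's source: the `State` class and `code_cost` helper are hypothetical names, the initial count of 0.1 is borrowed from the version notes later in this post, and the arithmetic coder is replaced by its ideal cost of -log2(p) bits per coded bit.

```python
import math

class State:
    """One DMC state: two counts and two outgoing transitions (hypothetical layout)."""
    def __init__(self, n0=0.1, n1=0.1):
        self.n = [n0, n1]          # n[0] = n0, n[1] = n1
        self.next = [None, None]   # successor states for bits 0 and 1

    def p(self, y):
        """Probability assigned to bit y: ny / (n0 + n1)."""
        return self.n[y] / (self.n[0] + self.n[1])

    def update(self, y):
        """After coding bit y, increment ny."""
        self.n[y] += 1

def code_cost(state, bits):
    """Ideal arithmetic-coding cost in bits of coding `bits` in a single state."""
    cost = 0.0
    for y in bits:
        cost += -math.log2(state.p(y))  # cost of coding y with the current counts
        state.update(y)                 # then adapt the counts
    return cost
```

A real DMC coder would also follow `state.next[y]` after each bit; here all bits are coded in one state to keep the count arithmetic visible.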
    States are cloned (copied) whenever the incoming and outgoing counts exceed certain limits. This has the effect of creating a new context extended by 1 bit. In the example below, the state representing context 110 is cloned by creating a new state 0110 because the incoming 0 transition count (ny for y=0) from state 11 exceeded a limit. This transition is moved to point to the new state. Other incoming transitions (not shown) remain pointing to the original state. The outgoing transitions are copied. The counts of the original state are distributed to the new state in proportion to the moved transition's contribution to those counts, which is w = ny/(n0+n1).

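    Before cloning:

                        n0
                      +------> 1100
          ny (y=0)    |
      11 ----------> 110
                      |
                      +------> 1101
                        n1

    After cloning 110 to 0110:

                        n0*(1-w)
                      +----------> 1100
                      |
      11             110
       |              |
       |              +----------> 1101
       |                n1*(1-w)
       |
       | ny (y=0)       n0*w
       |              +----------> 1100
       |              |
       +-----------> 0110
                      |
                      +----------> 1101
                        n1*w

    (1100 and 1101 are drawn twice in the second diagram but are the same two states.)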
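The cloning step can be sketched as follows. This is a rough illustration under assumptions, not hook's code: `State` and `clone` are hypothetical names, and clamping w to at most 1 is a safeguard added here that the description does not mention.

```python
class State:
    """A DMC state with counts n[0], n[1] and successors next[0], next[1]."""
    def __init__(self, name, n0=0.1, n1=0.1):
        self.name = name
        self.n = [n0, n1]
        self.next = [None, None]

def clone(src, y, new_name):
    """Clone the state reached from `src` on bit y and redirect that transition."""
    old = src.next[y]
    # w = ny / (n0 + n1): share of old's counts attributed to the moved transition.
    # ASSUMPTION: clamp to 1 so the original state's counts never go negative.
    w = min(1.0, src.n[y] / (old.n[0] + old.n[1]))
    new = State(new_name, old.n[0] * w, old.n[1] * w)
    new.next = list(old.next)      # outgoing transitions are copied
    old.n[0] *= (1 - w)            # the original keeps the remaining share
    old.n[1] *= (1 - w)
    src.next[y] = new              # the moved transition now points to the clone
    return new

# Example mirroring the diagram: 11 --0--> 110, which leads to 1100 and 1101.
s1100, s1101 = State("1100"), State("1101")
s110 = State("110", 4.0, 6.0)
s110.next = [s1100, s1101]
s11 = State("11", 3.0, 1.0)
s11.next[0] = s110
s0110 = clone(s11, 0, "0110")      # w = 3 / (4 + 6) = 0.3
```

The total count mass is preserved: the clone receives the fraction w of each count and the original keeps 1-w.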

    In hook v0.1, the counts are 32 bit floating point numbers initialized to 0.1. The initial state machine has 256*255 states representing bytewise order 1 contexts with uniform statistics. When memory is exhausted, the model is discarded and the state machine is reinitialized. A new state is cloned when ny > limit and n0+n1-ny > length, where limit and length are parameters. The optimal parameters for enwik8 and enwik9 are "7 2 6", where 7 selects the maximum of 1 GB memory (64M states at 16 bytes each, minimum is 8 MB memory), 2 is the limit (range 1 to 7), and 6 selects a length of 32 (possible values are 1, 2, 3, 4, 8, 16, 32, 64). Larger lengths are better for large files because they conserve memory at the expense of compression.
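As a rough sketch of how the three options and the cloning test fit together: the length table below is taken from the list of possible values above, while the memory mapping (8 MB doubled once per option step) is only an assumption that happens to match the two stated endpoints, 8 MB minimum and 1 GB for option 7. The function names are hypothetical.

```python
LENGTHS = [1, 2, 3, 4, 8, 16, 32, 64]   # length option 0..7 -> length value
STATE_BYTES = 16                        # bytes per state, as stated above

def decode_params(mem_opt, limit, length_opt):
    """Map option digits such as "7 2 6" to (max states, limit, length)."""
    # ASSUMPTION: memory doubles with each option step, starting at 8 MB.
    mem_bytes = (8 << 20) << mem_opt
    return mem_bytes // STATE_BYTES, limit, LENGTHS[length_opt]

def should_clone(ny, n0, n1, limit, length):
    """Cloning test: ny > limit and n0 + n1 - ny > length.

    ny is the count on the transition being followed; n0 and n1 are read
    here as the outgoing counts of the state it leads to (an interpretation).
    """
    return ny > limit and (n0 + n1 - ny) > length

max_states, limit, length = decode_params(7, 2, 6)
```

With the options "7 2 6" this gives 64M states, a limit of 2 and a length of 32, matching the figures quoted in the post.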

  2. #2
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Hook 0.2 scores 16,304,866 bytes in my testset.

  3. #3
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by Black_Fox
    Hook 0.2 scores 16,304,866 bytes in my testset.
    Is anyone impressed by the current performance of this new compressor?

  4. #4
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by LovePimple
    Are you impressed by the current compression power of this new compressor?
    By its current power, no, I'm not... but it's new and will hopefully get better over time. With the exception of ocamyd, it's the only compressor using DMC, and compared with ocamyd it's pretty fast.

  5. #5
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Well, HOOK is good for texts, but useless overall...

  6. #6
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    There are SFC and Calgary/Canterbury Corpus results for hook 0.3 in the maximumcompression.com guestbook.

    SFC:
    13,935,923 bytes - v0.3
    14,057,313 bytes - v0.2

  7. #7
    Member
    Join Date
    Jul 2006
    Location
    US
    Posts
    39
    Thanks
    26
    Thanked 1 Time in 1 Post
    Yeah, I've seen the v0.3 results posted at Maximum Compression too, but where is the v0.3 download?

    I've looked for it on the "Large Text Compression Benchmark" site, where v0.2 was posted, but I don't see it.

    Does Hook have its own site?

  8. #8
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    There is only hook.zip to download; it was updated and now also contains v0.3.

  9. #9
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    I wonder when v0.3a will appear in that archive...

  10. #10
    Guest
    Oops, I fixed the hook.zip archive now. It contains all 3 versions.

    http://cs.fit.edu/~mmahoney/compression/text.html#2019

  11. #11
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks Matt!

  12. #12
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Thank you

  13. #13
    Guest
    Hook is just a demonstration of the potential of DMC, nothing more. If I manage to find a way to squeeze the most out of it, it could also become a decent compressor!

  14. #14
    Guest
    HOOK V. 0.4 ADVANCED DMC COMPRESSOR
    SFC TEST [MAXIMUM COMPRESSION]

    File           Size (bytes)  Options
    WORLD95.TXT         530,548  4 0 2
    FP.LOG              701,772  6 0 3
    ENGLISH.DIC       1,381,455  3 7 5
    ACRORD32.EXE      1,786,628  3 1 2
    MSO97.DLL         2,114,041  3 1 4
    RAFALE.BMP          814,478  3 1 6
    A10.JPG             833,095  0 8 7
    VCFIU.HLP           792,614  3 0 2
    OHS.DOC             925,118  4 1 2
    FLASHMX.PDF       3,838,602  3 8 6
    -------------------------------------
    TOTAL            13,718,351

    Canterbury corpus (ISO)  638,472  2 1 4
    Calgary corpus (ISO)     911,151  2 1 5

  15. #15
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    My testset performance results:
    16,033,622 - v0.4
    (16,211,402 - v0.3a)

  16. #16
    Guest
    EMILCONT I've tested your HOOK 0.4 : Big loss in PDF File... please verify the filters

  17. #17
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Anonymous
    EMILCONT Ive tested your HOOK 0.4
    Author of Hook 0.4 and author of Emilcont are TWO different people!

  18. #18
    Guest
    HOOK V. 0.5 ADVANCED DMC+LZP COMPRESSOR
    SFC TEST [MAXIMUM COMPRESSION]
    Options: [memsize] [limit] [length] [enable lz] {[lz step]}

    File           Size (bytes)    memsize  limit  length  enable lz  lz step
    WORLD95.TXT         530,556  128000000      1       3          0
    FP.LOG              600,582  512000000      2       6          1       25
    ENGLISH.DIC         763,578  128000000      9      31          1        2
    ACRORD32.EXE      1,783,271   64000000      1       2          0
    MSO97.DLL         2,116,277   64000000      2      12          0
    RAFALE.BMP          814,235   64000000      2      30          0
    A10.JPG             833,099   64000000     10      64          0
    VCFIU.HLP           809,120   64000000      1       1          0
    OHS.DOC             926,471  128000000      1       2          0
    FLASHMX.PDF       3,838,491   64000000      8      32          0
    -------------------------------------
    TOTAL            13,015,680

    Canterbury corpus (ISO)  647,868   64000000  2  12  0
    Calgary corpus (ISO)     911,200   64000000  2   9  0

    Author of hook, fpaq0s6, fpaq0s5, fpaq2, fpaq3d, etc.

  19. #19
    Guest
    Version 0.5 contained a bug that prevented it from compressing files larger than 128 MB, which I have fixed in this version. Sorry! Here are the results:

    HOOK V. 0.5b ADVANCED DMC+LZP COMPRESSOR
    SFC TEST [MAXIMUM COMPRESSION]
    Options: [memsize] [limit] [length] [enable lz] {[lz step]}

    File           Size (bytes)    memsize  limit  length  enable lz  lz step
    WORLD95.TXT         530,561  128000000      1       3          0
    FP.LOG              590,777  512000000      2       6          1       25
    ENGLISH.DIC         812,447  128000000      9      31          1        2
    ACRORD32.EXE      1,765,497   64000000      2       3          0
    MSO97.DLL         2,116,405   64000000      2      12          0
    RAFALE.BMP          814,220   64000000      2      30          0
    A10.JPG             833,105   64000000      9      70          0
    VCFIU.HLP           810,545   64000000      1       1          0
    OHS.DOC             926,247  128000000      1       4          0
    FLASHMX.PDF       3,838,429   64000000      8      32          0
    -------------------------------------
    TOTAL            13,038,233

    Canterbury corpus (ISO)  647,777   64000000  2  13  0
    Calgary corpus (ISO)     911,073   64000000  2   9  0

  20. #20
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks!

  21. #21
    Guest
    > Author of Hook 0.4 and author of Emilcont are TWO different people!
    BF, your remark is understandable, but it probably was emilcont posting (he mentioned he tested and his testfiles are nonpublic).

  22. #22
    Guest
    I want to make it clear that I am not the author of emilcont. I am Nania Francesco Antonio; his name is Berto Destasio! I don't see how anyone could claim otherwise!

  23. #23
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by Nania Francesco A.
    I want to make it clear that I am not the author of emilcont. I am Nania Francesco Antonio; his name is Berto Destasio! I don't see how anyone could claim otherwise!
    He thinks that the person above "Anonymous" that posted this message:
    Quote Originally Posted by Anonymous
    EMILCONT Ive tested your HOOK 0.4 : Big loss in PDF File... please verify the filters
    (Posted: 20 Jan 2007 21:40) could have been EMILCONT himself.

  24. #24
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    According to the IP address, it was someone from Italy. Berto Destasio is also from Italy. So, 99.98% it was Berto! However, dear users, please specify at least your nickname!


  25. #25
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    First, I was really impressed by how quickly Hook is improving!

    However, where can I get Hook 0.5?

  26. #26
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    However, where can I get Hook 0.5?
    IMHO: I think it would be a good idea if the author were to set up his own web site. We could then download the updates as soon as they are released.

  27. #27
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Anonymous
    BF, your remark is understandable, but it probably was emilcont posting
    Seems true... sorry.

    Quote Originally Posted by LovePimple
    it would be a good idea if the author were to set up his own web site
    Or the author could send it to more people (presuming that he e-mails it to Matt Mahoney anyway).

  28. #28
    Guest
    I have sent version 0.5b to the legendary Matt Mahoney, and it will soon be online on his site!

  29. #29
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by Nania Francesco A.
    I have sent version 0.5b to the legendary Matt Mahoney, and it will soon be online on his site!
    Excellent! Please keep us informed.

  30. #30
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Update:

    In hook v0.2, the counts are 32 bit floating point numbers initialized to 0.1. The initial state machine has 256*255 states representing bytewise order 1 contexts with uniform statistics. When memory is exhausted, the model is discarded and the state machine is reinitialized. A new state is cloned when ny > limit and n0+n1-ny > length, where limit and length are parameters. The optimal parameters for enwik8 and enwik9 are "c 7 2 6", where c means compress, 7 selects the maximum of 1 GB memory (64M states at 16 bytes each, minimum is 8 MB memory), 2 is the limit (range 1 to 7), and 6 selects a length of 32 (possible values are 1, 2, 3, 4, 8, 16, 32, 64). Larger lengths are better for large files because they conserve memory at the expense of compression.

    hook v0.3 (Jan. 11, 2007) allows up to 1.8 GB memory (first option = 9) and uses double precision predictions in the 32 bit arithmetic coder.

    hook v0.3a (Jan. 12, 2007) initializes the counts to 0.125 (instead of 0.1) and uses 24 bit precision in the arithmetic coder (instead of 32 bit).
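For readers unfamiliar with the arithmetic coder these version notes refer to, here is a minimal sketch of a generic carryless binary arithmetic coder driven by an adaptive count model. It is not hook's coder: the 32 bit range, the 12 bit probability resolution, the 4 byte flush and all class and function names are assumptions chosen for illustration. Only the 0.1 initial count comes from the notes above.

```python
MASK = 0xFFFFFFFF

class BitModel:
    """Single adaptive context: p(1) = n1 / (n0 + n1), counts start at 0.1."""
    def __init__(self, init=0.1):
        self.n = [init, init]
    def p1(self):
        p = self.n[1] / (self.n[0] + self.n[1])
        return min(4095, max(1, int(p * 4096)))  # 12 bit, never 0 or 4096
    def update(self, bit):
        self.n[bit] += 1

def split(x1, x2, p1):
    """Point dividing [x1, x2]: the lower part codes a 1, the upper part a 0."""
    return x1 + ((x2 - x1) >> 12) * p1

class Encoder:
    def __init__(self):
        self.x1, self.x2, self.out = 0, MASK, bytearray()
    def encode(self, bit, p1):
        xmid = split(self.x1, self.x2, p1)
        if bit:
            self.x2 = xmid
        else:
            self.x1 = xmid + 1
        while ((self.x1 ^ self.x2) & 0xFF000000) == 0:   # top byte settled
            self.out.append(self.x2 >> 24)
            self.x1 = (self.x1 << 8) & MASK
            self.x2 = ((self.x2 << 8) & MASK) | 0xFF
    def finish(self):
        return bytes(self.out) + self.x1.to_bytes(4, "big")  # flush low bound

class Decoder:
    def __init__(self, data):
        self.data, self.pos = data, 0
        self.x1, self.x2, self.x = 0, MASK, 0
        for _ in range(4):
            self.x = (self.x << 8) | self._byte()
    def _byte(self):
        b = self.data[self.pos] if self.pos < len(self.data) else 0
        self.pos += 1
        return b
    def decode(self, p1):
        xmid = split(self.x1, self.x2, p1)
        bit = 1 if self.x <= xmid else 0
        if bit:
            self.x2 = xmid
        else:
            self.x1 = xmid + 1
        while ((self.x1 ^ self.x2) & 0xFF000000) == 0:
            self.x1 = (self.x1 << 8) & MASK
            self.x2 = ((self.x2 << 8) & MASK) | 0xFF
            self.x = ((self.x << 8) & MASK) | self._byte()
        return bit

def compress(bits):
    model, enc = BitModel(), Encoder()
    for b in bits:
        enc.encode(b, model.p1())
        model.update(b)
    return enc.finish()

def decompress(data, n):
    model, dec, out = BitModel(), Decoder(data), []
    for _ in range(n):
        b = dec.decode(model.p1())
        model.update(b)
        out.append(b)
    return out
```

The decoder must be told how many bits to read, and it must run the same model updates in the same order as the encoder so that both compute identical split points.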

    hook v0.4 (Jan. 15, 2007) initializes counts to 0.1. Argument 2 selects length 3 (not 2).

    hook v0.5b (Jan. 22, 2007) adds an LZP preprocessor. If the next byte to be coded is the same as the byte that occurred in the last matching 3 byte context, then this is indicated by coding a flag bit in an order 3 model (32 MB memory), and a match length coded by DMC with a fixed size of 128 MB. If there is no match, then the literal byte is coded by another variable sized DMC model. The parameters "c 1600000000 2 64 1 6" select compression (c), 1.6 GB for the DMC literal model (1600000000), a limit of 2 (minimum count for the cloned state), length of 64 (minimum remaining count for the state to be cloned), LZP selected (1), and a minimum match length of 6.
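The LZP idea can be sketched at the byte level as below. This is a simplified illustration, not hook's preprocessor: it emits plain tokens instead of feeding a flag bit, a match length and literals to separate models, the context table is a Python dict updated once per token, and the function names are hypothetical. The order 3 context and the minimum match length of 6 follow the description above.

```python
def lzp_encode(data, min_len=6, order=3):
    """Turn bytes into ("match", length) and ("lit", byte) tokens."""
    table = {}            # last position seen after each `order`-byte context
    tokens, i = [], 0
    while i < len(data):
        pred = None
        if i >= order:
            ctx = data[i - order:i]
            pred = table.get(ctx)    # where this context last occurred
            table[ctx] = i
        length = 0
        if pred is not None:
            # Count how many upcoming bytes repeat what followed the context.
            while i + length < len(data) and data[pred + length] == data[i + length]:
                length += 1
        if length >= min_len:
            tokens.append(("match", length))
            i += length
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens

def lzp_decode(tokens, order=3):
    """Rebuild the bytes by repeating the encoder's table updates."""
    table, out = {}, bytearray()
    for kind, value in tokens:
        pred = None
        if len(out) >= order:
            ctx = bytes(out[-order:])
            pred = table.get(ctx)
            table[ctx] = len(out)
        if kind == "match":
            for k in range(value):   # byte by byte, so overlaps are safe
                out.append(out[pred + k])
        else:
            out.append(value)
    return bytes(out)

sample = b"abcabcabcabcabcabc" * 3 + b"xyz"
tokens = lzp_encode(sample)
```

Because the decoder rebuilds the same table from already decoded bytes, a match token needs no offset, only a length; that is what distinguishes LZP from LZ77-style schemes.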
