Thread: I volunteer for some big dictionary test

  1. #1
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    870
    Thanks
    47
    Thanked 105 Times in 83 Posts

    I volunteer for some big dictionary test

    i hereby volentier... volenter... volinter... volienter... damn getting nothing but red underlines here... how do you spell the damn word in english?

    Anyway, I have upgraded my main PC to 8GB of memory. If anybody needs to run some tests with big memory requirements (now at least I got that word right), I would be willing to run them.

    Running Windows XP 64 here.


    P.S. If the need arises for some 16GB testing, I might be able to get that amount of memory.
    Last edited by SvenBent; 5th June 2008 at 20:59.

  2. #2
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts
    The thread should be titled:

    Eat my pants, I've got an 8 GB RAM machine!


  3. #3
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,239
    Thanks
    192
    Thanked 968 Times in 501 Posts
    Actually the amount of memory doesn't matter much at that scale.
    It's impossible to access all of it more than a few times per _minute_
    if it holds some kind of statistics, so swapping out some unnecessary areas
    seems more rational.
    Also, random memory access is much slower than any division, so it's
    really hard to imagine a (compression-related) function which would benefit
    from a lookup table of that size.
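
A rough back-of-envelope check of the "few times per minute" figure, as a sketch only. The 100 ns random-access latency and the 64-byte cache line are assumed typical values, not measurements from this thread, and `random_passes_per_minute` is a name made up for the example.

```python
# Rough estimate: how often can every cache line of a big table be touched
# in random order? Latency and line size are assumed values, not measured.
GIB = 1024 ** 3

def random_passes_per_minute(table_bytes, line_bytes=64, latency_ns=100):
    """Full random-order passes per minute, one access per cache line."""
    accesses = table_bytes // line_bytes
    seconds_per_pass = accesses * latency_ns * 1e-9
    return 60.0 / seconds_per_pass

for gib in (2, 4, 8):
    print(f"{gib} GiB: {random_passes_per_minute(gib * GIB):.1f} passes/minute")
```

Under those assumptions an 8 GiB table allows roughly four to five full random passes per minute, which is in line with the estimate in the post.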

  4. #4
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    Quote Originally Posted by SvenBent View Post
    how do you spell the damn word in english?
    Volunteer I guess
    You could partially use that amount with emilcont and some earlier versions of Francesco's compressors IIRC (and maybe ocamyd, but I'm not so sure about that one).
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time... I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  5. #5
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts

    volunteer

    You could run durilca4linux_3 with lots of memory like -m7600 (try with -o16 or -o32) and maybe set a new record on LTCB.
    http://cs.fit.edu/~mmahoney/compression/text.html

    Of course you would need to install Linux.

  6. #6
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Maybe you can test durilca on the Maximum Compression fileset.

    Does durilca only work on enwik9?

  7. #7
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    870
    Thanks
    47
    Thanked 105 Times in 83 Posts

    bug in 7-zip

    Found a bug in 7-Zip 4.57.

    Whether you choose a 1024MB or a 768MB dictionary size,
    7-Zip reports that the file is compressed with a 768MB dictionary size.



    Quote Originally Posted by Matt Mahoney View Post
    Of course you would need to install Linux.
    no *nix for me

  8. #8
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    You could try using one of these just for testing purposes. This would (hopefully) allow you to test Linux apps without disturbing your Windows install.

    http://www.knopper.net/knoppix/index-en.html

    http://www.frozentech.com/content/livecd.php

  9. #9
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Could you test CMM4 on enwik9 with more memory? ATM i can only test with max. memory < 1gb.

  10. #10

  11. #11
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    735
    Thanked 660 Times in 354 Posts
    Quote Originally Posted by Shelwien View Post
    Actually the amount of memory doesn't matter much at that scale.
    It's impossible to access all of it more than a few times per _minute_
    if it holds some kind of statistics, so swapping out some unnecessary areas
    seems more rational.
    Large-dictionary LZ is a good example. Swapping doesn't help because you can't read more than ~100 random chunks per second from disk due to seek times.
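
To put a number on the seek-time argument, here is a small sketch of the average lookup rate when part of a table sits in swap. The ~100 disk reads per second comes from the post above; the 100 ns RAM latency and the uniformly random access pattern are assumptions, and `lookups_per_second` is a hypothetical helper.

```python
# Average random-lookup rate when part of a table lives in swap.
# disk_lookups_per_s=100 is the figure from the post above;
# ram_latency_ns=100 is an assumed typical value.
def lookups_per_second(table_gb, ram_gb, disk_lookups_per_s=100, ram_latency_ns=100):
    """Expected lookups/s for uniformly random accesses over the table."""
    swapped = max(0.0, 1.0 - ram_gb / table_gb)   # fraction of lookups hitting disk
    avg_seconds = (1.0 - swapped) * ram_latency_ns * 1e-9 + swapped / disk_lookups_per_s
    return 1.0 / avg_seconds

print(f"{lookups_per_second(4, 4):,.0f}")  # table fits in RAM
print(f"{lookups_per_second(4, 2):,.0f}")  # half of the table swapped out
```

With half of a 4 GB table swapped out, the disk term dominates completely: the rate drops from about ten million lookups per second to about two hundred.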

  12. #12
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,239
    Thanks
    192
    Thanked 968 Times in 501 Posts
    As I said... I really doubt that you'll find a useful algorithm which _requires_ scanning the entire
    4G+ memory _and_ would be able to do that even 100 times per second
    (even once per second doesn't seem realistic to me).
    Of course, "swapping doesn't help", but I'm trying to say that more memory won't significantly
    change the compressor's performance when a certain amount of memory (like 2G)
    is already available.
    Well, an LZ compressor with a 4G hash table would certainly be slow with only 2G of RAM
    and swapping, but then I'm sure that it's possible to implement a better match finder
    at such a scale (better than a single huge hash table)... and also "luckily" most
    32-bit compressors' data structures won't scale even past 2G anyway.
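
For context, the "single huge hash table" kind of match finder mentioned above can be sketched roughly as follows. This is a generic illustration, not code from any compressor in this thread; `find_matches`, `MIN_MATCH`, the table size, and the hash constant are all choices made for the example.

```python
# Minimal single-hash-table match finder (illustrative sketch only).
# Each slot remembers the last position whose next MIN_MATCH bytes hashed to it.
MIN_MATCH = 4

def find_matches(data: bytes, hash_bits: int = 16):
    """Yield (position, match_position, length) for each match found."""
    table = [-1] * (1 << hash_bits)
    mask = (1 << hash_bits) - 1
    pos = 0
    while pos + MIN_MATCH <= len(data):
        key = int.from_bytes(data[pos:pos + MIN_MATCH], "little")
        slot = ((key * 2654435761) >> 12) & mask   # multiplicative hash
        cand = table[slot]                          # earlier position, or -1
        table[slot] = pos
        length = 0
        if cand >= 0:
            # Extend the candidate match byte by byte (overlap is allowed).
            while pos + length < len(data) and data[cand + length] == data[pos + length]:
                length += 1
        if length >= MIN_MATCH:
            yield pos, cand, length
            pos += length                           # greedy: skip the matched bytes
        else:
            pos += 1

print(list(find_matches(b"aaaaaaaa")))
```

Every probe lands on an essentially random slot, so once the table no longer fits in RAM nearly every lookup can turn into a page fault.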

  13. #13
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    870
    Thanks
    47
    Thanked 105 Times in 83 Posts
    Quote Originally Posted by toffer View Post
    Could you test CMM4 on enwik9 with more memory? ATM i can only test with max. memory < 1gb.
    The largest I seem to be able to run is

    Code:
    cmm4 96 enwik9 enwik9.cmm
    which only uses around 1.3gb memory

  14. #14
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by SvenBent View Post
    The largest I seem to be able to run is

    Code:
    cmm4 96 enwik9 enwik9.cmm
    which only uses around 1.3gb memory
    You could try higher values for the context:

    :> cmm4 77 tb.tar tbtar_cmm1f_77.cmm4
    CMM4 v0.1f by C. Mattern Jun 4 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
    Allocated 1705742 kB.
    Encoding: 11715/ 18816 kB (4.98 bpc), 506 kB/s

  15. #15
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by toffer View Post
    Could you test CMM4 on enwik9 with more memory? ATM i can only test with max. memory < 1gb.
    Hi Toffer !

    I didn't try on enwik9, but on enwik8 cmm4 77 gives a smaller file than cmm4 76:

    cmm4 77 enwik8 -> 20.519.090
    cmm4 76 enwik8 -> 20.540.566 (= the same as cmm4 96 for enwik

    -
    Pat357

  16. #16
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    870
    Thanks
    47
    Thanked 105 Times in 83 Posts
    Here are some sizes from cmm4 compression:

    enwik9.96.cmm - 172.213.878 bytes (highest W value)
    enwik9.77.cmm - 171.825.549 bytes (highest M value, largest mem usage)
    enwik9.76.cmm - 172.628.200 bytes (split point from which you can increase only W or M, but not both)
    enwik9.43.cmm - 181.648.064 bytes (the example used in the help)
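
To make the sizes above easier to compare, here is a small sketch that converts them to bits per character. The compressed sizes are the ones from the post; enwik9 is exactly 10^9 bytes. The names `results` and `bpc` are made up for the example.

```python
# Bits per character for the cmm4 results listed above.
# enwik9 is exactly 10**9 bytes.
ENWIK9_BYTES = 10 ** 9

results = {  # cmm4 option -> compressed size in bytes (from the post)
    "96": 172_213_878,
    "77": 171_825_549,
    "76": 172_628_200,
    "43": 181_648_064,
}

def bpc(compressed_bytes, original_bytes=ENWIK9_BYTES):
    """Compressed bits per input byte."""
    return 8 * compressed_bytes / original_bytes

best = min(results.values())
for option, size in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"cmm4 {option}: {bpc(size):.3f} bpc, +{size - best:,} bytes vs best")
```

Option 77 comes out best at about 1.375 bpc, with the help-text example (43) about 9.8 million bytes behind it.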

    ---

    some observations:

    Since the largest combination seems to use only 1.7GB of RAM, I guess that CMM4 is a 32-bit application and therefore cannot access more than 2GB of memory.

    CMM4 is dual threaded:
    it seems to be able to use two cores, but not any more than that.

    I would love to try a 64-bit version with something like 99 (should be around 6.6 GB of RAM?)
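
The 32-bit guess above can be checked directly from the executable's header. A minimal sketch, assuming a standard Windows PE file; `pe_machine` is a hypothetical helper and the file name in the usage comment is only an example. The two machine codes are the ones defined in the Microsoft PE format specification.

```python
import struct

# PE/COFF machine codes (from the Microsoft PE format specification)
MACHINES = {0x014C: "32-bit (x86)", 0x8664: "64-bit (x64)"}

def pe_machine(header: bytes) -> str:
    """Classify a Windows executable from the first bytes of the file."""
    if header[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_offset,) = struct.unpack_from("<I", header, 0x3C)   # e_lfanew field
    if header[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    (machine,) = struct.unpack_from("<H", header, pe_offset + 4)
    return MACHINES.get(machine, f"unknown (0x{machine:04x})")

# Usage (the file name is hypothetical):
# with open("cmm4.exe", "rb") as f:
#     print(pe_machine(f.read(4096)))
```

A 32-bit process on Windows normally gets 2 GB of user address space, so an allocation topping out around 1.7 GB would be consistent with a 32-bit build.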
    Last edited by LovePimple; 8th June 2008 at 13:34.

  17. #17
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hi!

    Thanks for testing!

    The only thing more memory will influence is the number of hash collisions (OK, the match model will find more matches, but this is negligible).

    @pat357
    As stated in the help, the amount of memory for the context model has much more influence than the sliding window.

    @svenbent
    Well, i can tell you that CMM4 definitely isn't dual threaded.

    And for the 64 bit version - i can only provide linux binaries for 64 bit. Would this be ok? As LP said, you could use Knoppix or some other livecd distribution.

  18. #18
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    How about trying Emilcont v0.2 at -9 setting on the SFC and ENWIK files?

    http://www.freewebs.com/emilcont/

  19. #19
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    870
    Thanks
    47
    Thanked 105 Times in 83 Posts
    Quote Originally Posted by toffer View Post
    Hi!
    @svenbent
    Well, i can tell you that CMM4 definitely isn't dual threaded.

    Now that I remember correctly: my daughter's game was probably minimized in the background, thereby showing usage of two cores (one for CMM4 and one for the game).

    i will try cmm4 on some other files tomorrow evening
