Page 1 of 7 123 ... LastLast
Results 1 to 30 of 190

Thread: M1 - Optimized demo coder

  1. #1
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts

    M1 - Optimized demo coder

    Hey everyone!

    I've made some progress in my optimization experiments - the result is a simple CM coder with optimized parameters (only for text files, tuned on book1). It includes the optimizer which can easily be integrated into other programs.

    It also implements a new kind of mixer, which needs no division or table lookups while achieving performance comparable to PAQ's mixer (i found nothing better than an optimized PAQ mixer) - but it is much faster.

    Altogether the coder is pretty fast and only uses two models, which clearly shows how much redundancy exists in present coders due to ad-hoc parameters.

    Something about speed - sr2 is twice as fast, but take into account that sr2 codes a symbol in 1 step ~60% of the time, in 2 steps ~15%, in 3 steps ~5% and in 8 steps only ~20% (about 2.65 steps per symbol on average), while this dumb coder always codes 8 bits per symbol.

    Here it is:

    http://freenet-homepage.de/toffer_86/m1_0.0_080904.7z

    /bin/Release holds the coder binary
    /bin/Optimize holds the coder with integrated optimizer

    Use it like this:

    m1 o file1 file2 ...

    It tunes the parameters using a genetic algorithm (deterministic crowding GA) to achieve maximum compression when coding file1, file2, ... separately.
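    Roughly, one generation of deterministic crowding looks like this (a simplified sketch, not the actual optimizer code - the genome layout, the mutation rate and the fitness callback here are arbitrary placeholders):

    ```cpp
    #include <algorithm>
    #include <cassert>
    #include <cstdint>
    #include <random>
    #include <vector>

    using Genome = std::vector<uint8_t>;

    // Similarity measure deciding which parent a child competes with.
    static int Distance(const Genome& a, const Genome& b) {
        int d = 0;
        for (size_t i = 0; i < a.size(); ++i) d += (a[i] != b[i]);
        return d;
    }

    // One generation of deterministic crowding. fitness(g) returns the
    // compressed size obtained with parameter vector g (smaller is better).
    template <class Fitness>
    void CrowdingStep(std::vector<Genome>& pop, Fitness fitness, std::mt19937& rng) {
        std::shuffle(pop.begin(), pop.end(), rng);
        std::uniform_int_distribution<int> byte(0, 255);
        std::bernoulli_distribution mutate(0.02);
        for (size_t i = 0; i + 1 < pop.size(); i += 2) {
            Genome c1 = pop[i], c2 = pop[i + 1];
            // Uniform crossover plus light mutation.
            for (size_t k = 0; k < c1.size(); ++k) {
                if (rng() & 1) std::swap(c1[k], c2[k]);
                if (mutate(rng)) c1[k] = (uint8_t)byte(rng);
                if (mutate(rng)) c2[k] = (uint8_t)byte(rng);
            }
            // Each child competes only with its most similar parent, which
            // preserves distinct niches instead of collapsing the population.
            if (Distance(c1, pop[i]) + Distance(c2, pop[i + 1]) >
                Distance(c1, pop[i + 1]) + Distance(c2, pop[i]))
                std::swap(c1, c2);
            if (fitness(c1) <= fitness(pop[i]))     pop[i]     = c1;
            if (fitness(c2) <= fitness(pop[i + 1])) pop[i + 1] = c2;
        }
    }
    ```

    The real fitness evaluation is a full compression run per candidate, which is why the whole process is so time consuming.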

    I'd be happy to discuss a bit about this, hopefully..

    Have fun

    PS.: Thanks to osmanturan and shelwien for testing and running the optimizer, which would take too much time on my slow computer.
    Last edited by toffer; 5th September 2008 at 01:29.

  2. #2
    Member
    Join Date
    May 2008
    Location
    England
    Posts
    325
    Thanks
    18
    Thanked 6 Times in 5 Posts
    Will take a peek. Should we target our tests towards text files only, or just throw anything at it?

    Tried to download but all i got was this:

    The page you requested was not found ...
    It has either been removed, renamed, or is temporarily unavailable. Try the following:

    1. Check your entry in the address field.
    2. Mind upper and lower case:
    http://freenet-homepage.de/schmidt or
    http://freenet-homepage.de/Schmidt or
    http://freenet-homepage.de/SCHMIDT are different addresses.

    i guess it's saying it's not found or something?

    Edit: Scratch that, clicked again and it started downloading heh, i guess i tried it a few seconds too soon
    Last edited by Intrinsic; 5th September 2008 at 01:24.

  3. #3
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    You can use anything; i used stationary files (text), since that provides a better chance for testing certain things and you can easily compare against a lot of other coders via LTCB.

    To get good general purpose parameters try it on text and x86 data. Don't use too much data, since the optimization process is pretty time consuming.

    Try again, i fixed the broken link.

  4. #4
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    We really need something new. Thanks a lot for a new mixing scheme and optimized codec. Here is a small test on my machine (Core2 Duo 2.2GHz, 2 GB RAM, Vista Business x64 SP1)

    Code:
    BOOK1 (768,771 bytes)
    M1 -> 245,848 bytes @ 1452 kb/s (0.517 seconds)
    FLPAQ1 -> 207,209 bytes @ 341 kb/s (2.200 seconds)
    
    fp.log (20,617,071 bytes)
    M1 -> 875,373 bytes @ 2715 kb/s (7.415 seconds)
    FLPAQ1 -> 513,074 bytes @ 495 kb/s (40.700 seconds)
    
    world95.txt (2,988,578 bytes)
    M1 -> 727,370 bytes @ 2225 kb/s (1.312 seconds)
    FLPAQ1 -> 494,829 @ 389 kb/s (7.500 seconds)
    
    ENWIK8 (100000000 bytes)
    M1 -> 28,700,684 bytes @ 3094 kb/s (31.365 seconds)
    FLPAQ1 -> 22,481,907 bytes @ 397 kb/s (245.705 seconds)
    
    LARGE.TXT (41,664,000 bytes)
    M1 -> 55,078 bytes @ 5569 kb/s (7.306 seconds)
    FLPAQ1 -> 14,701 bytes @ 621 kb/s (65.561 Seconds)
    FLPAQ1 is basically a chopped version of LPAQ1: it has no match model and no lower-case unigram model, only orders 1-4 and 6 plus 2 SSE stages. Honestly, this comparison is unfair, because M1 has only two models while FLPAQ1 has 5 models plus 2 SSE stages. Notice that M1 is ~800% faster than FLPAQ1 on ENWIK8 while compressing only ~25% worse (do not forget it uses only two models!)

    LARGE.TXT is a hand-made artificial file which consists only of the ASCII character '0' (not the NULL character). I have included this file to show their raw speed without caching problems (such a file increases cache efficiency for a hash-table-based CM coder).

    Overall, I can easily say that an optimized approach always beats an ad-hoc approach. I hope we'll see more "optimized" coders in the near future. Thanks toffer!

    --Edit: I forgot to give FLPAQ1's memory usage. M1 uses a 16 MB context, so I've chosen the same for FLPAQ1.
    Last edited by osmanturan; 5th September 2008 at 02:51.

  5. #5
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Another test on my machine

    Code:
    ENWIK9 (1,000,000,000 bytes)
    M1 -> 260,546,250 bytes @ 3355 kb/s (291.057 seconds)
    So, on LTCB, M1 takes 70th place, just after Tornado.

  6. #6
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Some references:

    Niching Methods for Genetic Algorithms (1995)
    http://citeseerx.ist.psu.edu/viewdoc...=rep1&type=pdf


    The mixer:

    A mixed probability pm is given by a linear average of two input probabilities p0 and p1:

    (1) pm = w*p0 + (1-w)*p1 = (p0-p1)*w + p1

    A counter can be used to update the weight: the counter update transforms pm -> pm', and the weight w' can easily be extracted:

    (pm-p1) / (p0-p1) = w

    (2) (pm'-p1) / (p0-p1) = w'

    That is btw what Shelwien uses for mixing. One can rewrite the weight update formula (2) by substituting (1) and the counter update:

    pm' = c*pm + (1-c)*py

    where c is the weighting of the old quantised bit history and py is the mixing distribution (can e.g. be set to py=y, the encountered bit). Rearranging (2) as mentioned yields:

    w' = c*w + (1-c) * (py-p1)/(p0-p1)

    I noticed that the term (py-p1)/(p0-p1) can often be approximated (depending on the models' outputs p0 and p1) by a constant wy dependent on py or y, respectively, since it is almost always <0 or >1, so the division can be completely eliminated (the result is either 0 or 1). Finally we get:

    w' = c*w + (1-c)*wy

    So this mixer does an exponential smoothing of a previous weight w, which is a quantisation of the past model's behavior, and a new "delta"-impulse wy = {0, 1}. To improve performance one mixer is selected dependent on a (p0,p1) pair which is quantised to a 13x13 grid.
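    In code the whole thing is tiny. Here's a rough floating-point sketch written from the description above (the rate c and the clamp rule for wy are my guesses here, not M1's actual tuned, fixed-point implementation):

    ```cpp
    #include <cassert>

    // Division-free linear mixer: one smoothed weight per cell of a 13x13
    // grid indexed by the quantised input pair (p0, p1).
    struct GridMixer {
        float w[13][13];
        float c = 0.95f;          // smoothing rate (a tunable parameter)
        int   i = 0, j = 0;       // selected grid cell
        bool  gt = false;         // sign of p0 - p1 at mixing time

        GridMixer() { for (auto& row : w) for (float& x : row) x = 0.5f; }

        static int Quant(float p) {               // map [0,1] onto 13 cells
            int q = (int)(p * 13.0f);
            return q < 0 ? 0 : (q > 12 ? 12 : q);
        }

        // Eq. (1): pm = w*p0 + (1-w)*p1 = (p0-p1)*w + p1
        float Mix(float p0, float p1) {
            i = Quant(p0); j = Quant(p1); gt = p0 > p1;
            return (p0 - p1) * w[i][j] + p1;
        }

        // w' = c*w + (1-c)*wy. The term (py-p1)/(p0-p1) is almost always <0
        // or >1, so it is replaced by its clamp wy in {0,1}: shift weight
        // toward the model that was closer to the coded bit y. No division.
        void Update(int y) {
            float wy = ((y == 1) == gt) ? 1.0f : 0.0f;
            w[i][j] = c * w[i][j] + (1.0f - c) * wy;
        }
    };
    ```

    Per coding step this costs one multiply-add for mixing and one for the weight update, which is where the speed advantage over a NN mixer comes from.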

  7. #7
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I've been analyzing the model output and found something interesting. Have a look at the attached picture. These curves are histograms of the probability output quantised to 8 bits:

    blue, red, green - low order model (MDL0), high order model (MDL1), mixed output.

    These spectral lines clearly show that the higher order model suffers from sparser statistics per context - they are the trace of counter updates. So some kind of context merging should improve compression. One can see that mixing largely smoothes that out.

    EDIT: I've normalized the graph and added a graph which shows the entropy distribution. These peaks consume a considerable amount of entropy, too... By smoothing i should be able to achieve better compression - now i "just" need a model which allows me to do that

    EDIT2: An ideal model's output should look like the binary entropy function
    http://en.wikipedia.org/wiki/Binary_entropy_function
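    For reference, that function is simply (sketch):

    ```cpp
    #include <cassert>
    #include <cmath>

    // Binary entropy H(p) = -p*log2(p) - (1-p)*log2(1-p), the average number
    // of bits an ideal coder spends per bit when the predicted probability
    // of a 1 is p.
    static double BinaryEntropy(double p) {
        if (p <= 0.0 || p >= 1.0) return 0.0;   // H(0) = H(1) = 0 by convention
        return -p * std::log2(p) - (1.0 - p) * std::log2(1.0 - p);
    }
    ```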
    Attached Thumbnails: ent_mdl0_mdl1_mix.png, pdf_p0_p1_pm.png
    Last edited by toffer; 5th September 2008 at 22:44.

  8. #8
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quick test on my Pentium III 750 Mhz, 512 MB RAM, Windows 2000 SP4 machine...

    Code:
    D:\m1test>m1 c enwik8 enwik8.m1
    M1 v0.0 by C. Mattern  Sep  4 2008
    Experimental file compressor.
    
    Building tables...done.
    Allocated 16384 kB.
    Encoding...done. 28700684/100000000 bytes, 119.43 s
    
    D:\m1test>m1 c enwik9 enwik9.m1
    M1 v0.0 by C. Mattern  Sep  4 2008
    Experimental file compressor.
    
    Building tables...done.
    Allocated 16384 kB.
    Encoding...done. 260546250/1000000000 bytes, 1177.92 s

  9. #9
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts

    Hi!

    Very good work!

  10. #10
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hi!

    I've successfully implemented context merging and it improves compression. The optimizer chooses longer context masks; this is evidence that context merging works - longer contexts can be modeled more accurately.

    Prediction is done in two steps:

    1. A "primary" model uses 3-bit delayed counters with a single update mapping; in the k-th coding step it outputs the delayed bits y_k-1, y_k-2, y_k-3 and a probability estimate p_k-3 which quantises the bit history up to step k-3.

    2. A "secondary" model maps these primary outputs to a final prediction (SSE).
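    As a rough illustration, the two stages fit together like this (a simplified sketch with made-up update rates and table sizes - not the actual counter2::Update code from model.h):

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Primary model: a 3-bit delayed counter. The probability estimate is
    // only updated by the bit leaving the 3-bit delay line, so it quantises
    // the history up to y_k-3, while hist carries y_k-1, y_k-2, y_k-3 exactly.
    struct DelayedCounter {
        float   p = 0.5f;
        uint8_t hist = 0;
        void Update(int y) {
            int delayed = (hist >> 2) & 1;     // bit y_k-3, leaving the line
            p += 0.05f * (delayed - p);        // single update mapping (simplified)
            hist = (uint8_t)(((hist << 1) | y) & 7);
        }
    };

    // Secondary model (SSE): maps (delayed bits, quantised p) to a final
    // prediction and adapts toward the coded bit.
    struct SSE {
        float t[8][33];                        // 8 bit-history contexts x 33 p-bins
        int ci = 0, pi = 0;
        SSE() {
            for (int i = 0; i < 8; ++i)
                for (int j = 0; j < 33; ++j) t[i][j] = j / 32.0f;
        }
        float Predict(const DelayedCounter& c) {
            ci = c.hist;
            pi = (int)(c.p * 32.0f);
            return t[ci][pi];
        }
        void Update(int y) { t[ci][pi] += 0.03f * (y - t[ci][pi]); }
    };
    ```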

    I've reintegrated PAQ's mixing mechanism (*mixer2b.exe) so you can easily compare the performance against my linear mixer (*mixer2a.exe).

    At this stage of the experiments it clearly shows that context merging is a must-have, but this implementation is quite slow - most time is eaten up by the delayed counter (have a look at counter2::Update in model.h). I'd like to get a more time-efficient context merging model and am aiming towards PAQ-style states. I'll work out a parametrized FSM representation which can be optimized.

    I'd be happy if anybody could run the optimizer on different files. The optimizer can now be resumed. Just run the optimizer and terminate it via Ctrl+C; it'll ask if you really want to quit. It dumps the current optimization progress to population.txt. If the optimizer locates this file in the working directory it resumes the process at that point.

    http://freenet-homepage.de/toffer_86/m1_0.1_080914.7z

    Have fun & leave some more comments. I'd like to get some more technical discussion.

    EDIT: i've messed up the namings, to clarify:
    *2a executables contain my mixer implementation (faster, but a bit worse compression)
    *2b executables are compiled for the paq mixer implementation
    Last edited by toffer; 15th September 2008 at 21:24.

  11. #11
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,363
    Thanks
    212
    Thanked 1,016 Times in 540 Posts
    Considering state counters, Matt says:

    > Furthermore, to allow an 8 bit representation for
    > (n0,n1,h), states exceeding the following values of n0 or
    > n1 are replaced with the state with the closest ratio
    > n0:n1 obtained by decrementing the smaller count:
    > (41,0,h), (40,1,h), (12,2,h), (5,3,h), (4,4,h), (3,5,h),
    > (2,12,h), (1,40,h), (0,41,h). For example: (12,2,1) 0->
    > (7,1,0) because there is no state (13,2,0).

    It's somewhat close to the idea I have about this
    quantization, and I think the idea with randomized state
    selection is really cool.
    But I also think that such an implementation is very
    limited, so here's an alternative.

    1. Counter meaning

    Ideally, a counter should contain the whole context history,
    though that's generally impossible.
    Still, within such an ideal implementation it would be
    reasonable to store the context histories in compressed
    form, as they're commonly very redundant.

    2. Lossy compression of context histories

    Obviously, even in compressed form whole context histories
    are troublesome, as it's much more convenient to have
    fixed-length counters.
    And to get that we have to lose some information.
    Now here's the point: I think that it's better to lose
    what we least need instead of some stuff which gets
    "unlucky" due to the specific update formula used.
    And to do that, possibly, we'd need some mapping
    from given history string to one symbol shorter
    string (uncompressed history!) - removing one
    symbol might not be always enough to get the
    history compressed length under limit, but I think
    that simply repeating such reduction won't be much of
    precision loss.
    As to reduction result, of course it has to be
    a string with statistical properties closest to
    the original one. "Perfect" algorithm would be
    to compress some sample with a model initialized
    by each possible string, and find the one with
    minimal output size (original history can be
    the sample). Guess that's too perfect, but
    with reasonable counter size limits (like 16-24 bits)
    it would be possible to precalculate the mappings.

    3. Conclusion

    If we'd want to compress a counter into a single byte
    (I'm not completely sure that's the right thing to do),
    then the method explained above is still fully
    applicable, but it might be useful to consider working
    with intervals instead of fixed values. Or randomly
    extended fixed values to make it paq-like.
    Anyway, I think that such an entropy-based approach
    would be much more sensible than "ad hoc" decrements
    and other such stuff.
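    To make the "perfect" variant concrete, a brute-force toy version could look like this (the names and the choice of a KT estimator are mine; this is only feasible for very short histories - in practice the mappings would be precalculated, as said above):

    ```cpp
    #include <cassert>
    #include <cmath>
    #include <string>
    #include <vector>

    // Code length (bits) of 'sample' under a KT estimator whose counts are
    // seeded by replaying 'seed' first.
    static double CodeLength(const std::string& seed, const std::string& sample) {
        double n0 = 0.5, n1 = 0.5, bits = 0.0;
        for (char ch : seed) (ch == '1' ? n1 : n0) += 1.0;
        for (char ch : sample) {
            double p1 = n1 / (n0 + n1);
            bits += -std::log2(ch == '1' ? p1 : 1.0 - p1);
            (ch == '1' ? n1 : n0) += 1.0;
        }
        return bits;
    }

    // Map a bit history to the one-symbol-shorter string whose statistics
    // code the original history most cheaply (the history is the sample).
    static std::string Reduce(const std::string& h) {
        std::string best;
        double bestBits = 1e30;
        size_t len = h.size() - 1;
        for (unsigned long m = 0; m < (1ul << len); ++m) {   // all candidates
            std::string cand(len, '0');
            for (size_t i = 0; i < len; ++i)
                if (m & (1ul << i)) cand[i] = '1';
            double bits = CodeLength(cand, h);
            if (bits < bestBits) { bestBits = bits; best = cand; }
        }
        return best;
    }
    ```

    Repeating Reduce until the compressed length fits the counter size limit gives the lossy history compression described above.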

  12. #12
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Some test on my laptop (Core2 Duo 2.2GHz, 2GB RAM, Vista Business x64 SP1)
    Code:
    BOOK1 (768,771 bytes)
    M1 -> 245,848 bytes @ 1452 kb/s (0.517 seconds)
    FLPAQ1 -> 207,209 bytes @ 341 kb/s (2.200 seconds)
    M1 2A -> 240,844 bytes @ 1358 kb/s (0.553 seconds)
    M1 2B -> 238,236 bytes @ 1258 kb/s (0.597 seconds)
    
    fp.log (20,617,071 bytes)
    M1 -> 875,373 bytes @ 2715 kb/s (7.415 seconds)
    FLPAQ1 -> 513,074 bytes @ 495 kb/s (40.700 seconds)
    M1 2A -> 710,253 bytes @ 2597 kb/s (7.752 seconds)
    M1 2B -> 655,256 bytes @ 2175 kb/s (9.258 seconds)
    
    world95.txt (2,988,578 bytes)
    M1 -> 727,370 bytes @ 2225 kb/s (1.312 seconds)
    FLPAQ1 -> 494,829 @ 389 kb/s (7.500 seconds)
    M1 2A -> 701,940 @ 1935 kb/s (1.508 seconds)
    M1 2B -> 686,673 @ 1627 kb/s (1.794 seconds)
    
    ENWIK8 (100000000 bytes)
    M1 -> 28,700,684 bytes @ 3094 kb/s (31.365 seconds)
    FLPAQ1 -> 22,481,907 bytes @ 397 kb/s (245.705 seconds)
    M1 2A -> 27,732,397 bytes @ 2162 kb/s (45.317 seconds)
    M1 2B ->  27,239,306 bytes @ 1770 kb/s (55.167 seconds)
    
    LARGE.TXT (41,664,000 bytes)
    M1 -> 55,078 bytes @ 5569 kb/s (7.306 seconds)
    FLPAQ1 -> 14,701 bytes @ 621 kb/s (65.561 Seconds)
    M1 2A -> 36,779 bytes @ 3478 kb/s (11.700 seconds)
    M1 2B -> 96,020 bytes @ 2753 kb/s (14.777 seconds)
    Seems context merging is a must-have. Compression improved. I wonder about the speed when you use an FSM instead of counters.

    According to LARGE.TXT, it seems the delayed counters really take some time. Also, another interesting thing is that the PAQ mixer becomes redundant on highly redundant data (like LARGE.TXT). That may be the answer to why my compressor is not good on highly redundant data: I have a PAQ-like mixer, of course with optimized parameters.

  13. #13
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    There's a MinGW bug which causes the decompression to fail - i didn't test the Win32 compiles. I wonder why nobody noticed that. I'll upload a fix tomorrow.

  14. #14
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hey!

    I've fixed 0.1, resulting in 0.1a, and verified decompression for some files. The included binaries are renamed:

    *_paq.exe - paq's NN
    *_lin.exe - my linear mixer

    http://freenet-homepage.de/toffer_86/m1_0.1a_080917.7z

    In addition there's 0.1b, which uses collision handling instead of simply ignoring collisions. Looks like the delayed counters don't like it; i guess that's due to the large quantisation range of 16 bits per counter (compared to just 8 for PAQ's states) - somehow the whole layout looks a bit redundant to me.
    Shelwien ran my optimizer on both book1 and enwik1 (the first MB of enwik; if you compile yourself, see "config.h" to select a parameter set). There's not that much difference, since both files are mostly text. The contained binaries are compiled for book1.

    http://freenet-homepage.de/toffer_86/m1_0.1b_080917.7z

    I'm still working on optimized states a la PAQ.

    Did anyone try to optimize that stuff for x86 data or test other text files with the supplied binary? I'd like to get some more stuff for future comparisons.
    Last edited by toffer; 17th September 2008 at 19:46.

  15. #15
    Member chornobyl's Avatar
    Join Date
    May 2008
    Location
    ua/kiev
    Posts
    153
    Thanks
    0
    Thanked 0 Times in 0 Posts
    binary file:
    401165 0.55 bm1.m1_paq_r
    404328 0.59 bm1.m1_b_r
    405330 0.52 bm1.m1_lin_r
    524288 bm1.raw

    enwik7:
    2793227 8.27 enwik7.m1_paq_r
    2841478 7.63 enwik7.m1_lin_r
    2862412 8.00 enwik7.m1_b_r
    10000000 enwik7

    enwik8:
    27239306 106.64 enwik8.m1_paq_r
    27732397 79.33 enwik8.m1_lin_r
    28200678 79.89 enwik8.m1_b_r
    100000000 enwik8

    timings may be inaccurate (antivirus, firewall ...)
    Last edited by chornobyl; 17th September 2008 at 20:52.

  16. #16
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    403
    Thanks
    154
    Thanked 232 Times in 125 Posts
    Quote Originally Posted by toffer View Post
    Did anyone try to optimize that stuff for x86 data or test other text files with the supplied binary? I'd like to get some more stuff for future comparisons.
    How many times should it run?
    KZo


  17. #17
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Just view the output from time to time. When it has nearly converged it only improves slightly (a few bytes) - i usually stop it after ~1000 iterations. But things vary depending on the test data; you simply have to try.

    You can quit an optimization process and resume it later on (if population.txt is present in the current working directory).

    If you run multiple optimizers in parallel, make sure to use different working directories - otherwise the log files interfere.

    Thanks in advance

  18. #18
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    For genetic optimization, only some stopping criteria are valid - like diversity, generation count etc. So basically there is no certain rule for stopping it; just run it until you get no improvement. For toffer's implementation ~1000 iterations are generally sufficient, but you may run it for ~5000 iterations like I do. For gathering optimized parameters on a data set, run it like this:

    m1 o file1 file2 file3 ...

    For example, for optimizing shelwien's well used optimization target:

    m1 o BOOK1 WCC386.EXE

    You will get the optimized parameters by running it in optimizer mode. Then you can recompile the source with the gathered parameters. Otherwise the gathered parameters won't take effect in compression mode, since the optimizer doesn't patch itself.

  19. #19
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    403
    Thanks
    154
    Thanked 232 Times in 125 Posts
    I got 565 iterations; there was a power failure on the PC (no UPS).
    The file tested was world95.txt.
    Hope it's good.
    Attached Files
    KZo


  20. #20
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Seems good. The context masks are interesting: it discards the high bits within a byte (basically it groups lower case and upper case characters, like PAQ does manually). With this score it would have 127th rank on maximumcompression.com, right after BIT 0.1 and right before SR3.

    Code:
      1 - WinRK 3.0.3 (PWCM 912MB Dict) -> 330115 bytes
      2 - PAQAR 4.5 (-8)                -> 333759 bytes
      3 - PAQ8O10 (-7)                  -> 358414 bytes
    ...
     11 - LPAQ8 (7)                     -> 427349 bytes
    ...
     31 - CMM4 0.1e (26)                -> 454916 bytes
    ...
     43 - CCM 1.30c (CMMx -6)           -> 476577 bytes
    ...
     95 - BZIP2 1.0.5 (-9)              -> 577006 bytes
    ...
    105 - CABARC 1.00.0106 (-m LZX:21)  -> 603580 bytes
    ...
    126 - BIT 0.1                       -> 657433 bytes
    127 - M1                            -> 666297 bytes
    128 - SR3a                          -> 727977 bytes
    ...
    137 - SLUG 1.27b                    -> 788981 bytes
    ...
    145 - THOR 0.96 (e5)                -> 824244 bytes
    ...
    157 - PKZIP 2.50 (-exx)             -> 860478 bytes
    ...
    183 - LZOP 1.02rc1 (-9)             -> 982968 bytes

  21. #21
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks for testing, i'll include that parameter set.

    I'd like to get a parameter set for general purpose compression - so one can get meaningful SFC results. You might use book1+wcc386.exe as a target.

  22. #22
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    403
    Thanks
    154
    Thanked 232 Times in 125 Posts


    book1+wcc386.exe

    I got 1823 iterations, but the last entry is at 935.
    Attached Files
    KZo


  23. #23
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Logs for various files.
    Attached Files

  24. #24
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Some more logs.
    Attached Files

  25. #25
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks for your support!

    I've just completely rewritten some stuff and successfully found a representation for a quantization mapping - which was a really hard problem. Now i'll continue probing for a good representation of FSM counters.

    EDIT - could you please upload some of your test files? Somehow some results look weird to me.
    Last edited by toffer; 24th September 2008 at 13:49.

  26. #26
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by toffer View Post
    Thanks for your support!

    I've just completely rewritten some stuff and successfully found a representation for a quantization mapping - which was a really hard problem. Now i'll continue probing for a good representation of FSM counters.

    EDIT - could you please upload some of your test files? Somehow some results look weird to me.
    Not all.
    What do you need?

  27. #27
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    On some of your files the context mask of one model looks weird - but maybe it's valid. It consists of a large gap with bits randomly set; at least it looks random to me:

    ftpserver.hlp -> MDL1_MASK = 0010000001110001011100110000001111001001111101101001011111110111 (20717303c9f697f7)

    messageinfo.dat -> MDL0_MASK = 1101110000010100101111100000010000001111000000000101111100010100 (dc14be040f005f14)

    Did you test against already compressed files?
    Last edited by toffer; 24th September 2008 at 14:13.

  28. #28
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by toffer View Post
    On some of your files the context mask of one model looks weird - but maybe it's valid. It consists of a large gap with bits randomly set; at least it looks random to me.
    You mean that I ran too few cycles? Possibly. The recently added FTPServer.hlp.txt needed way more than 1000 iterations, probably 1500-2000 (I paused and resumed too many times to get the exact number), and I used ~1000 before. On one file (I don't remember which) that didn't get any improvement during 160 cycles, I finished at ~880. Later I learned that even after 250 a lot can happen...

    Tell me which are the files and you'll likely get them.

    EDIT: OK, the .hlp is no problem, but the .dat is my message box, so sorry, I won't publish it.

    EDIT2:
    Compressed content:
    The multitest included a .png; .pem is text-encoded encrypted content.
    The .hlp is compressed.
    Thumbs.db contains images, I guess they are small .jpgs.
    Attached Files
    Last edited by m^2; 24th September 2008 at 14:25.

  29. #29
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    I took a look into the message box.
    Nothing is compressed. A lot of text with small binary intrusions, several blocks of zeros, and some other binary things, probably headers; they don't look very random.
    The original size is 1,163,264 B; even RAR -m1 compresses it to ~25%.

  30. #30
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hi again!

    I'll be away for a few hours - but i've just successfully tested my rewritten optimizer. If anybody could run this:

    http://freenet-homepage.de/toffer_86/m1.exe

    on enwik1 from:

    http://freenet-homepage.de/toffer_86/tuning.7z

    like this:

    m1 o enwik1

    Note that here i haven't implemented the resume functionality, so please run until the optimizer stops. The optimization process is split into two passes:

    1. a genetic algorithm;
    2. after 100 iterations without an improvement, a simple hill climber is started, which stops when it hits a local minimum.
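    The second pass is just a plain local search; schematically something like this (a generic sketch with a placeholder fitness callback, not my actual code):

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <vector>

    // Simple hill climber: perturb one parameter at a time by +/-1 and keep
    // the change only if the fitness (e.g. compressed size) improves; stop
    // at the first local minimum, i.e. when no single-step change helps.
    template <class Fitness>
    long HillClimb(std::vector<uint8_t>& params, Fitness fitness) {
        long best = fitness(params);
        bool improved = true;
        while (improved) {
            improved = false;
            for (size_t i = 0; i < params.size(); ++i) {
                for (int d : {-1, +1}) {
                    uint8_t old = params[i];
                    params[i] = (uint8_t)(old + d);
                    long f = fitness(params);
                    if (f < best) { best = f; improved = true; }
                    else          { params[i] = old; }
                }
            }
        }
        return best;
    }
    ```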

    I'll keep it running as long as i can on my own computer while i'm away.


Similar Threads

  1. Simple bytewise context mixing demo
    By Shelwien in forum Data Compression
    Replies: 11
    Last Post: 27th January 2010, 04:12
  2. How fast should be a range coder ?
    By Cyan in forum Data Compression
    Replies: 33
    Last Post: 16th November 2009, 17:02
  3. pbzip2 1.05 optimized compile
    By M4ST3R in forum Download Area
    Replies: 0
    Last Post: 2nd October 2009, 17:21
  4. Comparison of the recent CM demo coders
    By Shelwien in forum Data Compression
    Replies: 38
    Last Post: 13th June 2008, 14:21
  5. QUAD-SFX DEMO
    By encode in forum Forum Archive
    Replies: 17
    Last Post: 26th April 2007, 14:57
