Thread: Tree alpha v0.1 download

  1. #661
    Tester
    Nania Francesco
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    I got an error about exceeding the symbol limit!
    Keep in mind that it used all the memory on my PC (6 GB).

  2. #662
    Expert
    Matt Mahoney
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts

  3. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    Kennon Conrad (28th May 2015)

  4. #663
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Thanks. I hope you didn't mind the 17 months it took me to start implementing your suggestion. There were a lot of things I needed to learn along the way....

  5. The Following User Says Thank You to Kennon Conrad For This Useful Post:

    surfersat (28th May 2015)

  6. #664

    GLZA v0.2b

    Quote Originally Posted by Nania Francesco View Post
    I got an error about exceeding the symbol limit!
    Keep in mind that it used all the memory on my PC (6 GB).
    RAM usage shouldn't be a problem for encoding, assuming you made it through compression. Here's a version with a symbol limit of 1 trillion symbols. Also, decoding of low-compressibility "binary" files is about 30% faster than v0.2, but still significantly slower than v0.1. For instance, FlashMX.pdf took 124 msec to decode with v0.1, 468 msec with v0.2, and 312 msec with v0.2b.
    Attached Files

  7. The Following User Says Thank You to Kennon Conrad For This Useful Post:

    Nania Francesco (28th May 2015)

  8. #665
    Nania Francesco
    I have to wait till tonight to test it because it takes a long time!

  9. #666
    Nania Francesco
    I think you should consider increasing the speed of tree training, and therefore of compression, by about 10x for it to be considered a useful compressor. Decompression is excellent!

  10. #667
    Quote Originally Posted by Nania Francesco View Post
    I think you should consider increasing the speed of tree training, and therefore of compression, by about 10x for it to be considered a useful compressor. Decompression is excellent!
    Yes, I agree and have been thinking about faster compression methods, but it won't be easy.

  11. #668
    Nania Francesco
    I looked at the code and it is quite complicated. I think the right way to speed it up is to take your idea and start over with a new, similar implementation. For example, you might consider compressing in blocks of 16 MB or 32 MB at a time.
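
The block-at-a-time idea suggested above can be sketched as follows. This is only an illustration under loud assumptions: it uses Python's zlib as a stand-in compressor (it is not GLZA), and the function names and the length-prefixed container layout are hypothetical. Compressing each block independently bounds memory use and per-block work, at the cost of losing matches that span block boundaries.

```python
import struct
import zlib

BLOCK_SIZE = 16 * 1024 * 1024  # 16 MB, per the suggestion above


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Compress data in independent blocks (zlib is a stand-in, not GLZA)."""
    out = []
    for start in range(0, len(data), block_size):
        packed = zlib.compress(data[start:start + block_size], 9)
        # Hypothetical container: 4-byte little-endian length, then payload.
        out.append(struct.pack("<I", len(packed)))
        out.append(packed)
    return b"".join(out)


def decompress_blocks(blob: bytes) -> bytes:
    """Inverse of compress_blocks: read each length prefix, then its payload."""
    out = []
    pos = 0
    while pos < len(blob):
        (size,) = struct.unpack_from("<I", blob, pos)
        pos += 4
        out.append(zlib.decompress(blob[pos:pos + size]))
        pos += size
    return b"".join(out)
```

Because the blocks are independent, they could also be compressed in parallel, which is one more way such a design might recover speed.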

  12. #669
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    778
    Thanks
    63
    Thanked 273 Times in 191 Posts
    Quote Originally Posted by Kennon Conrad View Post
    Is this happening with one of your log files?
    No, it's a 967,021,395-byte (private) HTML file, like enwik9 but from a different source and without binary data.
    I did not save the console output; decode crashed (very quickly) and I thought there was no console output.

  13. #670
    Quote Originally Posted by Nania Francesco View Post
    I looked at the code and it is quite complicated. I think the right way to speed it up is to take your idea and start over with a new, similar implementation. For example, you might consider compressing in blocks of 16 MB or 32 MB at a time.
    Yes, the code is complicated. Probably the only parts that would be useful for a couple of approaches I am considering are the input and output functions. The rest would have to be new.

  14. #671
    Quote Originally Posted by Sportman View Post
    No, it's a 967,021,395-byte (private) HTML file, like enwik9 but from a different source and without binary data.
    I did not save the console output; decode crashed (very quickly) and I thought there was no console output.
    It's hard to debug problems with files that can't be shared, but I am willing to try if you want. The first step would be to try GLZAcompress v0.2b because the token substitution bug could have been the cause. If that doesn't work, I can create a debug version with some checks I don't have in the release code.

    Edit: A few other things that could provide good debug information:
    1. If the o0e value printed to the console during compression suddenly drastically increases, that indicates the problem is in the compressor.
    2. You could try the output of GLZAcompress with TreeEncode and TreeDecode to see if it's a problem with the new encoding.
    3. The console output of GLZAencode would be helpful.
    Last edited by Kennon Conrad; 30th May 2015 at 18:08.
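
The first check in the list above (watching for a sudden jump in o0e) could be automated on a saved console log. A minimal sketch, assuming the per-cycle line format that appears in the console output posted later in this thread; the function name and the 1% threshold are hypothetical choices, not anything GLZA defines.

```python
import re

# Matches lines such as:
# "593: 74297332 syms, dict. size 4924044, 15.4644 bits/sym, o0e 143620830 bytes"
LINE_RE = re.compile(r"^\s*(\d+):\s+(\d+) syms, .*?o0e (\d+) bytes")


def find_o0e_jumps(log_text: str, max_ratio: float = 1.01):
    """Return (cycle, previous_o0e, o0e) for every cycle whose o0e exceeds
    the previous cycle's value by more than max_ratio (an assumed threshold)."""
    jumps = []
    prev = None
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip "Read ..." and "Common prefix scan ..." lines
        cycle, o0e = int(m.group(1)), int(m.group(3))
        if prev is not None and o0e > prev * max_ratio:
            jumps.append((cycle, prev, o0e))
        prev = o0e
    return jumps
```

In a healthy run o0e creeps up by a small fraction of a percent per cycle, so a loose threshold like this should stay quiet unless something goes badly wrong.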

  15. #672
    Quote Originally Posted by Kennon Conrad View Post
    try GLZAcompress v0.2b
    Output log file:

    GLZAformat html.txt html.txt.glzf
    Pre encoding 967021395 bytes
    Wrote 1 byte header and 967021395 encoded bytes in 3.500 seconds.

    GLZAcompress html.txt.glzf html.txt.glzc

    GLZAencode html.txt.glzc html.txt.glze

    GLZAdecode html.txt.glze html.txt.glzd


    Output console buffer:

    Read 74360641 of 74360641 symbols, start 0.0000
    Common prefix scan 0 - 4adfad, score[22512] = 171.77
    593: 74297332 syms, dict. size 4924044, 15.4644 bits/sym, o0e 143620830 bytes
    Read 74297332 of 74297332 symbols, start 0.0000
    Common prefix scan 0 - 4b238b, score[22667] = 167.77
    594: 74224535 syms, dict. size 4941599, 15.4809 bits/sym, o0e 143632742 bytes
    Read 74224535 of 74224535 symbols, start 0.0000
    Common prefix scan 0 - 4b681e, score[22828] = 163.91
    595: 74146662 syms, dict. size 4959179, 15.4983 bits/sym, o0e 143643181 bytes
    Read 74146662 of 74146662 symbols, start 0.0000
    Common prefix scan 0 - 4bacca, score[22896] = 160.20
    596: 74062967 syms, dict. size 4976974, 15.5170 bits/sym, o0e 143654152 bytes
    Read 74062967 of 74062967 symbols, start 0.0000
    Common prefix scan 0 - 4bf24d, score[23049] = 156.38
    597: 73984863 syms, dict. size 4995034, 15.5346 bits/sym, o0e 143665212 bytes
    Read 73984863 of 73984863 symbols, start 0.0000
    Common prefix scan 0 - 4c38d9, score[23260] = 152.61
    598: 73904811 syms, dict. size 5013342, 15.5526 bits/sym, o0e 143676714 bytes
    Read 73904811 of 73904811 symbols, start 0.0000
    Common prefix scan 0 - 4c805d, score[23480] = 148.80
    599: 73817237 syms, dict. size 5032032, 15.5723 bits/sym, o0e 143687885 bytes
    Read 73817237 of 73817237 symbols, start 0.0000
    Common prefix scan 0 - 4cc95f, score[23784] = 145.02
    600: 73746050 syms, dict. size 5051270, 15.5889 bits/sym, o0e 143702530 bytes
    Read 73746050 of 73746050 symbols, start 0.0000
    Common prefix scan 0 - 4d1485, score[24217] = 141.12
    601: 73670350 syms, dict. size 5070945, 15.6067 bits/sym, o0e 143718598 bytes
    Read 73670350 of 73670350 symbols, start 0.0000
    Common prefix scan 0 - 4d6160, score[24625] = 137.33
    602: 73598748 syms, dict. size 5091011, 15.6238 bits/sym, o0e 143736131 bytes
    Read 73598748 of 73598748 symbols, start 0.0000
    Common prefix scan 0 - 4dafc2, score[24997] = 133.45
    603: 73522836 syms, dict. size 5111391, 15.6417 bits/sym, o0e 143753179 bytes
    Read 73522836 of 73522836 symbols, start 0.0000
    Common prefix scan 0 - 4dff5e, score[25311] = 129.73
    604: 73443629 syms, dict. size 5132039, 15.6605 bits/sym, o0e 143770558 bytes
    Read 73443629 of 73443629 symbols, start 0.0000
    Common prefix scan 0 - 4e5006, score[25578] = 125.99
    605: 73368466 syms, dict. size 5152884, 15.6787 bits/sym, o0e 143789942 bytes
    Read 73368466 of 73368466 symbols, start 0.0000
    Common prefix scan 0 - 4ea173, score[25786] = 122.29
    606: 73287865 syms, dict. size 5173848, 15.6980 bits/sym, o0e 143809353 bytes
    Read 73287865 of 73287865 symbols, start 0.0000
    Common prefix scan 0 - 4ef357, score[25927] = 118.67
    607: 73208738 syms, dict. size 5194966, 15.7171 bits/sym, o0e 143828923 bytes
    Read 73208738 of 73208738 symbols, start 0.0000
    Common prefix scan 0 - 4f45d5, score[26067] = 115.06
    608: 73122788 syms, dict. size 5216215, 15.7376 bits/sym, o0e 143846964 bytes
    Read 73122788 of 73122788 symbols, start 0.0000
    Common prefix scan 0 - 4f98d6, score[26193] = 111.65
    609: 73035675 syms, dict. size 5237666, 15.7585 bits/sym, o0e 143866576 bytes
    Read 73035675 of 73035675 symbols, start 0.0000
    Common prefix scan 0 - 4feca1, score[26357] = 108.35
    610: 72954348 syms, dict. size 5259211, 15.7784 bits/sym, o0e 143887514 bytes
    Read 72954348 of 72954348 symbols, start 0.0000
    Common prefix scan 0 - 5040ca, score[26469] = 104.97
    611: 72872885 syms, dict. size 5280823, 15.7982 bits/sym, o0e 143907877 bytes
    Read 72872885 of 72872885 symbols, start 0.0000
    Common prefix scan 0 - 509536, score[26546] = 101.61
    612: 72779731 syms, dict. size 5302543, 15.8207 bits/sym, o0e 143928158 bytes
    Read 72779731 of 72779731 symbols, start 0.0000
    Common prefix scan 0 - 50ea0e, score[26637] = 98.53
    613: 72693165 syms, dict. size 5324528, 15.8418 bits/sym, o0e 143948978 bytes
    Read 72693165 of 72693165 symbols, start 0.0000
    Common prefix scan 0 - 513fef, score[26827] = 95.35
    614: 72607424 syms, dict. size 5346880, 15.8630 bits/sym, o0e 143971418 bytes
    Read 72607424 of 72607424 symbols, start 0.0000
    Common prefix scan 0 - 51973f, score[27113] = 91.96
    615: 72527654 syms, dict. size 5369593, 15.8832 bits/sym, o0e 143996303 bytes
    Read 72527654 of 72527654 symbols, start 0.0000
    Common prefix scan 0 - 51eff8, score[27426] = 88.84
    616: 72439036 syms, dict. size 5392453, 15.9053 bits/sym, o0e 144020585 bytes
    Read 72439036 of 72439036 symbols, start 0.0000
    Common prefix scan 0 - 524944, score[27619] = 85.77
    617: 72350131 syms, dict. size 5415470, 15.9276 bits/sym, o0e 144045232 bytes
    Read 72350131 of 72350131 symbols, start 0.0000
    Common prefix scan 0 - 52a32d, score[27779] = 82.98
    618: 72269742 syms, dict. size 5438729, 15.9484 bits/sym, o0e 144072903 bytes
    Read 72269742 of 72269742 symbols, start 0.0000
    Common prefix scan 0 - 52fe08, score[27978] = 79.91
    619: 72186350 syms, dict. size 5462220, 15.9697 bits/sym, o0e 144099658 bytes
    Read 72186350 of 72186350 symbols, start 0.0000
    Common prefix scan 0 - 5359cb, score[28184] = 76.94
    620: 72105422 syms, dict. size 5486000, 15.9908 bits/sym, o0e 144128001 bytes
    Read 72105422 of 72105422 symbols, start 0.0000
    Common prefix scan 0 - 53b6af, score[28428] = 74.02
    621: 72015979 syms, dict. size 5509788, 16.0139 bits/sym, o0e 144157437 bytes
    Read 72015979 of 72015979 symbols, start 0.0000
    Common prefix scan 0 - 54139b, score[28514] = 71.29
    622: 71932753 syms, dict. size 5533462, 16.0357 bits/sym, o0e 144186777 bytes
    Read 71932753 of 71932753 symbols, start 0.0000
    Common prefix scan 0 - 547015, score[28474] = 68.50
    623: 71848140 syms, dict. size 5557126, 16.0579 bits/sym, o0e 144216453 bytes
    Read 71848140 of 71848140 symbols, start 0.0000
    Common prefix scan 0 - 54cc85, score[28454] = 65.88
    624: 71758864 syms, dict. size 5580726, 16.0811 bits/sym, o0e 144245477 bytes
    Read 71758864 of 71758864 symbols, start 0.0000
    Common prefix scan 0 - 5528b5, score[28409] = 63.47
    625: 71677469 syms, dict. size 5604352, 16.1029 bits/sym, o0e 144276458 bytes
    Read 71677469 of 71677469 symbols, start 0.0000
    Common prefix scan 0 - 5584ff, score[28410] = 60.94
    626: 71592031 syms, dict. size 5628204, 16.1255 bits/sym, o0e 144306768 bytes
    Read 71592031 of 71592031 symbols, start 0.0000
    Common prefix scan 0 - 55e22b, score[28546] = 58.46
    627: 71513834 syms, dict. size 5652205, 16.1468 bits/sym, o0e 144340139 bytes
    Read 71513834 of 71513834 symbols, start 0.0000
    Common prefix scan 0 - 563fec, score[28682] = 56.03
    628: 71432080 syms, dict. size 5676261, 16.1690 bits/sym, o0e 144373008 bytes
    Read 71432080 of 71432080 symbols, start 0.0000
    Common prefix scan 0 - 569de4, score[28760] = 53.83
    629: 71348583 syms, dict. size 5700453, 16.1917 bits/sym, o0e 144407287 bytes
    Read 71348583 of 71348583 symbols, start 0.0000
    Common prefix scan 0 - 56fc64, score[28869] = 51.66
    630: 71266791 syms, dict. size 5724783, 16.2142 bits/sym, o0e 144441493 bytes
    Read 71266791 of 71266791 symbols, start 0.0000
    Common prefix scan 0 - 575b6e, score[28989] = 49.37
    631: 71187073 syms, dict. size 5749283, 16.2365 bits/sym, o0e 144478174 bytes
    Read 71187073 of 71187073 symbols, start 0.0000
    Common prefix scan 0 - 57bb22, score[29131] = 47.21
    632: 71101925 syms, dict. size 5774028, 16.2599 bits/sym, o0e 144513663 bytes
    Read 71101925 of 71101925 symbols, start 0.0000
    Common prefix scan 0 - 581bcb, score[29327] = 45.07
    633: 71022895 syms, dict. size 5799131, 16.2824 bits/sym, o0e 144552590 bytes
    Read 71022895 of 71022895 symbols, start 0.0000
    Common prefix scan 0 - 587dda, score[29608] = 43.02
    634: 70944509 syms, dict. size 5824455, 16.3048 bits/sym, o0e 144591906 bytes
    Read 70944509 of 70944509 symbols, start 0.0000
    Common prefix scan 0 - 58e0c6, score[29835] = 41.06
    635: 70865508 syms, dict. size 5849906, 16.3274 bits/sym, o0e 144631488 bytes
    Read 70865508 of 70865508 symbols, start 0.0000
    Common prefix scan 0 - 594431, score[29987] = 39.19
    636: 70786581 syms, dict. size 5875600, 16.3503 bits/sym, o0e 144673040 bytes
    Read 70786581 of 70786581 symbols, start 0.0000
    Common prefix scan 0 - 59a88f, score[30000] = 37.31
    637: 70702884 syms, dict. size 5901386, 16.3742 bits/sym, o0e 144713314 bytes
    Read 70702884 of 70702884 symbols, start 0.0000
    Common prefix scan 0 - 5a0d49, score[30000] = 35.46
    638: 70622854 syms, dict. size 5927365, 16.3976 bits/sym, o0e 144755850 bytes
    Read 70622854 of 70622854 symbols, start 0.0000
    Common prefix scan 0 - 5a72c4, score[30000] = 33.65
    639: 70539473 syms, dict. size 5953329, 16.4220 bits/sym, o0e 144799542 bytes
    Read 70539473 of 70539473 symbols, start 0.0000
    Common prefix scan 0 - 5ad830, score[30000] = 32.02
    640: 70451498 syms, dict. size 5979557, 16.4472 bits/sym, o0e 144841429 bytes
    Read 70451498 of 70451498 symbols, start 0.0000
    Common prefix scan 0 - 5b3ea4, score[30000] = 30.37
    641: 70368665 syms, dict. size 6005878, 16.4718 bits/sym, o0e 144887138 bytes
    Read 70368665 of 70368665 symbols, start 0.0000
    Common prefix scan 0 - 5ba575, score[30000] = 28.72
    642: 70286210 syms, dict. size 6032276, 16.4963 bits/sym, o0e 144932781 bytes
    Read 70286210 of 70286210 symbols, start 0.0000
    Common prefix scan 0 - 5c0c93, score[30000] = 27.28
    643: 70202697 syms, dict. size 6058782, 16.5213 bits/sym, o0e 144979578 bytes
    Read 70202697 of 70202697 symbols, start 0.0000
    Common prefix scan 0 - 5c741d, score[30000] = 25.84
    644: 70109093 syms, dict. size 6085437, 16.5485 bits/sym, o0e 145024880 bytes
    Read 70109093 of 70109093 symbols, start 0.0000
    Common prefix scan 0 - 5cdc3c, score[30000] = 24.34
    645: 70024519 syms, dict. size 6112373, 16.5740 bits/sym, o0e 145073718 bytes
    Read 70024519 of 70024519 symbols, start 0.0000
    Common prefix scan 0 - 5d4574, score[30000] = 23.04
    646: 69936945 syms, dict. size 6139562, 16.6004 bits/sym, o0e 145122838 bytes
    Read 69936945 of 69936945 symbols, start 0.0000
    Common prefix scan 0 - 5dafa9, score[30000] = 21.65
    647: 69848419 syms, dict. size 6167001, 16.6273 bits/sym, o0e 145173893 bytes
    Read 69848419 of 69848419 symbols, start 0.0000
    Common prefix scan 0 - 5e1ad8, score[30000] = 20.37
    648: 69757611 syms, dict. size 6194747, 16.6547 bits/sym, o0e 145224389 bytes
    Read 69757611 of 69757611 symbols, start 0.0000
    Common prefix scan 0 - 5e873a, score[30000] = 19.20
    649: 69662594 syms, dict. size 6222737, 16.6835 bits/sym, o0e 145277399 bytes
    Read 69662594 of 69662594 symbols, start 0.0000
    Common prefix scan 0 - 5ef490, score[30000] = 17.92
    650: 69577509 syms, dict. size 6250780, 16.7103 bits/sym, o0e 145332867 bytes
    Read 69577509 of 69577509 symbols, start 0.0000
    Common prefix scan 0 - 5f621b, score[30000] = 16.71
    651: 69490453 syms, dict. size 6278559, 16.7375 bits/sym, o0e 145386832 bytes
    Read 69490453 of 69490453 symbols, start 0.0000
    Common prefix scan 0 - 5fce9e, score[30000] = 15.75
    652: 69413323 syms, dict. size 6306411, 16.7624 bits/sym, o0e 145442047 bytes
    Read 69413323 of 69413323 symbols, start 0.0000
    Common prefix scan 0 - 603b6a, score[29268] = 14.70
    653: 69330001 syms, dict. size 6333220, 16.7888 bits/sym, o0e 145495622 bytes
    Read 69330001 of 69330001 symbols, start 0.0000
    Common prefix scan 0 - 60a423, score[30000] = 13.64
    654: 69248064 syms, dict. size 6360681, 16.8153 bits/sym, o0e 145553379 bytes
    Read 69248064 of 69248064 symbols, start 0.0000
    Common prefix scan 0 - 610f68, score[30000] = 12.66
    655: 69155011 syms, dict. size 6388140, 16.8445 bits/sym, o0e 145610113 bytes
    Read 69155011 of 69155011 symbols, start 0.0000
    Common prefix scan 0 - 617aab, score[30000] = 11.74
    656: 69073475 syms, dict. size 6415655, 16.8710 bits/sym, o0e 145666949 bytes
    Read 69073475 of 69073475 symbols, start 0.0000
    Common prefix scan 0 - 61e626, score[30000] = 10.79
    657: 68981957 syms, dict. size 6443042, 16.9002 bits/sym, o0e 145725816 bytes
    Read 68981957 of 68981957 symbols, start 0.0000
    Common prefix scan 0 - 625121, score[30000] = 9.93
    658: 68896757 syms, dict. size 6470335, 16.9279 bits/sym, o0e 145784656 bytes
    Read 68896757 of 68896757 symbols, start 0.0000
    Common prefix scan 0 - 62bbbe, score[30000] = 9.38
    659: 68822024 syms, dict. size 6497968, 16.9532 bits/sym, o0e 145844024 bytes
    Read 68822024 of 68822024 symbols, start 0.0000
    Common prefix scan 0 - 6327af, score[30000] = 8.74
    660: 68752218 syms, dict. size 6525650, 16.9778 bits/sym, o0e 145907691 bytes
    Read 68752218 of 68752218 symbols, start 0.0000
    Common prefix scan 0 - 6393d1, score[25920] = 8.04
    661: 68681741 syms, dict. size 6549289, 17.0013 bits/sym, o0e 145960241 bytes
    Read 68681741 of 68681741 symbols, start 0.0000
    Common prefix scan 0 - 63f028, score[28948] = 7.42
    662: 68616895 syms, dict. size 6575855, 17.0245 bits/sym, o0e 146020719 bytes
    Read 68616895 of 68616895 symbols, start 0.0000
    Common prefix scan 0 - 6457ee, score[28056] = 6.75
    663: 68548198 syms, dict. size 6601410, 17.0484 bits/sym, o0e 146079778 bytes
    Read 68548198 of 68548198 symbols, start 0.0000
    Common prefix scan 0 - 64bbc1, score[30000] = 6.07
    664: 68477858 syms, dict. size 6628794, 17.0734 bits/sym, o0e 146144049 bytes
    Read 68477858 of 68477858 symbols, start 0.0000
    Common prefix scan 0 - 6526b9, score[30000] = 5.57
    665: 68410702 syms, dict. size 6656275, 17.0979 bits/sym, o0e 146210080 bytes
    Read 68410702 of 68410702 symbols, start 0.0000
    Common prefix scan 0 - 659212, score[27177] = 5.03
    666: 68338148 syms, dict. size 6680995, 17.1230 bits/sym, o0e 146269117 bytes
    Read 68338148 of 68338148 symbols, start 0.0000
    Common prefix scan 0 - 65f2a2, score[29601] = 4.45
    667: 68268431 syms, dict. size 6707815, 17.1481 bits/sym, o0e 146333886 bytes
    Read 68268431 of 68268431 symbols, start 0.0000
    Common prefix scan 0 - 665b66, score[30000] = 3.88
    668: 68190698 syms, dict. size 6734700, 17.1754 bits/sym, o0e 146400004 bytes
    Read 68190698 of 68190698 symbols, start 0.0000
    Common prefix scan 0 - 66c46b, score[30000] = 3.71
    669: 68144158 syms, dict. size 6762411, 17.1952 bits/sym, o0e 146469137 bytes
    Read 68144158 of 68144158 symbols, start 0.0000
    Common prefix scan 0 - 6730aa, score[18232] = 3.46
    670: 68088224 syms, dict. size 6778714, 17.2140 bits/sym, o0e 146508978 bytes
    Read 68088224 of 68088224 symbols, start 0.0000
    Common prefix scan 0 - 677059, score[24516] = 3.04
    671: 68031176 syms, dict. size 6801433, 17.2352 bits/sym, o0e 146566059 bytes
    Read 68031176 of 68031176 symbols, start 0.0000
    Common prefix scan 0 - 67c918, score[26564] = 2.67
    672: 67982489 syms, dict. size 6826369, 17.2551 bits/sym, o0e 146630953 bytes
    Read 67982489 of 67982489 symbols, start 0.0000
    Common prefix scan 0 - 682a80, score[28586] = 2.40
    673: 67937085 syms, dict. size 6853128, 17.2749 bits/sym, o0e 146700818 bytes
    Read 67937085 of 67937085 symbols, start 0.0000
    Common prefix scan 0 - 689307, score[21396] = 2.09
    674: 67880209 syms, dict. size 6872756, 17.2954 bits/sym, o0e 146751744 bytes
    Read 67880209 of 67880209 symbols, start 0.0000
    Common prefix scan 0 - 68dfb3, score[26524] = 1.73
    675: 67817300 syms, dict. size 6897339, 17.3191 bits/sym, o0e 146816526 bytes
    Read 67817300 of 67817300 symbols, start 0.0000
    Common prefix scan 0 - 693fba, score[28360] = 1.48
    676: 67765128 syms, dict. size 6923955, 17.3409 bits/sym, o0e 146888191 bytes
    Read 67765128 of 67765128 symbols, start 0.0000
    Common prefix scan 0 - 69a7b2, score[24120] = 1.21
    677: 67708807 syms, dict. size 6946310, 17.3624 bits/sym, o0e 146948551 bytes
    Read 67708807 of 67708807 symbols, start 0.0000
    Common prefix scan 0 - 69ff05, score[28172] = 0.95
    678: 67650916 syms, dict. size 6972477, 17.3858 bits/sym, o0e 147020677 bytes
    Read 67650916 of 67650916 symbols, start 0.0000
    Common prefix scan 0 - 6a653c, score[29725] = 0.69
    679: 67584546 syms, dict. size 6999949, 17.4119 bits/sym, o0e 147097015 bytes
    Read 67584546 of 67584546 symbols, start 0.0000
    Common prefix scan 0 - 6ad08c, score[27608] = 0.50
    680: 67526033 syms, dict. size 7025370, 17.4355 bits/sym, o0e 147168775 bytes
    Read 67526033 of 67526033 symbols, start 0.0000
    Common prefix scan 0 - 6b33d9, score[6428] = 0.50
    681: 67501470 syms, dict. size 7030872, 17.4437 bits/sym, o0e 147184129 bytes
    Read 67501470 of 67501470 symbols, start 0.0000
    Common prefix scan 0 - 6b4957, score[2462] = 0.50
    682: 67487721 syms, dict. size 7032909, 17.4479 bits/sym, o0e 147189771 bytes
    Read 67487721 of 67487721 symbols, start 0.0000
    Common prefix scan 0 - 6b514c, score[1221] = 0.50
    683: 67479654 syms, dict. size 7033965, 17.4503 bits/sym, o0e 147192699 bytes
    Read 67479654 of 67479654 symbols, start 0.0000
    Common prefix scan 0 - 6b556c, score[715] = 0.50
    684: 67474056 syms, dict. size 7034611, 17.4520 bits/sym, o0e 147194506 bytes
    Read 67474056 of 67474056 symbols, start 0.0000
    Common prefix scan 0 - 6b57f2, score[331] = 0.50
    685: 67471332 syms, dict. size 7034919, 17.4528 bits/sym, o0e 147195352 bytes
    Read 67471332 of 67471332 symbols, start 0.0000
    Common prefix scan 0 - 6b5926, score[548] = 0.50
    686: 67468883 syms, dict. size 7035456, 17.4536 bits/sym, o0e 147196887 bytes
    Read 67468883 of 67468883 symbols, start 0.0000
    Common prefix scan 0 - 6b5b3f, score[92] = 0.50
    687: 67468092 syms, dict. size 7035548, 17.4538 bits/sym, o0e 147197141 bytes
    Read 67468092 of 67468092 symbols, start 0.0000
    Common prefix scan 0 - 6b5b9b, score[186] = 0.50
    688: 67467482 syms, dict. size 7035732, 17.4541 bits/sym, o0e 147197660 bytes
    Read 67467482 of 67467482 symbols, start 0.0000
    Common prefix scan 0 - 6b5c53, score[16] = 0.50
    689: 67467386 syms, dict. size 7035748, 17.4541 bits/sym, o0e 147197703 bytes
    Read 67467386 of 67467386 symbols, start 0.0000

    Run time 9170.299 seconds.
    cap encoded 0, UTF8 compliant 0
    Read 67467386 symbols including 7035748 definition symbols
    Encoded 39601494 level 1 symbols
    Compressed file size: 100893688 bytes, dictionary size: 6921160 symbols
    elapsed time = 24.610000 seconds.
    839399434
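
As a reading aid for the log above: the o0e column looks consistent with an order-0 entropy estimate in bytes, that is, the symbol count times bits per symbol, divided by 8. That interpretation is an assumption, not something the log states. The sketch below only checks it against two cycles, allowing for bits/sym having been rounded to four decimals.

```python
def o0e_estimate(syms: int, bits_per_sym: float) -> float:
    """Assumed relation: order-0 entropy in bytes = syms * bits/sym / 8."""
    return syms * bits_per_sym / 8


def rounding_slack(syms: int, decimals: int = 4) -> float:
    """Largest error (bytes) caused by bits/sym being rounded to `decimals`."""
    return syms * 0.5 * 10 ** -decimals / 8


# (syms, bits/sym, o0e) copied from cycles 593 and 689 of the log above.
for syms, bps, o0e in [(74297332, 15.4644, 143620830),
                       (67467386, 17.4541, 147197703)]:
    diff = abs(o0e_estimate(syms, bps) - o0e)
    print(f"{syms} syms: estimate off by {diff:.0f} bytes, "
          f"rounding slack {rounding_slack(syms):.0f} bytes")
```

Both differences fall inside the rounding slack, which supports the reading but does not prove it.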

  16. The Following User Says Thank You to Sportman For This Useful Post:

    Kennon Conrad (31st May 2015)

  17. #673
    Quote Originally Posted by Sportman View Post
    Output log file:
    ...
    839399434
    This helps quite a bit in determining where the problem lies. Decoding crashed nearly 90% of the way through the file. It looks like it is probably a decoding problem related to code calculations with a shrinking dictionary. If you could try this version of GLZAdecode and post the last few lines, I should be able to target some debug code right at the location of failure. It might take a few minutes to run, but as long as the console is updating occasionally it is okay.

    If you run it, you need a third command line argument, which is the name of the uncompressed file you started with. Also, the extra code is only in the default multi-threaded code so don't use the -t1 command line option.
    Attached Files
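
The third argument suggests the test build checks its output against the original file. The same kind of check can be done by hand on whatever partial output a crashed run leaves behind. A minimal sketch, not the attached build's actual logic: the function name is hypothetical, and it simply reports the first offset at which the partial file disagrees with the original.

```python
def first_mismatch(original_path: str, partial_path: str,
                   chunk_size: int = 1 << 20):
    """Return the offset of the first byte where the partial output differs
    from the original, or None if the partial file is a clean prefix."""
    offset = 0
    with open(original_path, "rb") as orig, open(partial_path, "rb") as part:
        while True:
            b = part.read(chunk_size)
            if not b:
                return None  # reached end of partial output without a mismatch
            a = orig.read(len(b))
            if a != b:
                # Locate the exact differing byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                return offset + len(a)  # partial output is longer than original
            offset += len(b)
```

A result of None means the decoder's output was correct up to the point where it stopped, which narrows the fault to the crash location itself rather than to earlier silent corruption.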

  18. #674
    Quote Originally Posted by Kennon Conrad View Post
    you need a third command line argument, which is the name of the uncompressed file you started with.
    I hope I followed your instructions right (GLZAdecodeTest):

    glzadecode html.txt.glze html2.txt html.txt
    872311852
    Then crash.
    html2.txt = 874,670,976 bytes

    I tried 0.2b decode again to see if it still crashes at the lower value:
    glzadecode html.txt.glze html2.txt
    839399434
    Then crash.
    html2.txt = 877,477,998 bytes
    Last edited by Sportman; 31st May 2015 at 02:03.

  19. The Following User Says Thank You to Sportman For This Useful Post:

    Kennon Conrad (31st May 2015)

  20. #675
    Quote Originally Posted by Sportman View Post
    I hope I followed your instructions right (GLZAdecodeTest):
    Yes, you did it right. No errors were detected in the output file prior to the crash, which is a good sign that troubleshooting hopefully won't be too hard. Can you try this version? I should have printed the symbol position instead of the output character position, so there may be a fairly long burst of prints right before the crash, and there is a small chance it won't print anything, but I think this will give good information. We will probably need one or more further test cycles, but I will try to keep the number small. I would really like to fix this. It looks like a file that is well suited for really good decompression characteristics with GLZA.

    The test method is the same, except I named the program "GLZAdecodeTest" instead of "GLZAdecode".
    Attached Files

  21. #676
    Quote Originally Posted by Kennon Conrad View Post
    Can you try this version?

    glzadecodetest html.txt.glze html2.txt html.txt:


    Start Print

    Start Print
    Main 36133531 low 9554040 range 497de0 type 0, low 55404000 range 2e3767e9
    GSS code length 5 shift 3 fc 20 symbol 1d
    Main 36133532 low 55ac1c0d range 35b9000 type 3, low 58aff615 range 57b5de

    Start Print

    Start Print
    Main 36133533 low dfc4a7e4 range 1418962 type 0, low dfc4a7e4 range c95d9c
    GSS code length 17 shift 9 fc 21 symbol a07c
    Main 36133534 low c99a2010 range b79670 type 1, low ca0dc914 range 19b3c8
    DDL 36133534 CL 24 INST 2
    DD 36133534 sid 2 low dcbdb07 range 8dbfa00 type 2, low 14d29847 range 15fed60
    DD 36133534 sid 1 low 1624fccb range 300c type 2, low 25229280 range 7cd60

    Start Print
    DD get code 2 inst
    DD ret
    Main 36133535 low 290319a4 range 5f6c4 type 0, low 319a400 range 3b856ae
    GLS FC 20 code length 19, index bits 4 reduce bits 0 symbol a28b1

    Start Print

    Start Print
    Main 36133536 low ac0d2ac0 range 1149620 type 0, low ac0d2ac0 range ad8b40
    GLS FC 7a code length 24, index bits 4 reduce bits 1 symbol 628849

    Start Print
    Main 36133537 low 85e58000 range 2738000 type 2, low 87cce0c0 range 491b50
    Main 36133538 low 880ec896 range 1de2 type 2, low df929d40 range 3bc3ff4

    Start Print
    Main 36133539 low e24f1180 range 875c8
    Start Print
    type 0, low 4f118000 range 53cb8a1
    GSS code length 10 shift 4 fc 31 symbol ed7
    Main 36133540 low 4fa19bc4 range 6c8c0 type 0, low a19bc400 range 4395f54
    GSS code length 21 shift 10 fc 23 symbol 273ba4

    Start Print
    Main 36133541 low 5c5419cc range 94fd6 type 0, low 5419cc00 range 5d4aea6
    GSS code length 9 shift 6 fc 36 symbol 281

    Start Print
    Main 36133542 low 55af8f00 range 49c3200 type 2, low 593a8f80 range 98fd00

    Start Print
    Main 36133543 low b6a803ff range ab2295 type 0, low b6a803ff range 6ac338
    GSS code length 10 shift 4 fc 31
    Start Print
    symbol ed7
    Main 36133544 low b4938400 range 438480 type 1, low bdde1836 range 934fa22

    Start Print
    DDL 36133544 CL 24 INST 5
    DD 36133544 sid 3 low c54f7de4 range e38b9 type 1, low 59222d40 range 1723e30
    DDL 36133544 CL 23 INST 7
    DD 36133544 sid 2 low 59ec2008 range 34998 type 3, low ef3f30bc range 2a6ee8
    DD 36133544 sid 1 low 56c5b191 range 6364f5 type 0, low 56c5b191 range 424320
    GLS FC 3c code length 24, index bits 8 reduce bits 1 symbol 3ef5d2
    DD get code 7 inst
    DD ret
    DD 36133544 sid 2 low 20f22694 range 407f4 type 0, low f2269400 range 2b2cfae
    GLS FC 20 code length 23, index bits 8 reduce bits 0 symbol 33c7a1
    DD 36133544 sid 1 low 62c8259e range 1952f2 type 0, low c8259e00 range 1105637c
    GSS code length 9 shift 6 fc 37 symbol 6e
    DD get code 5 inst
    DD ret
    Main 36133545 low 2b22b700 range 121d9800 type 0, low 2b22b700 range b42d40e

    Start Print
    GSS code length 12 shift 6 fc 37 symbol 295
    Main 36133546 low 9e6a5700 range 1919f40 type 2, low 9f9eba54 range 35c9e2

    Start Print

    Start Print
    Main 36133547 low c6fa8935 range 5f5e5b type 0, low c6fa8935 range 3b13d4
    GSS code length 16 shift 6 fc 37 symbol 106e5
    Main 36133548 low cc2d464 range 7efc type 1, low c3237a18 range 11d280

    Start Print
    DDL 36133548 CL 23 INST 12
    DD 36133548 sid 2 low 2dbb8686 range 90547 type 1, low c19cd0f7 range f411e9
    DDL 36133548 CL 24 INST 2
    DD 36133548 sid 3 low 2f7bf200 range 2716f4ac type 2, low 4e0ff765 range 64e0664

    DD 36133548 sid 2 low 52d2fe2d range ad4fc type 0, low d2fe2d00 range 73142d6
    GLS FC 41 code length 24, index bits 6 reduce bits 1 symbol 3180d3
    DD 36133548 sid 1 low 19a42000 range 28a800 type 2, low c3bf1c32 range 6c6aa9d
    DD get code 2 inst
    DD ret
    DD 36133548 sid 1 low c8d0a226 range c60e4 type 0, low d0a22600 range 82ff56c
    GSS code length 9 shift 6 fc 37 symbol 6e
    DD get code 12 inst
    DD ret
    Main 36133549 low c3f8300 range 8e7f000 type 2, low 13102890 range 13d3560

    Start Print
    Main 36133550 low 13afe9d7 range e260f type 1, low b892ca0e range 212d1a2
    DDL 36133550 CL 23 INST 12
    DD 36133550 sid 3 low ba496d01 range 4a16 type 1, low 9e03bcf4 range 7fa7e04
    DDL 36133550 CL 24 INST 6
    DD 36133550 sid 2 low a23e77ed range 1dc5dc type 0, low 3e77ed00 range 1365c1b1
    GLS FC 22 code length 24, index bits 8 reduce bits 1 symbol 523871

    Start Print
    DD 36133550 sid 1 low b72d9400 range 957c00 type 0, low 2d940000 range 622b3934
    GLS FC 32 code length 19, index bits 2 reduce bits 0 symbol 33013
    DD get code 6 inst
    DD ret
    DD 36133550 sid 2 low 6bf95a00 range c9440 type 0, low 6bf95a00 range 852ff
    GSS code length 10 shift 4 fc 31 symbol 78f
    DD 36133550 sid 1 low f9c4bb00 range 4e640 type 0, low c4bb0000 range 3442aaa
    GLS FC 54 code length 23, index bits 4 reduce bits 1 symbol 71c63
    DD get code 12 inst
    DD ret
    Main 36133551 low b1680000 range 13b80000 type 0, low b1680000 range bf750de

    Start Print
    GLS FC 32 code length 24, index bits 7 reduce bits 1 symbol 52e9cd

    Start Print
    Main 36133552 low 63887800 range 6e9c00 type 1, low cbf58f10 range 10df63c4
    DDL 36133552 CL 23 INST 9
    DD 36133552 sid 3 low d9dabd52 range 5be96 type 0, low dabd5200 range 3db649c
    GLS FC 2b code length 24, index bits 4 reduce bits 1 symbol 29e195
    DD 36133552 sid 2 low 65920000 range 12f48000 type 1, low 72629020 range 222c2b0

    DDL 36133552 CL 24 INST 2
    DD 36133552 sid 2 low 7262ca9d range bbf818 type 1, low 72e01a8d range 177efd
    DDL 36133552 CL 24 INST 4
    DD 36133552 sid 2 low eac6fdc2 range 125959b type 0, low eac6fdc2 range c10aa0
    GLS FC 20 code length 24, index bits 9 reduce bits 1 symbol 344b86
    DD 36133552 sid 1 low a7530ce0 range 246d0 type 2, low 54dd8cad range 56a25d
    DD get code 4 inst
    DD ret
    DD 36133552 sid 1 low 551ce705 range c88c type 0, low 1ce70500 range 830608
    GSS code length 10 shift 4 fc 31 symbol ed7
    DD get code 2 inst
    DD ret
    DD 36133552 sid 1 low f32be680 range 1bbdc0 type 0, low 2be68000 range 1240342e
    GSS code length 11 shift 8 fc 3a symbol 1c
    DD get code 9 inst
    DD ret
    Main 36133553 low ae481400 range 37cb400 type 0, low ae481400 range 21c1f38

    Start Print
    GSS code length 11 shift 5 fc 32 symbol 78c
    Main 36133554 low 2fd6db00 range 1386980 type 2, low 30c65ed8 range 2c421a

    Start Print

    Start Print
    Main 36133555 low eaa72bfc range 1cf170 type 0, low a72bfc00 range 117627fa
    GLS FC 49 code length 24, index bits 5 reduce bits 1 symbol 2e7f4d
    Main 36133556 low f531f400 range 914c00 type 2, low a0b653c3 range 156fef2e

    Start Print

    Start Print
    Main 36133557 low b586bd67 range 1c083 type 0, low 86bd6700 range 10dd5d8
    GLS FC 65 code length 21, index bits 3 reduce bits 0 symbol 12c918

    Start Print
    Main 36133558 low 8a39c000 range d2d4000 type 1, low 923205ab range 204de73
    DDL 36133558 CL 24 INST 2
    DD 36133558 sid 3 low 68410b00 range 53790dba type 0, low 68410b00 range 3749875
    e
    GLS FC 67 code length 24, index bits 5 reduce bits 1 symbol 5c62b1
    DD 36133558 sid 2 low f7814000 range 1c26000 type 0, low f7814000 range 12c3fec
    GSS code length 5 shift 3 fc 20 symbol 1d
    DD 36133558 sid 1 low 850f6000 range 19b5d400 type 2, low 99902c8c range 3e7c570

    DD get code 2 inst
    DD ret
    Main 36133559 low 9c7ba10d range 756e7 type 0, low 7ba10d00 range 467571f
    GSS code length 7 shift 6 fc 2c symbol a6

    Start Print
    Main 36133560 low 24d4c200 range 16532800 type 1, low 324bfe68 range 38b2ad8
    DDL 36133560 CL 24 INST 5
    DD 36133560 sid 2 low 3413ef1b range 1899e4 type 0, low 13ef1b00 range 104c5a40
    GSS code length 18 shift 6 fc 64 symbol 11a1
    DD 36133560 sid 1 low c65548c0 range 90e6e
    Start Print
    type 2, low 5c709840 range 17417ed
    DD get code 5 inst
    DD ret
    Main 36133561 low 5d9d0f2a range 125bd type 0, low 9d0f2a00 range afc7b8

    Start Print
    GSS code length 19 shift 8 fc 7a symbol 2bd8a
    Main 36133562 low af413788 range 152a type 2, low 47908b7a range 334e8b2

    Start Print
    Main 36133563 low 4abe5054 range 1659 type 0, low 50540000 range d022241
    GSS code length 20 shift 8 fc 7a symbol 22222

    Start Print

    Start Print
    Main 36133564 low 2d0e4750 range d505 type 0, low e475000 range 7d4e20
    GLS FC 20 code length 22, index bits 7 reduce bits 0 symbol 194610

    Start Print
    Main 36133565 low 2a942428 range 48dac type 0, low 94242800 range 2b4a527
    GLS FC 2e code length 24, index bits 7 reduce bits 1 symbol 2c0e45
    Main 36133566 low 688cb220 range 1b0aa0 type 2, low a12b8a73 range 43fd74d

    Start Print

    Start Print
    Main 36133567 low a55bf493 range 3b38 type 0, low 5bf49300 range 2307d2
    GLS FC 20 code length 24, index bits 9 reduce bits 1 symbol 3919ca
    Main 36133568 low 3ba42b60 range 17fa0 type 0, low a42b6000 range e51c00
    GSS code length 11 shift 7 fc 75 symbol 3b

    Start Print
    Main 36133569 low f4099c00 range 2c5f00 type 1, low 245a853c range 6afa14f
    DDL 36133569 CL 21 INST 0
    DD 36133569 sid 2 low 46d423f6 range 15806406 type 0, low 46d423f6 range e28d7a0

    GSS code length 9 shift 5 fc 2e symbol 2c
    DD 36133569 sid 1 low 4add1949 range b7e00 type 2, low e619429e range 1f03da4
    DD get code 0 inst

    Start Print
    DD ret
    Main 36133570 low e79b7e38 range 20126 type 2, low 9d028c08 range 533694

    Start Print
    Main 36133571 low 41e57203 range 7a4246 type 2, low 4240bb4b range 153103

    Start Print
    Main 36133572 low 5201ca70 range 15e7ac type 2, low 11ee6800 range 408fe00

    Start Print
    Main 36133573 low 12716b46 range abf0c9 type 2, low 12ee775e range 217eab

    Start Print
    Main 36133574 low 13085865 range 2ac7 type 2, low 771b4190 range 8c65be0

    Start Print
    Main 36133575 low 783812da range 1764531 type 1, low 7908871e range 38d9e4
    DDL 36133575 CL 24 INST 4
    DD 36133575 sid 2 low 225ce924 range 2c951bf type 3, low 250442e4 range 21f7b0

    Start Print
    DD 36133575 sid 1 low 1a1bc9f8 range 406698 type 3, low 5866c310 range 3c9ccb1
    DD get code 4 inst
    DD ret
    Main 36133576 low 46753810 range 2bfa214 type 0, low 46753810 range 182ff80

    Start Print
    GSS code length 16 shift 6 fc 38 symbol d8a7
    Main 36133577 low f4e558f8 range 468a8 type 2, low e8812458 range ece0dc

    Start Print

    Start Print
    Main 36133578 low e9383bc8 range 13b70 type 2, low 391ae516 range 453dfe
    Main 36133579 low 23aba238 range b8be284 type 2, low 2bbd310c range 2a4a5f6

    Start Print

    Start Print
    Main 36133580 low 2dc82b12 range 3a986 type 0, low c82b1200 range 1f63e7e
    GLS FC 42 code length 24, index bits 6 reduce bits 1 symbol 3fa973
    Main 36133581 low f9cbe210 range 1c5830 type 2, low df8eb59f range 6ab5684

    Start Print
    Main 36133582 low e4b5ee4c range 999ab
    Start Print
    type 2, low bc845f03 range 25820cd
    Main 36133583 low bd348477 range 4914d0 type 1, low 5b287ae2 range aeb91cb
    DDL 36133583 CL 24 INST 2
    DD 36133583 sid 3 low 61b8987f range 1c395e6 type 3, low 635caccf range 1f8186
    DD 36133583 sid 2 low 6383741c range 1be130 type 0, low 83741c00 range 11a003d7
    GSS code length 10 shift 4 fc 31 symbol 17e
    DD 36133583 sid 1 low 84425af5 range a71c0 type 0, low 425af500 range 6a57a10
    GSS code length 11 shift 7 fc 54 symbol 81
    DD get code 2 inst
    DD ret

    Start Print

    Start Print
    Main 36133584 low d6277200 range 630100 type 0, low 27720000 range 33c085b0
    GLS FC 31 code length 19, index bits 3 reduce bits 0 symbol 32e5
    Main 36133585 low 1b045c80 range 3c740 type 0, low 45c8000 range 1fecb5d
    GLS FC 62 code length 19, index bits 1 reduce bits 0 symbol ea46
    Output mismatch 32 vs. 30 at 877657951
    18066792 symbols sent, char 1 of 4
    Prior chars 2d 31 30 54 31

    HTML2.txt = 877,657,950 bytes

  22. The Following User Says Thank You to Sportman For This Useful Post:

    Kennon Conrad (31st May 2015)

  23. #677
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    I tested GLZA 0.2b in WCC benchmark. No problem.

  24. The Following User Says Thank You to Nania Francesco For This Useful Post:

    Kennon Conrad (31st May 2015)

  25. #678
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by Nania Francesco View Post
    I tested GLZA 0.2b in WCC benchmark. No problem.
    I am glad the test completed! Compression times are terrible and most compression ratios are not too great, but I was most curious about how it would do on the text test. I was happy to see GLZA ranked #1 for decompression efficiency on the text benchmark.

  26. #679
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by Sportman View Post
    glzadecodetest html.txt.glze html2.txt html.txt:
    It looks like a problem in the encoding or decoding of the "starts with character 0x31" dictionary index on the 36133585th symbol. Here's a new test package that will hopefully show enough details of the index calculations to figure this out. Please run GLZAdecodeTest like above and also run GLZAencodeTest on html.txt.glzc.
    Attached Files

  27. #680
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    778
    Thanks
    63
    Thanked 273 Times in 191 Posts
    Quote Originally Posted by Kennon Conrad View Post
    Please run GLZAdecodeTest like above and also run GLZAencodeTest on html.txt.glzc.
    glzadecodetest html.txt.glze html2.txt html.txt
    Main 36133582 m1 mcl 111 21 low e4b5ee4c range 999ab type 2, low bc845f03 range 25820cd
    Main 36133583 m1 mcl 111 21 low bd348477 range 4914d0 type 1, low 5b287ae2 range aeb91cb
    DDL 36133583 CL 24 INST 2
    DD 36133583 sid 3 low 61b8987f range 1c395e6 type 3, low 635caccf range 1f8186
    DD 36133583 sid 2 low 6383741c range 1be130 type 0, low 83741c00 range 11a003d7
    DecFirst NORM last 2d low 83741c00 range 11a003d7
    DecFirst END first 31 low 8403b075 range 15596bc
    DecBin NORM bins 2093 low 8403b075 range 15596bc bin 417 code length 10 shift 4
    EndDecBin low 84425af5 range a71c0
    GSS code length 10 shift 4 fc 31 symbol 17e
    DD 36133583 sid 1 low 84425af5 range a71c0 type 0, low 425af500 range 6a57a10
    DecFirst NORM last 30 low 425af500 range 6a57a10
    DecFirst END first 54 low 47d62772 range 6077a
    DecBin NORM bins 3991 low d6277200 range 6077a00 bin 48 code length 11 shift 7
    EndDecBin low d6277200 range 630100
    GSS code length 11 shift 7 fc 54 symbol 81
    DD get code 2 inst
    DD ret
    Main 36133584 m1 mcl 111 21 low d6277200 range 630100 type 0, low 27720000 range 33c085b0
    DecFirst NORM last 54 low 27720000 range 33c085b0
    DecFirst END first 31 low 39425310 range f72684
    DecBin NORM bins 2093 low 39425310 range f72684 bin 1835 code length 19 shift 4
    EndDecBin low 3a1afcce range 1e3a
    GLS FC 31 code length 19, index bits 3 reduce bits 0
    DecIndex NORM bits 3 first bin 1834 low 1afcce00 range 1e3a00
    DecIndex BinCode 2 initial index 10, reduce index 326
    DecIndex END BinCode 2 final index 10 extra_bins 0 low 1b045c80 range 3c740
    GLS end - symbol 32e5
    Main 36133585 m1 mcl 111 21 low 1b045c80 range 3c740 type 0, low 45c8000 range 1fecb5d
    DecFirst NORM last 2f low 45c8000 range 1fecb5d
    DecFirst END first 62 low 512c4a9 range 1624b5
    DecBin NORM bins 3594 low 12c4a900 range 1624b500 bin 2170 code length 19 shift 6
    EndDecBin low 20234dd6 range 193c7
    GLS FC 62 code length 19, index bits 1 reduce bits 0
    DecIndex NORM bits 1 first bin 2131 low 234dd600 range 193c700
    DecIndex BinCode 0 initial index 78, reduce index 572
    DecIndex END BinCode 0 final index 78 extra_bins 0 low 234dd600 range c9e380
    GLS end - symbol ea46
    Output mismatch 32 vs. 30 at 877657951
    36133584 symbols sent, char 1 of 4
    Prior chars 2d 31 30 54 31

    html2.txt = 877,657,950 bytes

    ------------------------------------------------------------------

    html.txt.glzc = 254,217,615 bytes

    glzaencodetest html.txt.glzc html.txt.glze
    cap encoded 0, UTF8 compliant 0
    Read 67467386 symbols including 7035748 definition symbols
    Main 36133581: inst 6 of 21, low f9cbe210 range 1c5830
    Main 36133582: inst 564 of 510, low e4b5ee4c range 999ab
    Main 36133583: inst 1 of 2, low bd348477 range 4914d0
    EDS symbol 1b0588 start 31 index 2, low 83741c00 range 11a003d7
    EncFirst NORM last 2d low 83741c00 range 11a003d7
    EncFirst END first 31 low 8403b075 range 15596bc
    EDS code length 10, shift 4 bins 1, index 2
    EncShort NORM bin 384 of 2093, 1 bins, low 8403b075 range 15596bc
    EncShort END low 84425af5 range a71c0
    EDS symbol 54 start 54 index 0, low 425af500 range 6a57a10
    EncFirst NORM last 30 low 425af500 range 6a57a10
    EncFirst END first 54 low 47d62772 range 6077a
    EDS code length 11, shift 7 bins 1, index 0
    EncShort NORM bin 0 of 3991, 1 bins, low d6277200 range 6077a00
    EncShort END low d6277200 range 630100
    Main 36133584: inst 80 of 85, low d6277200 range 630100
    EDS symbol 6bd05d start 31 index 18, low 27720000 range 33c085b0
    EncFirst NORM last 54 low 27720000 range 33c085b0
    EncFirst END first 31 low 39425310 range f72684
    EDS code length 19, shift 4 reduce bits 0
    EDS first bin 1833, extra index 326
    EDS < ME send long code length 15 bin 1835 bin code 2 bins 1
    EncLong NORM bin 1835 of 2093, 1 bins CL 15, low 39425310 range f72684
    EncLong BIN low 3a1afcce range 1e3a
    EncLong NORM2 low 1afcce00 range 1e3a00
    EncLong END low 1b045c80 range 3c740
    Main 36133585: inst 28 of 35, low 1b045c80 range 3c740
    EDS symbol 441143 start 31 index 640, low 45c8000 range 1fecb5d
    EncFirst NORM last 3a low 45c8000 range 1fecb5d
    EncFirst END first 31 low 4d19af5 range 55e5e5
    EDS code length 20, shift 4 reduce bits 0
    EDS first bin 1874, extra index 814
    EDS < ME send long code length 16 bin 1914 bin code 0 bins 1
    EncLong NORM bin 1914 of 2093, 1 bins CL 16, low 4d19af5 range 55e5e5
    EncLong BIN low 520236f range a81
    EncLong NORM2 low 236f0000 range a810000
    EncLong END low 236f0000 range a81000
    Encoded 39601494 level 1 symbols
    Compressed file size: 100893688 bytes, dictionary size: 6921160 symbols
    elapsed time = 24.594000 seconds.

    html.txt.glze = 100,893,688 bytes

  28. The Following User Says Thank You to Sportman For This Useful Post:

    Kennon Conrad (1st June 2015)

  29. #681
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    The decoder's maximum code length variable is corrupt and the dictionary starting bin number is off by one. The maximum code length is set to a value between 1 and 32 (bits) when the header is read and not modified once decoding starts, yet it has a value of 111 when decompression fails. I disassembled the code and found this variable located right after the structure that holds the dictionary strings. I think your file is causing this structure to overflow because of the large number of symbols in the dictionary and a relatively large average string length. This version will either run to completion if that was the problem or provide additional information. Even if it works, I would like to see the last line printed to see exactly how big the dictionary was.
    Attached Files
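    The failure mode described above (a fixed-size string structure overflowing into the variable stored right after it) can be illustrated with a small sketch. This is not GLZA's actual code: the `Decoder` layout, `POOL_CAPACITY`, `pool_append`, and `max_code_length_ok` are hypothetical names made up for illustration. The only values taken from the thread are the 1 to 32 valid range and the observed values 24 (good) and 111 (corrupt).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_CAPACITY 16u /* hypothetical; tiny so the overflow case is easy to hit */

/* Hypothetical layout: a fixed-size string pool with a decoder setting after it. */
typedef struct {
    uint8_t bytes[POOL_CAPACITY]; /* dictionary string storage */
    size_t  used;                 /* bytes currently stored */
    uint8_t max_code_length;      /* set once from the header, valid range 1..32 */
} Decoder;

/* Append len bytes; return 0 on success, -1 if they would not fit.
   Refusing the write is what keeps the fields after the pool intact. */
int pool_append(Decoder *d, const uint8_t *src, size_t len) {
    if (len > POOL_CAPACITY - d->used)
        return -1;
    memcpy(d->bytes + d->used, src, len);
    d->used += len;
    return 0;
}

/* Invariant from the post: the maximum code length is between 1 and 32 bits. */
int max_code_length_ok(const Decoder *d) {
    return d->max_code_length >= 1 && d->max_code_length <= 32;
}

/* Small demonstration; returns 1 when the guard behaves as intended. */
int pool_demo(void) {
    Decoder d = { .used = 0, .max_code_length = 24 };
    const uint8_t chunk[10] = { 0 };

    int first  = pool_append(&d, chunk, sizeof chunk); /* 10 of 16 bytes: fits */
    int second = pool_append(&d, chunk, sizeof chunk); /* would need 20: refused */
    assert(max_code_length_ok(&d));                    /* neighbour untouched */

    Decoder corrupt = { .used = 0, .max_code_length = 111 };
    assert(!max_code_length_ok(&corrupt));             /* the value seen in the log */

    return first == 0 && second == -1 && d.used == 10;
}
```

    A cheap range check like `max_code_length_ok` at a few points in the decode loop would flag this kind of corruption long before the output mismatch shows up.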

  30. #682
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    778
    Thanks
    63
    Thanked 273 Times in 191 Posts
    Quote Originally Posted by Kennon Conrad View Post
    Even if it works, I would like to see the last line printed to see exactly how big the dictionary was.
    glzadecode4 html.txt.glze html2.txt html.txt
    Main 36133582 m1 mcl 24 21 low e4b5ee4c range 999ab type 2, low bc845f03 range 25820cd
    Main 36133583 m1 mcl 24 21 low bd348477 range 4914d0 type 1, low 5b287ae2 range aeb91cb
    DDL 36133583 CL 24 INST 2
    DD 36133583 sid 3 low 61b8987f range 1c395e6 type 3, low 635caccf range 1f8186
    DD 36133583 sid 2 low 6383741c range 1be130 type 0, low 83741c00 range 11a003d7
    DecFirst NORM last 2d low 83741c00 range 11a003d7
    DecFirst END first 31 low 8403b075 range 15596bc
    DecBin NORM bins 2093 low 8403b075 range 15596bc bin 417 code length 10 shift 4
    EndDecBin low 84425af5 range a71c0
    GSS code length 10 shift 4 fc 31 symbol 17e
    DD 36133583 sid 1 low 84425af5 range a71c0 type 0, low 425af500 range 6a57a10
    DecFirst NORM last 30 low 425af500 range 6a57a10
    DecFirst END first 54 low 47d62772 range 6077a
    DecBin NORM bins 3991 low d6277200 range 6077a00 bin 48 code length 11 shift 7
    EndDecBin low d6277200 range 630100
    GSS code length 11 shift 7 fc 54 symbol 81
    DD get code 2 inst
    DD ret
    Main 36133584 m1 mcl 24 21 low d6277200 range 630100 type 0, low 27720000 range 33c085b0
    DecFirst NORM last 54 low 27720000 range 33c085b0
    DecFirst END first 31 low 39425310 range f72684
    DecBin NORM bins 2093 low 39425310 range f72684 bin 1835 code length 19 shift 4
    EndDecBin low 3a1afcce range 1e3a
    GLS FC 31 code length 19, index bits 3 reduce bits 0
    DecIndex NORM bits 3 first bin 1833 low 1afcce00 range 1e3a00
    DecIndex BinCode 2 initial index 18, reduce index 326
    DecIndex END BinCode 2 final index 18 extra_bins 0 low 1b045c80 range 3c740
    GLS end - symbol 7079
    Main 36133585 m1 mcl 24 21 low 1b045c80 range 3c740 type 0, low 45c8000 range 1fecb5d
    DecFirst NORM last 3a low 45c8000 range 1fecb5d
    DecFirst END first 31 low 4d19af5 range 55e5e5
    DecBin NORM bins 2093 low 4d19af5 range 55e5e5 bin 1914 code length 20 shift 4
    EndDecBin low 520236f range a81
    GLS FC 31 code length 20, index bits 4 reduce bits 0
    DecIndex NORM bits 4 first bin 1874 low 236f0000 range a810000
    DecIndex BinCode 0 initial index 640, reduce index 814
    DecIndex END BinCode 0 final index 640 extra_bins 0 low 236f0000 range a81000
    GLS end - symbol df47a
    Main 36133586 m1 mcl 24 21 low 236f0000 range a81000 type 1, low 23c8a210 range 1a249a
    DDL 36133586 CL 24 INST 2
    DD 36133586 sid 4 low de6c2190 range 19f473a type 0, low de6c2190 range 109f6f2
    DecFirst NORM last 33 low de6c2190 range 109f6f2
    DecFirst END first 2b low dea5a9fc range 117e0
    DecBin NORM bins 3189 low a5a9fc00 range 117e000 bin 3165 code length 24 shift 8
    EndDecBin low a6bfb93b range 1677
    GLS FC 2b code length 24, index bits 4 reduce bits 1
    DecIndex NORM bits 3 first bin 3002 low b93b0000 range 16770000
    DecIndex BinCode 2 initial index 1306, reduce index 90
    DecIndex END BinCode 2 final index 698 extra_bins 0 low bed8c000 range 59dc000
    GLS end - symbol 40ec24
    DD 36133586 sid 3 low bed8c000 range 59dc000 type 0, low bed8c000 range 39e8caa
    DecFirst NORM last 20 low bed8c000 range 39e8caa
    DecFirst END first 6d low c170c963 range 1339f6
    DecBin NORM bins 2708 low 70c96300 range 1339f600 bin 2326 code length 23 shift 6
    EndDecBin low 814d0272 range 1d14b
    GLS FC 6d code length 23, index bits 5 reduce bits 1
    DecIndex NORM bits 4 first bin 2183 low 4d027200 range 1d14b00
    DecIndex BinCode 7 initial index 2295, reduce index 2324
    DecIndex END BinCode 7 final index 2295 extra_bins 0 low 4dce02d0 range 1d14b0
    GLS end - symbol d7000
    DD 36133586 sid 2 low 4dce02d0 range 1d14b0 type 0, low ce02d000 range 12dac649
    DecFirst NORM last 33 low ce02d000 range 12dac649
    DecFirst END first 20 low ce434f38 range 17e0538
    DecBin NORM bins 3153 low ce434f38 range 17e0538 bin 505 code length 5 shift 3
    EndDecBin low ce434f38 range 7c1000
    GSS code length 5 shift 3 fc 20 symbol 1d
    DD 36133586 sid 1 low ce434f38 range 7c1000 type 0, low 434f3800 range 50e90ae8
    DecFirst NORM last 20 low 434f3800 range 50e90ae8
    DecFirst END first 6f low 802374e1 range 12c030a
    DecBin NORM bins 3392 low 802374e1 range 12c030a bin 1026 code length 14 shift 7
    EndDecBin low 807e04e1 range 2d480
    GSS code length 14 shift 7 fc 6f symbol 878
    DD get code 2 inst
    DD ret
    Decompressed 967021395 bytes in 37548 msec

    html2.txt = 967,021,395 bytes (compare OK)

  31. #683
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Thank you for reporting the bug and testing! I messed up the code to print the dictionary size at the end, but it's not a big deal. I will release a patch soon.

  32. #684
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts

    GLZA v0.2c

    This version of GLZA has the bug fix for Sportman's file, the prior bug fixes for Nania's MOC, and also slightly faster decompression for most files. For instance, enwik9 decodes in 15.8 seconds instead of 16.4 seconds.
    Attached Files

  33. The Following 3 Users Say Thank You to Kennon Conrad For This Useful Post:

    jedie (22nd June 2015),Sportman (1st June 2015),surfersat (3rd June 2015)

  34. #685
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    778
    Thanks
    63
    Thanked 273 Times in 191 Posts
    Quote Originally Posted by Kennon Conrad View Post
    This version of GLZA has the bug fix for Sportman's file
    I tested the two files that crashed before and both work well now; the compare is also OK.

    Run time 9184.652 seconds.
    elapsed time = 24.594000 seconds.
    Decompressed in 9469 msec.

    Input:
    967,021,395 bytes

    Output:
    100,893,688 bytes

    -------------------------------------------

    Run time 9112.909 seconds.
    elapsed time = 26.485000 seconds.
    Decompressed in 10140 msec.

    Input:
    1,023,351,091 bytes

    Output:
    105,677,178 bytes

  35. The Following User Says Thank You to Sportman For This Useful Post:

    Kennon Conrad (2nd June 2015)

  36. #686
    Member
    Join Date
    Jun 2015
    Location
    DE
    Posts
    15
    Thanks
    3
    Thanked 0 Times in 0 Posts
    This is the main thread for GLZA, isn't it?

    Has anybody tried to compile the GLZA sources to JavaScript via emscripten? (because of: http://encode.su/threads/2227-Search...r-than-deflate )

  37. #687
    Member
    Join Date
    Oct 2007
    Location
    Germany, Hamburg
    Posts
    408
    Thanks
    0
    Thanked 5 Times in 5 Posts
    I'm not answering your actual question here, but since decompression speed is more important to you than the ratio, you could pick an earlier "tree" release, which was a little bit faster in decompression.

  38. #688
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by jedie View Post
    This is the main thread for GLZA, isn't it?
    Yes.

    Quote Originally Posted by jedie View Post
    Has anybody tried to compile the GLZA sources to JavaScript via emscripten? (because of: http://encode.su/threads/2227-Search...r-than-deflate )
    I made a quick try, but it didn't seem to work. The build environment may not be set up right, or the Windows data types may be messing things up; I didn't feel like figuring it out at the time.

    Tree v0.19 (Post #589) typically has about 2x faster decoding than GLZA. Also, if you want faster decoding and are willing to give up a little compression ratio, using command line option -m (such as -m50 or -m100) can provide about 2x faster decompression with less memory because it will not use as large a dictionary as the default -m4.5 setting.

    It looked like you got good results with Tornado. That may be best suited to your requirements, although Lzham may be another good option if you can get it to work with emscripten.
    Last edited by Kennon Conrad; 22nd June 2015 at 19:21.

  39. #689
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts

    GLZA v0.3

    I added some code so that separate models are used for symbols depending on how likely the symbol is to be followed by a symbol that starts with a space character, based on the trailing subsymbol(s). I also simplified the scoring formula in GLZAcompress, so it usually runs a little faster and the results are a little different, but not hugely different. In addition, I changed encoding and decoding so that only symbols that occur up to 15 times (instead of 20) have separate MTF queues. There were several other minor changes.
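    For readers unfamiliar with MTF (move-to-front) queues, here is a minimal generic sketch of the idea. It is not GLZA's implementation: `MtfQueue`, `MTF_CAPACITY`, `mtf_access`, and `mtf_eligible` are hypothetical names, and the queue size is an arbitrary assumption. Only the 15-occurrence threshold comes from the post above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MTF_CAPACITY  8u  /* hypothetical queue size, chosen for illustration */
#define MTF_MAX_COUNT 15u /* from the post: symbols occurring up to 15 times */

typedef struct {
    uint32_t items[MTF_CAPACITY]; /* items[0] is the most recently used symbol */
    size_t   count;
} MtfQueue;

/* Whether a symbol with this many total occurrences would use an MTF queue. */
int mtf_eligible(uint32_t occurrences) {
    return occurrences <= MTF_MAX_COUNT;
}

/* Look up a symbol. On a hit, move it to the front and return its old position.
   On a miss, insert it at the front (dropping the oldest entry if the queue
   is full) and return -1. */
int mtf_access(MtfQueue *q, uint32_t symbol) {
    size_t pos = 0;
    while (pos < q->count && q->items[pos] != symbol)
        pos++;
    int hit = (pos < q->count);
    if (!hit) {
        if (q->count < MTF_CAPACITY)
            q->count++;
        pos = q->count - 1; /* last slot is overwritten by the shift below */
    }
    for (size_t i = pos; i > 0; i--)
        q->items[i] = q->items[i - 1];
    q->items[0] = symbol;
    return hit ? (int)pos : -1;
}

/* Small demonstration; returns the position reported for a repeated symbol. */
int mtf_demo(void) {
    MtfQueue q = { .count = 0 };
    assert(mtf_access(&q, 10) == -1); /* queue: 10 */
    assert(mtf_access(&q, 20) == -1); /* queue: 20 10 */
    assert(mtf_access(&q, 30) == -1); /* queue: 30 20 10 */
    return mtf_access(&q, 10);        /* found at position 2, moved to front */
}
```

    The appeal for rare symbols is that a recently seen one can be coded as a small queue position instead of a long dictionary code.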

    Results for enwik9:
    165,419,346 bytes in 4,156 seconds/6026 MB RAM, decompresses in 14.9 seconds/330 MB RAM

    Results for enwik8:
    20,541,988 bytes in 266 seconds, decompresses in 1.8 seconds.

    A .zip file of GLZAdecode.c and GLZAmodel.c is 18,982 bytes.
    Attached Files

  40. #690
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts

  41. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    Kennon Conrad (14th July 2015)
