
Thread: Tree alpha v0.1 download

  1. #691
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by Matt Mahoney
    It looks like GLZA is now on the Pareto frontier for enwik9 decompression RAM. Did I miss anything? Decompression RAM could be considerably lower if the code stored symbols that represent long strings as their subsymbols instead of as the full strings they represent.
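    A minimal sketch of that idea, assuming a hypothetical grammar table in which each dictionary symbol stores only the IDs of its subsymbols (IDs below 256 are literal bytes). Strings are expanded recursively as they are emitted, so each rule costs memory proportional to its number of subsymbols rather than to its expanded length. This illustrates the trade-off; it is not GLZA's actual data layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical grammar entry: a dictionary symbol stored as a list of
   subsymbol IDs instead of its fully expanded string. IDs < 256 are
   literal bytes; an ID >= 256 refers to rules[ID - 256]. */
typedef struct {
    const uint32_t *subsymbols;
    size_t count;
} rule_t;

/* Recursively expands symbol `sym` into out[pos..cap) and returns the new
   write position. Only the short subsymbol lists live in memory. */
static size_t expand_symbol(const rule_t *rules, uint32_t sym,
                            uint8_t *out, size_t pos, size_t cap) {
    if (sym < 256) {
        assert(pos < cap);   /* caller must size the output buffer */
        out[pos] = (uint8_t)sym;
        return pos + 1;
    }
    const rule_t *r = &rules[sym - 256];
    for (size_t i = 0; i < r->count; i++)
        pos = expand_symbol(rules, r->subsymbols[i], out, pos, cap);
    return pos;
}
```

    For example, with rule 256 = 't' 'h' and rule 257 = 256 'e' ' ', expanding symbol 257 writes "the ". Recursion depth equals the grammar's nesting depth, so a real decoder might prefer an explicit stack; either way it trades some extra work per emitted symbol for the lower RAM.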

    The modeling of trailing subsymbols to predict symbols that start with a space only scratches the surface of what is possible with subsymbol modeling. The possibilities are boggling my mind a little; I'm not sure exactly how to approach it. One possibility is to use a separate model for each common trailing subsymbol. Another is to explicitly categorize each common trailing subsymbol according to the leading subsymbols that tend to follow it and use one model per trailing-subsymbol category.
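    As a rough sketch of the first option (a separate model for each common trailing subsymbol), assume each previous symbol has already been mapped to one of a small, hypothetical set of trailing-subsymbol classes. Each class then keeps its own adaptive estimate of whether the next symbol starts with a space. The class count, 12-bit probabilities, and update rate are arbitrary choices for illustration, not GLZA's.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TRAILING_CLASSES 64  /* hypothetical number of modeled trailing subsymbols */
#define PROB_BITS 12             /* probabilities in units of 1/4096 */
#define ADAPT_SHIFT 4            /* adaptation rate: larger = slower */

/* One adaptive binary model per trailing-subsymbol class, each estimating
   P(next symbol starts with a space | previous symbol's trailing class). */
typedef struct {
    uint16_t p_space[NUM_TRAILING_CLASSES];
} space_model_t;

static void space_model_init(space_model_t *m) {
    for (int i = 0; i < NUM_TRAILING_CLASSES; i++)
        m->p_space[i] = 1u << (PROB_BITS - 1);   /* start every class at 0.5 */
}

/* Probability (scaled by 2^PROB_BITS) that the next symbol starts with a space. */
static unsigned space_model_predict(const space_model_t *m, unsigned trailing_class) {
    assert(trailing_class < NUM_TRAILING_CLASSES);
    return m->p_space[trailing_class];
}

/* Moves only the selected class's estimate toward the observed outcome. */
static void space_model_update(space_model_t *m, unsigned trailing_class,
                               int starts_with_space) {
    assert(trailing_class < NUM_TRAILING_CLASSES);
    uint16_t *p = &m->p_space[trailing_class];
    if (starts_with_space)
        *p += ((1u << PROB_BITS) - *p) >> ADAPT_SHIFT;
    else
        *p -= *p >> ADAPT_SHIFT;
}
```

    The second option would only change how symbols are mapped to a class index; the per-class models themselves stay the same.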

  2. #692
    Expert
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I don't track decompression memory for LTCB, but you are probably right that glza would be on the Pareto frontier. All of the higher-ranked compressors are CM, PPM, or BWT, and they all use the same memory to compress and decompress. Only bwmonstr comes close.
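    For reference, a result is on the Pareto frontier for two criteria (here compressed size and decompression RAM) if no other result is at least as good on both and strictly better on at least one. A small sketch of that check follows; the struct fields are assumptions and any numbers fed to it are placeholders, not LTCB data.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;
    double compressed_size;   /* e.g., bytes for enwik9 */
    double decompress_ram;    /* e.g., MB */
} result_t;

/* Returns 1 if b dominates a: no worse on both axes and strictly better on one. */
static int dominates(const result_t *b, const result_t *a) {
    int no_worse = b->compressed_size <= a->compressed_size &&
                   b->decompress_ram <= a->decompress_ram;
    int better = b->compressed_size < a->compressed_size ||
                 b->decompress_ram < a->decompress_ram;
    return no_worse && better;
}

/* Sets on_frontier[i] = 1 for every result not dominated by any other one.
   Simple O(n^2) pairwise check, fine for a benchmark-sized table. */
static void pareto_frontier(const result_t *r, size_t n, int *on_frontier) {
    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        on_frontier[i] = 1;
        for (size_t j = 0; j < n; j++) {
            if (j != i && dominates(&r[j], &r[i])) {
                on_frontier[i] = 0;
                break;
            }
        }
    }
}
```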

  3. #693
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    778
    Thanks
    63
    Thanked 273 Times in 191 Posts
    Quote Originally Posted by Kennon Conrad
    GLZA v0.3
    GLZAdecode still running after 65 hours (17% CPU, peak memory 365,832K)
    Original: 967,021,395 bytes (that HTML file again)
    Disk out: 646,223,926 bytes

    Console log:
    Read 73716742 of 73716742 symbols, start 0.0000
    Common prefix scan 0 - 48a476, score[22904] = 266251.88
    621: 73636084 syms, dict. size 4777840, 15.5896 bits/sym, o0e 143494824 bytes
    Read 73636084 of 73636084 symbols, start 0.0000
    Common prefix scan 0 - 48e86f, score[22814] = 254973.97
    622: 73560053 syms, dict. size 4795285, 15.6068 bits/sym, o0e 143504518 bytes
    Read 73560053 of 73560053 symbols, start 0.0000
    Common prefix scan 0 - 492c94, score[22810] = 243742.33
    623: 73488119 syms, dict. size 4812798, 15.6231 bits/sym, o0e 143514126 bytes
    Read 73488119 of 73488119 symbols, start 0.0000
    Common prefix scan 0 - 4970fd, score[22850] = 233304.75
    624: 73418110 syms, dict. size 4830337, 15.6392 bits/sym, o0e 143525084 bytes
    Read 73418110 of 73418110 symbols, start 0.0000
    Common prefix scan 0 - 49b580, score[22879] = 223044.06
    625: 73346795 syms, dict. size 4848004, 15.6557 bits/sym, o0e 143536549 bytes
    Read 73346795 of 73346795 symbols, start 0.0000
    Common prefix scan 0 - 49fa83, score[22966] = 212916.59
    626: 73273068 syms, dict. size 4865637, 15.6726 bits/sym, o0e 143547252 bytes
    Read 73273068 of 73273068 symbols, start 0.0000
    Common prefix scan 0 - 4a3f64, score[22974] = 203282.03
    627: 73198248 syms, dict. size 4883454, 15.6898 bits/sym, o0e 143558447 bytes
    Read 73198248 of 73198248 symbols, start 0.0000
    Common prefix scan 0 - 4a84fd, score[23088] = 194019.06
    628: 73116757 syms, dict. size 4901430, 15.7084 bits/sym, o0e 143568675 bytes
    Read 73116757 of 73116757 symbols, start 0.0000
    Common prefix scan 0 - 4acb35, score[23222] = 185204.88
    629: 73033037 syms, dict. size 4919785, 15.7277 bits/sym, o0e 143580303 bytes
    Read 73033037 of 73033037 symbols, start 0.0000
    Common prefix scan 0 - 4b12e8, score[23496] = 176302.67
    630: 72951177 syms, dict. size 4938514, 15.7468 bits/sym, o0e 143593066 bytes
    Read 72951177 of 72951177 symbols, start 0.0000
    Common prefix scan 0 - 4b5c11, score[23814] = 167743.80
    631: 72867836 syms, dict. size 4957593, 15.7662 bits/sym, o0e 143605707 bytes
    Read 72867836 of 72867836 symbols, start 0.0000
    Common prefix scan 0 - 4ba698, score[24131] = 159197.00
    632: 72773108 syms, dict. size 4976974, 15.7881 bits/sym, o0e 143618719 bytes
    Read 72773108 of 72773108 symbols, start 0.0000
    Common prefix scan 0 - 4bf24d, score[24419] = 151300.39
    633: 72691655 syms, dict. size 4996784, 15.8074 bits/sym, o0e 143633200 bytes
    Read 72691655 of 72691655 symbols, start 0.0000
    Common prefix scan 0 - 4c3faf, score[24774] = 143097.30
    634: 72610190 syms, dict. size 5017113, 15.8269 bits/sym, o0e 143649132 bytes
    Read 72610190 of 72610190 symbols, start 0.0000
    Common prefix scan 0 - 4c8f18, score[25206] = 135378.98
    635: 72528555 syms, dict. size 5037851, 15.8465 bits/sym, o0e 143665535 bytes
    Read 72528555 of 72528555 symbols, start 0.0000
    Common prefix scan 0 - 4ce01a, score[25597] = 127707.13
    636: 72446032 syms, dict. size 5058982, 15.8665 bits/sym, o0e 143683454 bytes
    Read 72446032 of 72446032 symbols, start 0.0000
    Common prefix scan 0 - 4d32a5, score[25965] = 120450.65
    637: 72360241 syms, dict. size 5080563, 15.8873 bits/sym, o0e 143701142 bytes
    Read 72360241 of 72360241 symbols, start 0.0000
    Common prefix scan 0 - 4d86f2, score[26359] = 113390.52
    638: 72275545 syms, dict. size 5102666, 15.9081 bits/sym, o0e 143721239 bytes
    Read 72275545 of 72275545 symbols, start 0.0000
    Common prefix scan 0 - 4ddd49, score[26806] = 106421.99
    639: 72189099 syms, dict. size 5125411, 15.9295 bits/sym, o0e 143741878 bytes
    Read 72189099 of 72189099 symbols, start 0.0000
    Common prefix scan 0 - 4e3622, score[27343] = 99646.63
    640: 72091256 syms, dict. size 5148731, 15.9534 bits/sym, o0e 143762460 bytes
    Read 72091256 of 72091256 symbols, start 0.0000
    Common prefix scan 0 - 4e913a, score[27869] = 93234.65
    641: 71993620 syms, dict. size 5172757, 15.9776 bits/sym, o0e 143785286 bytes
    Read 71993620 of 71993620 symbols, start 0.0000
    Common prefix scan 0 - 4eef14, score[28471] = 86889.95
    642: 71898750 syms, dict. size 5197637, 16.0013 bits/sym, o0e 143809061 bytes
    Read 71898750 of 71898750 symbols, start 0.0000
    Common prefix scan 0 - 4f5044, score[29188] = 80743.34
    643: 71798455 syms, dict. size 5223494, 16.0265 bits/sym, o0e 143834365 bytes
    Read 71798455 of 71798455 symbols, start 0.0000
    Common prefix scan 0 - 4fb545, score[29949] = 74506.49
    644: 71707899 syms, dict. size 5250154, 16.0499 bits/sym, o0e 143863040 bytes
    Read 71707899 of 71707899 symbols, start 0.0000
    Common prefix scan 0 - 501d69, score[30000] = 68791.25
    645: 71614412 syms, dict. size 5276912, 16.0743 bits/sym, o0e 143893535 bytes
    Read 71614412 of 71614412 symbols, start 0.0000
    Common prefix scan 0 - 5085ef, score[30000] = 63728.54
    646: 71504269 syms, dict. size 5303745, 16.1023 bits/sym, o0e 143922873 bytes
    Read 71504269 of 71504269 symbols, start 0.0000
    Common prefix scan 0 - 50eec0, score[30000] = 58794.20
    647: 71397362 syms, dict. size 5330724, 16.1298 bits/sym, o0e 143952706 bytes
    Read 71397362 of 71397362 symbols, start 0.0000
    Common prefix scan 0 - 515823, score[30000] = 54299.16
    648: 71298268 syms, dict. size 5358315, 16.1558 bits/sym, o0e 143984645 bytes
    Read 71298268 of 71298268 symbols, start 0.0000
    Common prefix scan 0 - 51c3ea, score[29798] = 49796.88
    649: 71203100 syms, dict. size 5385987, 16.1811 bits/sym, o0e 144017964 bytes
    Read 71203100 of 71203100 symbols, start 0.0000
    Common prefix scan 0 - 523002, score[30000] = 45574.20
    650: 71115297 syms, dict. size 5413922, 16.2050 bits/sym, o0e 144053330 bytes
    Read 71115297 of 71115297 symbols, start 0.0000
    Common prefix scan 0 - 529d21, score[30000] = 41839.20
    651: 71029365 syms, dict. size 5441911, 16.2287 bits/sym, o0e 144089388 bytes
    Read 71029365 of 71029365 symbols, start 0.0000
    Common prefix scan 0 - 530a76, score[30000] = 38277.63
    652: 70942356 syms, dict. size 5469908, 16.2528 bits/sym, o0e 144126684 bytes
    Read 70942356 of 70942356 symbols, start 0.0000
    Common prefix scan 0 - 5377d3, score[30000] = 34955.42
    653: 70859055 syms, dict. size 5497804, 16.2763 bits/sym, o0e 144165275 bytes
    Read 70859055 of 70859055 symbols, start 0.0000
    Common prefix scan 0 - 53e4cb, score[30000] = 31851.23
    654: 70772484 syms, dict. size 5525556, 16.3004 bits/sym, o0e 144202777 bytes
    Read 70772484 of 70772484 symbols, start 0.0000
    Common prefix scan 0 - 545133, score[30000] = 29116.90
    655: 70691304 syms, dict. size 5553269, 16.3236 bits/sym, o0e 144241835 bytes
    Read 70691304 of 70691304 symbols, start 0.0000
    Common prefix scan 0 - 54bd74, score[30000] = 26676.49
    656: 70609344 syms, dict. size 5580922, 16.3472 bits/sym, o0e 144282773 bytes
    Read 70609344 of 70609344 symbols, start 0.0000
    Common prefix scan 0 - 552979, score[29961] = 24269.80
    657: 70529827 syms, dict. size 5608597, 16.3702 bits/sym, o0e 144323279 bytes
    Read 70529827 of 70529827 symbols, start 0.0000
    Common prefix scan 0 - 559594, score[30000] = 22014.82
    658: 70444991 syms, dict. size 5636287, 16.3944 bits/sym, o0e 144363235 bytes
    Read 70444991 of 70444991 symbols, start 0.0000
    Common prefix scan 0 - 5601be, score[30000] = 20075.97
    659: 70363065 syms, dict. size 5664059, 16.4183 bits/sym, o0e 144404816 bytes
    Read 70363065 of 70363065 symbols, start 0.0000
    Common prefix scan 0 - 566e3a, score[30000] = 18192.02
    660: 70283501 syms, dict. size 5691958, 16.4417 bits/sym, o0e 144447611 bytes
    Read 70283501 of 70283501 symbols, start 0.0000
    Common prefix scan 0 - 56db35, score[30000] = 16521.23
    661: 70198470 syms, dict. size 5719853, 16.4665 bits/sym, o0e 144490101 bytes
    Read 70198470 of 70198470 symbols, start 0.0000
    Common prefix scan 0 - 57482c, score[30000] = 14954.47
    662: 70123915 syms, dict. size 5747783, 16.4892 bits/sym, o0e 144535578 bytes
    Read 70123915 of 70123915 symbols, start 0.0000
    Common prefix scan 0 - 57b546, score[30000] = 13489.91
    663: 70035068 syms, dict. size 5775831, 16.5151 bits/sym, o0e 144579724 bytes
    Read 70035068 of 70035068 symbols, start 0.0000
    Common prefix scan 0 - 5822d6, score[30000] = 12111.00
    664: 69947689 syms, dict. size 5804027, 16.5409 bits/sym, o0e 144624302 bytes
    Read 69947689 of 69947689 symbols, start 0.0000
    Common prefix scan 0 - 5890fa, score[30000] = 10905.96
    665: 69867757 syms, dict. size 5832324, 16.5651 bits/sym, o0e 144670987 bytes
    Read 69867757 of 69867757 symbols, start 0.0000
    Common prefix scan 0 - 58ff83, score[30000] = 9808.14
    666: 69779504 syms, dict. size 5860652, 16.5914 bits/sym, o0e 144717237 bytes
    Read 69779504 of 69779504 symbols, start 0.0000
    Common prefix scan 0 - 596e2b, score[30000] = 8842.77
    667: 69693871 syms, dict. size 5888989, 16.6173 bits/sym, o0e 144765257 bytes
    Read 69693871 of 69693871 symbols, start 0.0000
    Common prefix scan 0 - 59dcdc, score[29147] = 7916.67
    668: 69614224 syms, dict. size 5916694, 16.6418 bits/sym, o0e 144813246 bytes
    Read 69614224 of 69614224 symbols, start 0.0000
    Common prefix scan 0 - 5a4915, score[30000] = 7098.51
    669: 69534305 syms, dict. size 5945415, 16.6667 bits/sym, o0e 144863053 bytes
    Read 69534305 of 69534305 symbols, start 0.0000
    Common prefix scan 0 - 5ab946, score[28442] = 6320.43
    670: 69458852 syms, dict. size 5972607, 16.6903 bits/sym, o0e 144911265 bytes
    Read 69458852 of 69458852 symbols, start 0.0000
    Common prefix scan 0 - 5b237e, score[30000] = 5582.82
    671: 69375419 syms, dict. size 6001267, 16.7162 bits/sym, o0e 144961800 bytes
    Read 69375419 of 69375419 symbols, start 0.0000
    Common prefix scan 0 - 5b9372, score[30000] = 4955.12
    672: 69293483 syms, dict. size 6029997, 16.7421 bits/sym, o0e 145014748 bytes
    Read 69293483 of 69293483 symbols, start 0.0000
    Common prefix scan 0 - 5c03ac, score[30000] = 4403.25
    673: 69210414 syms, dict. size 6058768, 16.7684 bits/sym, o0e 145068108 bytes
    Read 69210414 of 69210414 symbols, start 0.0000
    Common prefix scan 0 - 5c740f, score[29854] = 3885.44
    674: 69131972 syms, dict. size 6087457, 16.7937 bits/sym, o0e 145122313 bytes
    Read 69131972 of 69131972 symbols, start 0.0000
    Common prefix scan 0 - 5ce420, score[30000] = 3429.85
    675: 69052675 syms, dict. size 6116254, 16.8194 bits/sym, o0e 145178022 bytes
    Read 69052675 of 69052675 symbols, start 0.0000
    Common prefix scan 0 - 5d549d, score[29342] = 3006.56
    676: 68975664 syms, dict. size 6144532, 16.8446 bits/sym, o0e 145233140 bytes
    Read 68975664 of 68975664 symbols, start 0.0000
    Common prefix scan 0 - 5dc313, score[30000] = 2615.81
    677: 68896206 syms, dict. size 6173497, 16.8706 bits/sym, o0e 145289891 bytes
    Read 68896206 of 68896206 symbols, start 0.0000
    Common prefix scan 0 - 5e3438, score[30000] = 2280.39
    678: 68817236 syms, dict. size 6202505, 16.8967 bits/sym, o0e 145347753 bytes
    Read 68817236 of 68817236 symbols, start 0.0000
    Common prefix scan 0 - 5ea588, score[30000] = 1979.89
    679: 68737777 syms, dict. size 6231445, 16.9230 bits/sym, o0e 145406567 bytes
    Read 68737777 of 68737777 symbols, start 0.0000
    Common prefix scan 0 - 5f1694, score[30000] = 1736.59
    680: 68667117 syms, dict. size 6260501, 16.9475 bits/sym, o0e 145467127 bytes
    Read 68667117 of 68667117 symbols, start 0.0000
    Common prefix scan 0 - 5f8814, score[25782] = 1512.48
    681: 68604742 syms, dict. size 6285387, 16.9690 bits/sym, o0e 145519389 bytes
    Read 68604742 of 68604742 symbols, start 0.0000
    Common prefix scan 0 - 5fe94a, score[29701] = 1297.98
    682: 68534404 syms, dict. size 6314058, 16.9935 bits/sym, o0e 145579942 bytes
    Read 68534404 of 68534404 symbols, start 0.0000
    Common prefix scan 0 - 605949, score[28955] = 1106.06
    683: 68463078 syms, dict. size 6342006, 17.0183 bits/sym, o0e 145640367 bytes
    Read 68463078 of 68463078 symbols, start 0.0000
    Common prefix scan 0 - 60c675, score[30000] = 920.54
    684: 68386993 syms, dict. size 6370870, 17.0446 bits/sym, o0e 145703858 bytes
    Read 68386993 of 68386993 symbols, start 0.0000
    Common prefix scan 0 - 613735, score[30000] = 770.69
    685: 68307873 syms, dict. size 6399712, 17.0717 bits/sym, o0e 145766748 bytes
    Read 68307873 of 68307873 symbols, start 0.0000
    Common prefix scan 0 - 61a7df, score[29757] = 640.67
    686: 68236619 syms, dict. size 6428359, 17.0971 bits/sym, o0e 145831155 bytes
    Read 68236619 of 68236619 symbols, start 0.0000
    Common prefix scan 0 - 6217c6, score[30000] = 545.29
    687: 68175439 syms, dict. size 6457354, 17.1204 bits/sym, o0e 145898492 bytes
    Read 68175439 of 68175439 symbols, start 0.0000
    Common prefix scan 0 - 628909, score[24369] = 460.83
    688: 68116213 syms, dict. size 6480894, 17.1416 bits/sym, o0e 145952458 bytes
    Read 68116213 of 68116213 symbols, start 0.0000
    Common prefix scan 0 - 62e4fd, score[27887] = 369.75
    689: 68051663 syms, dict. size 6507869, 17.1652 bits/sym, o0e 146014928 bytes
    Read 68051663 of 68051663 symbols, start 0.0000
    Common prefix scan 0 - 634e5c, score[30000] = 316.34
    690: 67998606 syms, dict. size 6537112, 17.1868 bits/sym, o0e 146084566 bytes
    Read 67998606 of 67998606 symbols, start 0.0000
    Common prefix scan 0 - 63c097, score[20538] = 268.71
    691: 67942954 syms, dict. size 6556984, 17.2063 bits/sym, o0e 146131200 bytes
    Read 67942954 of 67942954 symbols, start 0.0000
    Common prefix scan 0 - 640e37, score[23654] = 216.63
    692: 67884900 syms, dict. size 6579866, 17.2275 bits/sym, o0e 146186194 bytes
    Read 67884900 of 67884900 symbols, start 0.0000
    Common prefix scan 0 - 646799, score[27381] = 167.61
    693: 67824323 syms, dict. size 6606268, 17.2505 bits/sym, o0e 146250793 bytes
    Read 67824323 of 67824323 symbols, start 0.0000
    Common prefix scan 0 - 64cebb, score[29744] = 133.34
    694: 67764906 syms, dict. size 6634943, 17.2741 bits/sym, o0e 146321836 bytes
    Read 67764906 of 67764906 symbols, start 0.0000
    Common prefix scan 0 - 653ebe, score[23169] = 105.29
    695: 67698126 syms, dict. size 6657153, 17.2976 bits/sym, o0e 146376547 bytes
    Read 67698126 of 67698126 symbols, start 0.0000
    Common prefix scan 0 - 659580, score[28084] = 79.48
    696: 67642580 syms, dict. size 6684279, 17.3199 bits/sym, o0e 146445456 bytes
    Read 67642580 of 67642580 symbols, start 0.0000
    Common prefix scan 0 - 65ff76, score[26731] = 59.53
    697: 67582737 syms, dict. size 6710073, 17.3430 bits/sym, o0e 146511262 bytes
    Read 67582737 of 67582737 symbols, start 0.0000
    Common prefix scan 0 - 666438, score[26218] = 42.16
    698: 67518902 syms, dict. size 6735275, 17.3671 bits/sym, o0e 146576283 bytes
    Read 67518902 of 67518902 symbols, start 0.0000
    Common prefix scan 0 - 66c6aa, score[27655] = 28.16
    699: 67455152 syms, dict. size 6761814, 17.3918 bits/sym, o0e 146645436 bytes
    Read 67455152 of 67455152 symbols, start 0.0000
    Common prefix scan 0 - 672e55, score[28813] = 17.67
    700: 67398964 syms, dict. size 6789521, 17.4150 bits/sym, o0e 146719226 bytes
    Read 67398964 of 67398964 symbols, start 0.0000
    Common prefix scan 0 - 679a90, score[30000] = 10.68
    701: 67346620 syms, dict. size 6818291, 17.4378 bits/sym, o0e 146797109 bytes
    Read 67346620 of 67346620 symbols, start 0.0000
    Common prefix scan 0 - 680af2, score[23907] = 6.36
    702: 67297017 syms, dict. size 6841150, 17.4581 bits/sym, o0e 146859523 bytes
    Read 67297017 of 67297017 symbols, start 0.0000
    Common prefix scan 0 - 68643d, score[26106] = 3.40
    703: 67247818 syms, dict. size 6866107, 17.4791 bits/sym, o0e 146928699 bytes
    Read 67247818 of 67247818 symbols, start 0.0000
    Common prefix scan 0 - 68c5ba, score[24060] = 1.53
    704: 67196545 syms, dict. size 6888892, 17.5000 bits/sym, o0e 146992749 bytes
    Read 67196545 of 67196545 symbols, start 0.0000
    Common prefix scan 0 - 691ebb, score[25421] = 0.50
    705: 67142588 syms, dict. size 6912767, 17.5222 bits/sym, o0e 147061003 bytes
    Read 67142588 of 67142588 symbols, start 0.0000
    Common prefix scan 0 - 697bfe, score[4791] = 0.50
    706: 67122917 syms, dict. size 6916976, 17.5288 bits/sym, o0e 147072985 bytes
    Read 67122917 of 67122917 symbols, start 0.0000
    Common prefix scan 0 - 698c6f, score[1785] = 0.50
    707: 67112062 syms, dict. size 6918473, 17.5321 bits/sym, o0e 147077186 bytes
    Read 67112062 of 67112062 symbols, start 0.0000
    Common prefix scan 0 - 699248, score[870] = 0.50
    708: 67103428 syms, dict. size 6919196, 17.5346 bits/sym, o0e 147079227 bytes
    Read 67103428 of 67103428 symbols, start 0.0000
    Common prefix scan 0 - 69951b, score[587] = 0.50
    709: 67098895 syms, dict. size 6919704, 17.5360 bits/sym, o0e 147080652 bytes
    Read 67098895 of 67098895 symbols, start 0.0000
    Common prefix scan 0 - 699717, score[252] = 0.50
    710: 67097073 syms, dict. size 6919915, 17.5365 bits/sym, o0e 147081243 bytes
    Read 67097073 of 67097073 symbols, start 0.0000
    Common prefix scan 0 - 6997ea, score[102] = 0.50
    711: 67094870 syms, dict. size 6920009, 17.5371 bits/sym, o0e 147081498 bytes
    Read 67094870 of 67094870 symbols, start 0.0000
    Common prefix scan 0 - 699848, score[90] = 0.50
    712: 67093879 syms, dict. size 6920097, 17.5374 bits/sym, o0e 147081751 bytes
    Read 67093879 of 67093879 symbols, start 0.0000
    Common prefix scan 0 - 6998a0, score[60] = 0.50
    713: 67093211 syms, dict. size 6920156, 17.5376 bits/sym, o0e 147081922 bytes
    Read 67093211 of 67093211 symbols, start 0.0000
    Common prefix scan 0 - 6998db, score[43] = 0.50
    714: 67093010 syms, dict. size 6920199, 17.5377 bits/sym, o0e 147082046 bytes
    Read 67093010 of 67093010 symbols, start 0.0000
    Common prefix scan 0 - 699906, score[8] = 0.50
    715: 67092977 syms, dict. size 6920207, 17.5377 bits/sym, o0e 147082069 bytes
    Read 67092977 of 67092977 symbols, start 0.0000

    Run time 8110.239 seconds.

    GLZAencode html.txt.glzc html.txt.glze
    cap encoded 1, UTF8 compliant 0
    Read 67092977 symbols including 6920207 definition symbols
    Parsed 39639643 level 0 symbols
    use_mtf 1, mcl 24 mrcl 22
    Encoded 39639643 level 1 symbols
    Compressed file size: 98578454 bytes, dictionary size: 6807035 symbols
    elapsed time = 26.875000 seconds.

    GLZAdecode html.txt.glze html.txt.glzd
    646224790

  4. The Following User Says Thank You to Sportman For This Useful Post:

    Kennon Conrad (15th July 2015)

  5. #694
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by Sportman
    GLZAdecode still running after 65 hours (17% CPU, peak memory 365,832K)
    Sportman, I really appreciate you reporting the problems you encounter. It's a little embarrassing that you keep finding them, and I apologize that my coding and/or test suite are not quite adequate.

    I noticed the 17% CPU usage. Does your computer have 12 cores? I think there is a way to read the number of cores, so it might be a good idea to add that sometime so I can tie up all 12 for faster compression.
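    One way to read the processor count, as a sketch: sysconf(_SC_NPROCESSORS_ONLN) on Linux and macOS, and GetSystemInfo on Windows. Both report logical processors (hardware threads), so a 6-core/12-thread machine reports 12. The choose_thread_count helper and its cap are hypothetical.

```c
#include <assert.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Returns the number of logical processors (hardware threads) currently
   available, or 1 if it cannot be determined. */
static int logical_cpu_count(void) {
#ifdef _WIN32
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    return info.dwNumberOfProcessors > 0 ? (int)info.dwNumberOfProcessors : 1;
#else
    long n = sysconf(_SC_NPROCESSORS_ONLN);  /* widely supported, not in base POSIX */
    return n > 0 ? (int)n : 1;
#endif
}

/* Hypothetical policy: use every logical CPU, capped at max_threads. */
static int choose_thread_count(int max_threads) {
    assert(max_threads > 0);
    int n = logical_cpu_count();
    return n < max_threads ? n : max_threads;
}
```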

    Your html log file is interesting. Except for the crash (which is obviously a critical problem), it looks like the changes in v0.3 vs. v0.2 help more than average. Compression is a little more than 10% faster and the compressed file is a little more than 2% smaller, so that is encouraging. Decompression should be a little faster too, but it needs to work first.

    There aren't many things that can cause an infinite loop in the decoder, but I did find a blatant code error (literally a "while (1);" left in the code) that is not likely to be executed; perhaps you hit it because of the change in the compression scoring formula. I changed the code to print a message instead of entering the infinite loop, in case that was the cause. I also added a couple of other infinite-loop checks in case data got messed up before entering a loop.
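    A generic sketch of the kind of guard described, assuming a hypothetical loop that follows index links through a table: if the walk takes more steps than there are entries, the links must contain a cycle, so the code reports an error and returns instead of spinning forever. This illustrates the technique, not GLZAdecode's actual loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NO_NEXT ((size_t)-1)

/* Follows next[] links from `start` looking for an entry whose value is
   `target`. A valid chain visits at most n entries, so an out-of-range
   index or more than n steps means corrupted data: print a message and
   give up. Returns the matching index, or NO_NEXT. */
static size_t find_in_chain(const size_t *next, const int *value, size_t n,
                            size_t start, int target) {
    assert(next != NULL && value != NULL);
    size_t steps = 0;
    for (size_t i = start; i != NO_NEXT; i = next[i]) {
        if (i >= n || ++steps > n) {
            fprintf(stderr, "ERROR: corrupt chain detected after %zu steps\n", steps);
            return NO_NEXT;
        }
        if (value[i] == target)
            return i;
    }
    return NO_NEXT;   /* reached the end of a valid chain without a match */
}
```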

    Please try the following:
    1. Run the attached decoder and let me know if anything changes in the console output. If the console doesn't update for more than a couple of seconds, it is still stuck.
    2. If step 1 fails to work, try the same html.txt.glzc file with v0.2 of GLZAencode and GLZAdecode.
    3. If step 1 has no change in console output and step 2 works, try GLZAdecode -t1 html.txt.glze html.txt.glzd.
    Attached Files

  6. #695
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    778
    Thanks
    63
    Thanked 273 Times in 191 Posts
    Quote Originally Posted by Kennon Conrad
    Does your computer have 12 cores?
    6 cores/12 threads

    Now I get a crash with this console output:
    GLZAdecode html.txt.glze html.txt.glzd
    MTFG CAP QUEUE ERROR FIXED

  7. #696
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by Sportman
    MTFG CAP QUEUE ERROR FIXED
    Hopefully this version will work.
    Attached Files

  8. The Following User Says Thank You to Kennon Conrad For This Useful Post:

    Sportman (15th July 2015)

  9. #697
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    778
    Thanks
    63
    Thanked 273 Times in 191 Posts
    Quote Originally Posted by Kennon Conrad
    Hopefully this version will work.
    GLZAdecode html.txt.glze html.txt.glzd
    Decompressed 967021395 bytes in 9594 msec

    Compare OK!

  10. The Following User Says Thank You to Sportman For This Useful Post:

    Kennon Conrad (15th July 2015)

  11. #698
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts

    GLZA v0.3a

    This is the same as v0.3 except for a bug fix in GLZAdecode for the MTF queue, covering the case where a symbol is preceded by the capital encode character and none of the first 192 queue entries starts with a character between a and z.
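    The post does not show the fix. As a purely hypothetical illustration of this class of bug, any search restricted to the first 192 queue entries needs an explicit "not found" result that the caller handles, rather than assuming a lowercase-initial entry always exists. Queue entries are modeled here as C strings for simplicity; a real decoder's queue would hold symbol indices.

```c
#include <assert.h>
#include <stddef.h>

#define MTF_SEARCH_LIMIT 192   /* from the post: the first 192 queue entries */

/* Hypothetical: returns the position of the first entry, among the first
   MTF_SEARCH_LIMIT, whose string starts with 'a'..'z', or -1 if none does.
   The caller must handle -1 instead of assuming a match exists. */
static int first_lowercase_entry(const char *const *queue, size_t queue_len) {
    assert(queue != NULL || queue_len == 0);
    size_t limit = queue_len < MTF_SEARCH_LIMIT ? queue_len : MTF_SEARCH_LIMIT;
    for (size_t i = 0; i < limit; i++) {
        char c = queue[i][0];
        if (c >= 'a' && c <= 'z')
            return (int)i;
    }
    return -1;
}
```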
    Attached Files

  12. The Following User Says Thank You to Kennon Conrad For This Useful Post:

    surfersat (18th July 2015)

  13. #699
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts

    GLZA v0.3b

    The attached .zip file has the same files as GLZA v0.3a plus GLZAcompressFast.exe and GLZAcompressFast.c.
    GLZACompressFast can be used in place of GLZACompress for faster compression of text at the cost of compression ratio. Instead of recursively ranking suffixes, it does an inline analysis (first to last) and tries to pick the best string to deduplicate, or does nothing and moves to the next symbol. After each deduplication, it updates the suffix tree. The parser is pretty lame (no lookahead) and overlaps are ignored, except that there is a preprocessor for runs of the same symbol. For latin(?) text, there are also preprocessors for strings starting with line feeds and spaces.

    Comparison of compressing enwik9 with GLZA, LZHAM, LzTurbo, and PLZMA, the last three with the options that give the best compression according to LTCB:

    GLZA(Fast): 180,243,710 bytes in 786 seconds, 11,395 MB RAM
    LZHAM: 200,069,116 bytes in 825 seconds, 4,642 MB RAM
    LzTurbo: 193,605,881 bytes in 1411 seconds, 13,440 MB RAM
    PLZMA: 193,240,163 bytes in 7326 seconds, 9,630 MB RAM

    For the above test, only LZHAM compression was multithreaded (by default, 8 threads).

    For enwik8:

    GLZA(Fast): 22,021,734 bytes in 58 seconds, 1,312 MB RAM
    LZHAM: 24,711,438 bytes in 56 seconds, 889 MB RAM
    LzTurbo: 24,356,021 bytes in 90 seconds, 2,238 MB RAM
    PLZMA: 24,206,571 bytes in 89 seconds, 1,905 MB RAM

    Files produced by GLZACompressFast tend to take slightly longer to decompress than those produced by GLZACompress, usually by about 2%.

    GLZACompress takes 4,156 seconds to compress enwik9, using up to 8 threads, averaging about 5.3 threads (vs. 786 seconds for single-threaded GLZACompressFast).

    Compression can be bogged down when a file has a lot of long overlaps. It's something that can be fixed but will take time. The parsing can be improved too, etc.
    Attached Files

  14. The Following 2 Users Say Thank You to Kennon Conrad For This Useful Post:

    Bulat Ziganshin (19th November 2015),Paul W. (17th November 2015)

  15. #700
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    Wow!

    Quote Originally Posted by Kennon Conrad
    Instead of recursively ranking suffixes, it does an inline analysis (first to last) and tries to pick the best string to deduplicate or do nothing and move to the next symbol. After each deduplication, it updates the suffix tree.
    Could you explain the "inline analysis", and how it works?

  16. #701
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    For each symbol in the file, starting at the first symbol, it looks at all the suffixes that start at that position and scores them according to a formula related to profit. If the best one exceeds a certain threshold, then it replaces all instances of that suffix with a new symbol and adds the symbol string to the end of the file. Then it moves to the next symbol in the file. The score is calculated by first calculating P = log2(repeats/file_symbols) - log2(Markov chain probability) and then S = (P - 1.4) / L, where L is the suffix length. It's maybe not the best but was the best I could do for now.
    Last edited by Kennon Conrad; 17th November 2015 at 03:04.

  17. The Following 2 Users Say Thank You to Kennon Conrad For This Useful Post:

    Bulat Ziganshin (19th November 2015),Paul W. (17th November 2015)

  18. #702
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    FWIW, I'd like to add a plugin for this to Squash, but the API (or lack thereof) is a bit intimidating. Do you have any plans on making this a bit more approachable from an API standpoint? Or perhaps I'm missing an easy-to-use API somewhere (it's a pretty big codebase…)?

  19. #703
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by nemequ
    FWIW, I'd like to add a plugin for this to Squash, but the API (or lack thereof) is a bit intimidating. Do you have any plans on making this a bit more approachable from an API standpoint? Or perhaps I'm missing an easy-to-use API somewhere (it's a pretty big codebase…)?
    I do not have any plans to make this more approachable, but most of my work on GLZA has not been planned. It seems like it shouldn't be too hard, mostly just a matter of replacing the I/O file management code (fopen, fclose, fread, fwrite) with buffers that are provided by a calling routine. What is the most common way to provide an API in the compression world? Is it by providing source code that includes callable functions, by providing object files that include callable functions, with a dll, or some other way? I don't know much about plug-ins but would be willing to try if I know what is useful.

  20. #704
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by Kennon Conrad
    I do not have any plans to make this more approachable, but most of my work on GLZA has not been planned. It seems like it shouldn't be too hard, mostly just a matter of replacing the I/O file management code (fopen, fclose, fread, fwrite) with buffers that are provided by a calling routine. What is the most common way to provide an API in the compression world? Is it by providing source code that includes callable functions, by providing object files that include callable functions, with a dll, or some other way? I don't know much about plug-ins but would be willing to try if I know what is useful.
    For open source code, the most common thing would be to provide source code that includes some relatively simple functions declared in a header. As for what those functions look like, there are three relatively common APIs. I've been working on some documentation for Squash which should help explain them (there is also the User Guide, which is a more introductory document). Some of the links and formatting in those pages are broken; they are designed to go in the Squash Reference Manual, and Doxygen's Markdown syntax is slightly different from what GitHub expects, but you should get the idea.

    The most flexible API is definitely a zlib-style streaming API, but it's also the hardest to implement.
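    A rough sketch of the shape of a zlib-style streaming interface, using a hypothetical toy_stream struct modeled on zlib's next_in/avail_in/next_out/avail_out fields. The "codec" is a plain copy so that only the buffer protocol is shown; the names and return codes are illustrative, not zlib's or Squash's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical zlib-style stream state. Between calls the caller refills
   next_in/avail_in and drains next_out/avail_out, so neither side has to
   hold the whole file in memory. */
typedef struct {
    const uint8_t *next_in;
    size_t avail_in;
    uint8_t *next_out;
    size_t avail_out;
    uint64_t total_in, total_out;
} toy_stream;

enum { TOY_OK = 0, TOY_STREAM_END = 1 };

/* One processing step. The copy stands in for a real compressor: it moves
   as many bytes as fit in the output buffer and advances all pointers and
   counters. Returns TOY_STREAM_END once `finish` is set and all input has
   been consumed, otherwise TOY_OK. */
static int toy_stream_process(toy_stream *s, int finish) {
    assert(s != NULL);
    size_t n = s->avail_in < s->avail_out ? s->avail_in : s->avail_out;
    if (n > 0) {
        memcpy(s->next_out, s->next_in, n);
        s->next_in += n;
        s->avail_in -= n;
        s->total_in += n;
        s->next_out += n;
        s->avail_out -= n;
        s->total_out += n;
    }
    return (finish && s->avail_in == 0) ? TOY_STREAM_END : TOY_OK;
}
```

    A real compressor keeps internal state between calls and may still need output space after its input is exhausted, which is much of what makes this API harder to implement than the other two.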

    Based on what you wrote above, I'm guessing it would be relatively easy to provide what I call a splicing API: basically you provide one function for compression and one for decompression, each taking callbacks that look a lot like fread and fwrite. The copy of CRUSH distributed with Squash (see crush.c and crush.h) is a decent example of this approach; the original version was just a command-line program which used fread and fwrite, and I modified it to take callbacks, then used that to create the Squash plugin.
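    A minimal sketch of the splicing idea, with hypothetical names: one entry point per direction that takes fread/fwrite-like callbacks plus an opaque context pointer. A copy loop stands in for the codec, and the memory-buffer callbacks show how a caller (such as a plugin) could feed data without touching files. This is not CRUSH's or Squash's real signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* fread/fwrite-like callbacks; ctx is whatever the caller needs. */
typedef size_t (*toy_read_fn)(void *ctx, uint8_t *buf, size_t size);
typedef size_t (*toy_write_fn)(void *ctx, const uint8_t *buf, size_t size);

/* Stand-in "codec" that just copies; a real one would call read_cb and
   write_cb wherever it used to call fread and fwrite.
   Returns 0 on success, -1 on a short write. */
static int toy_splice_compress(toy_read_fn read_cb, void *read_ctx,
                               toy_write_fn write_cb, void *write_ctx) {
    assert(read_cb != NULL && write_cb != NULL);
    uint8_t buf[4096];
    size_t n;
    while ((n = read_cb(read_ctx, buf, sizeof buf)) > 0) {
        if (write_cb(write_ctx, buf, n) != n)
            return -1;
    }
    return 0;
}

/* Example callbacks over caller-owned memory buffers. */
typedef struct { const uint8_t *data; size_t len, pos; } mem_src;
typedef struct { uint8_t *data; size_t cap, pos; } mem_dst;

static size_t mem_read(void *ctx, uint8_t *buf, size_t size) {
    mem_src *s = ctx;
    size_t n = s->len - s->pos < size ? s->len - s->pos : size;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return n;
}

static size_t mem_write(void *ctx, const uint8_t *buf, size_t size) {
    mem_dst *d = ctx;
    size_t n = d->cap - d->pos < size ? d->cap - d->pos : size;
    memcpy(d->data + d->pos, buf, n);
    d->pos += n;
    return n;
}
```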

    The third API is just a simple buffer-to-buffer function. You provide it with all of the input and enough room to store all of the output, and it compresses or decompresses the whole thing in one call. Obviously there are memory-consumption issues if you're dealing with lots of data (Squash works around this to some extent by using memory-mapped files, but that isn't possible everywhere).
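    A minimal C sketch of a buffer-to-buffer interface, assuming the caller sizes the output with a worst-case bound. The function names are hypothetical, and the codec is a toy byte-level run-length encoder ((count, byte) pairs) standing in for real compression; only the shape of the API is the point.

```c
#include <stddef.h>

/* Worst case for this placeholder codec: every byte becomes a 2-byte pair. */
size_t example_compress_bound(size_t in_len)
{
    return 2 * in_len;
}

/* Compresses all of in[0..in_len) into out in one call.
   Returns bytes written, or 0 if out_cap is too small (or in_len is 0). */
size_t example_compress(const unsigned char *in, size_t in_len,
                        unsigned char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    while (i < in_len) {
        unsigned char b = in[i];
        size_t run = 1;

        while (i + run < in_len && in[i + run] == b && run < 255)
            run++;
        if (o + 2 > out_cap)
            return 0;
        out[o++] = (unsigned char)run;
        out[o++] = b;
        i += run;
    }
    return o;
}

/* Decompresses all of in into out in one call.
   Returns bytes written, or 0 on malformed input or insufficient room. */
size_t example_decompress(const unsigned char *in, size_t in_len,
                          unsigned char *out, size_t out_cap)
{
    size_t i, k, o = 0;

    if (in_len % 2 != 0)
        return 0;
    for (i = 0; i < in_len; i += 2) {
        size_t run = in[i];

        if (run == 0 || o + run > out_cap)
            return 0;
        for (k = 0; k < run; k++)
            out[o++] = in[i + 1];
    }
    return o;
}
```

    One practical consequence of this design is that the decompressor must be told how much room to provide, so a caller usually has to store the uncompressed size alongside the compressed data (zlib's `uncompress` has the same requirement).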

  21. The Following User Says Thank You to nemequ For This Useful Post:

    Kennon Conrad (17th November 2015)

  22. #705
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    The streaming API sounds interesting, but I will have to learn more about it. The buffer-to-buffer approach sounds easiest, but besides the memory issue, it doesn't seem ideal to hold back all of the output during decompression until the whole file is decoded. I suppose it depends on where the data is going.

  23. #706
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I added GLZAcompressFast to LTCB. http://mattmahoney.net/dc/text.html#1654

  24. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    Kennon Conrad (19th November 2015)

  25. #707
    Member FatBit's Avatar
    Join Date
    Jan 2012
    Location
    Prague, CZ
    Posts
    189
    Thanks
    0
    Thanked 36 Times in 27 Posts
    Dear Mr. Conrad,

    I have the source file plus other files with these extensions: glzc, glzd (same size as the source), glze, and glzf (source size + 1 byte). How should I interpret these files? I use GLZA 0.3b.

    Best regards, FatBit

  26. #708
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by FatBit
    Dear Mr. Conrad,

    I have the source file plus other files with these extensions: glzc, glzd (same size as the source), glze, and glzf (source size + 1 byte). How should I interpret these files? I use GLZA 0.3b.

    Best regards, FatBit
    The compression process is done in three stages so I can test variations on any stage without repeating the others. .glzf is the output of stage one (GLZAformat), .glzc is the output of stage two (GLZAcompress), and .glze is the output of the encoder (GLZAencode). .glzd is the output of the decoder, and its contents should match the original file.

  27. #709
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    Do you know how much memory decompression is using?

  28. #710
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by Paul W.
    Do you know how much memory decompression is using?
    Just a little more than the previous version. The decompressor is the same, but the grammar is a little less effective. Sorry, but I don't have access to the exact numbers right now.

  29. The Following User Says Thank You to Kennon Conrad For This Useful Post:

    Paul W. (19th November 2015)

  30. #711
    Member FatBit's Avatar
    Join Date
    Jan 2012
    Location
    Prague, CZ
    Posts
    189
    Thanks
    0
    Thanked 36 Times in 27 Posts
    Dear Mr. Conrad,

    Thank you for your reply. I prepared a small benchmark; please see the topic "Text strings coding chemical structures".

    Best regards, FatBit

  31. The Following User Says Thank You to FatBit For This Useful Post:

    Kennon Conrad (21st November 2015)

  32. #712
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    Does anybody have GLZA 0.3b working under Linux with gcc?

    I've got it compiled and working for most files, but on some it gets stuck in GLZAcompress after some number of scans. This could be a problem with my setup, so if somebody could try it on bib from the Calgary corpus, that might tell me something. (Kennon's Windows build works fine for me under Windows 8.1 on the same machine, which has a quad-core AMD A6 and 12 GB of RAM.)

  33. #713
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts

    32-bit GLZA

    FatBit asked about a 32-bit build of GLZA in another thread, so I tried making one and ran into a couple of problems. I fixed them by slightly modifying a couple of source files and will integrate the changes into the mainline for the next (cleanup) release. Here are the 32-bit executables. I didn't test much, but I think they probably only work on files up to about 250 MB.
    Attached Files
    Last edited by Kennon Conrad; 29th January 2016 at 20:25.

  34. #714
    Member FatBit's Avatar
    Join Date
    Jan 2012
    Location
    Prague, CZ
    Posts
    189
    Thanks
    0
    Thanked 36 Times in 27 Posts
    I confirm that GLZA 0.3b (32-bit) works under Windows 7 Professional 32-bit, Czech edition.

    Best regards,

    FatBit

  35. #715
    Member FatBit's Avatar
    Join Date
    Jan 2012
    Location
    Prague, CZ
    Posts
    189
    Thanks
    0
    Thanked 36 Times in 27 Posts
    Dear Mr. Conrad,

    Unfortunately, it looks like the program "hangs" at a certain line (or at least there is a big jump in execution time). Data files and logs are attached.

    Best regards,

    FatBit
    Attached Files

  36. #716
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Dear Fatbit,

    I fixed the problem and replaced the .zip file above.

    Best Regards,

    Kennon

  37. #717
    Member FatBit's Avatar
    Join Date
    Jan 2012
    Location
    Prague, CZ
    Posts
    189
    Thanks
    0
    Thanked 36 Times in 27 Posts
    Dear Mr. Conrad,

    Thank you for solving the problems. I confirm that both examples you sent work on Windows 7 Professional 32-bit (Czech). It looks like they would also work on Windows XP Professional 32-bit (Czech) if I had enough memory (which I don't).

    Solved. Let's test!

    Best regards,

    FatBit

  38. #718
    Member FatBit's Avatar
    Join Date
    Jan 2012
    Location
    Prague, CZ
    Posts
    189
    Thanks
    0
    Thanked 36 Times in 27 Posts
    Dear Mr. Conrad,

    Some quick observations:

    1. The program does not accept the -m switch (fopen error -'-m100'). I tried -m7, -m 7, -m70, -m 70, -m100, and -m 100 without effect. The default (no switch) works.
    2. Sorting the file contents affects the results (see attachment).

    I should mention that the program does not "understand" chemistry. Then again, neither do I.

    Best regards,

    FatBit
    Attached Files

  39. #719
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by FatBit
    The program does not accept the -m switch (fopen error -'-m100'). I tried -m7, -m 7, -m70, -m 70, -m100, and -m 100 without effect. The default (no switch) works.
    Can you post more details such as the command line and file? I run "GLZAcompress32 Data1.glzf Data1.glzc" and get a dictionary with 11,356 entries. I run "GLZAcompress32 -m100 Data1.glzf Data1.glzc" and get a dictionary with 1,817 entries (according to GLZAencode32).

  40. #720
    Member FatBit's Avatar
    Join Date
    Jan 2012
    Location
    Prague, CZ
    Posts
    189
    Thanks
    0
    Thanked 36 Times in 27 Posts
    Sorry, I'm a scatterbrain and read too fast. I used this command script:

    Rem SMILE format file processing
    Set DictionarySize=100
    GLZAFormat32.exe %1 %1.glzf
    GLZACompress32.exe %1.glzf %1.glzc
    GLZAEncode32.exe %1.glzc %1.glze
    GLZAEncode32.exe -v -m%DictionarySize% %1.glzc %1.glze > %1.SymbolList.asc

    instead of

    Rem SMILE format file processing
    Set DictionarySize=100
    GLZAFormat32.exe %1 %1.glzf
    GLZACompress32.exe -m%DictionarySize% %1.glzf %1.glzc
    GLZAEncode32.exe %1.glzc %1.glze
    GLZAEncode32.exe -v %1.glzc %1.glze > %1.SymbolList.asc

    If you use a freeware development environment, do you think it would be possible to zip everything into one file and send it to me? As we discussed previously, I will have to modify the line with the pattern recognition.
    I am not familiar with compiling programs; for many years now I have only "produced" office macros and the like.

    FatBit apologizes + Best regards


