Results 331 to 360 of 849

Thread: Tree alpha v0.1 download

  1. #331
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Kennon Conrad View Post
    Memory mapping seems like a good idea, but from what I am seeing it is not available with standard MinGW 32 bit gcc. But then again, the time to move up from 32 bit gcc under Windows may be approaching.

    I don't know a whole lot about how the Windows OS handles I/O. I do know that writing one giant buffer doesn't work so well. Tree did that originally and had a pretty long stall at the end to flush the buffer, much like I seem to be seeing with LzTurbo.
    I'm not very familiar with MinGW, but memory-mapping should be part of the Windows API, so if you can call Windows API functions, it should just work. The compiler doesn't need to support it. I've built and run code that uses mmap in Cygwin and it worked fine.

    I/O isn't supposed to be difficult. All that's really important is that buffering exists at various levels and reads and writes go to and from the device in blocks. If you call system calls directly, you should read and write in chunks (ideally multiples of the system's block size, which is always a power of 2); the reason is that system calls are expensive and too many will cause a performance hit. If you use stdio, you shouldn't have to think about this. Buffering a huge amount and writing it all at once uses lots of memory and it cuts off the opportunity to pipeline (it takes time to physically write data to the device and this can happen in parallel with computation if you start writing while the computation is still underway).

  2. #332
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by nburns View Post
    I'm not very familiar with MinGW, but memory-mapping should be part of the Windows API, so if you can call Windows API functions, it should just work. The compiler doesn't need to support it. I've built and run code that uses mmap in Cygwin and it worked fine.

    I/O isn't supposed to be difficult. All that's really important is that buffering exists at various levels and reads and writes go to and from the device in blocks. If you call system calls directly, you should read and write in chunks (ideally multiples of the system's block size, which is always a power of 2); the reason is that system calls are expensive and too many will cause a performance hit. If you use stdio, you shouldn't have to think about this. Buffering a huge amount and writing it all at once uses lots of memory and it cuts off the opportunity to pipeline (it takes time to physically write data to the device and this can happen in parallel with computation if you start writing while the computation is still underway).
I saw this on MinGW's site: http://www.mingw.org/node/21.

    Yes, I/O shouldn't be difficult. That's why the 7 second stall when decompressing with LzTurbo surprises me.

  3. #333
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I was testing tree v0.12 in 64 bit Linux. On a small text file (enwik6, 1 MB), TreeCompress.exe under Wine 1.6 gives an out of memory error. Next I compiled with gcc 4.8.2 -O3 and I get a segmentation fault. Then I tried options -O3 -m32 and finally it runs, but enwik8 gives fast but poor compression and enwik9 gives an error (symbol limit exceeded). Here is the output:
    Code:
    matt@matt-Latitude-E6510:~$ testtree enwik8
    TreePreEncode enwik8 enwik8.tpre
    Pre and capital encoding 100000000 bytes
    Wrote 1 byte header and 103809101 encoded bytes in 450.853 seconds.
    
    real	0m0.454s
    user	0m0.333s
    sys	0m0.120s
    TreeCompress enwik8.tpre enwik8.tcom
    Allocated 1778384896 bytes for data processing
    Read 103809102 byte input file
    cap encoded: 1, UTF8 compliant 1
    Found 103430933 symbols, 0 defines, maximum unicode value 0x28b46
    1: 103430933 syms, dict. size 0, 4.9009 bits/sym, o0e 63362903 bytes
    Common prefix scan 34/34 79.0 - 2ffff.2ffff          
    Run time 1034 seconds.
    
    real	0m1.038s
    user	0m0.888s
    sys	0m0.148s
    TreeBitEncode enwik8.tcom enwik8.tree
    cap encoded: 1, UTF8_compliant 1
    Read 103430933 symbols including 0 definition symbols
    Encoded 103430933 level 1 symbols                     
    Compressed file size: 78293312 bytes, dictionary size: 6039 symbols
    elapsed time = 2181.430000 seconds.
    
    real	0m2.376s
    user	0m1.989s
    sys	0m0.196s
    TreeDecode enwik8.tree enwik8.out
    Decompressed 100000000 bytes in 11178482 msec
    
    real	0m5.594s
    user	0m11.080s
    sys	0m0.100s
     97660 -rw-rw-r-- 1 matt matt 100000000 Sep 30 15:58 enwik8.out
    101380 -rw-rw-r-- 1 matt matt 103809102 Sep 30 15:57 enwik8.tcom
    101384 -rw-rw-r-- 1 matt matt 103809102 Sep 30 15:57 enwik8.tpre
     76460 -rw-rw-r-- 1 matt matt  78293312 Sep 30 15:58 enwik8.tree
    a1fa5ffddb56f4953e226637dabbb36a  enwik8
    a1fa5ffddb56f4953e226637dabbb36a  enwik8.out
    matt@matt-Latitude-E6510:~$ testtree enwik9
    TreePreEncode enwik9 enwik9.tpre
    Pre and capital encoding 1000000000 bytes
    Wrote 1 byte header and 1041507612 encoded bytes in 6221.491 seconds.
    
    real	0m24.686s
    user	0m4.428s
    sys	0m1.795s
    TreeCompress enwik9.tpre enwik9.tcom
    Allocated 1778384896 bytes for data processing
    Read 1041507613 byte input file
    cap encoded: 1, UTF8 compliant 1
    Found 1039028503 symbols, 0 defines, maximum unicode value 0x28b4e
    1: 1039028503 syms, dict. size 0, 4.9718 bits/sym, o0e 645730267 bytes
    Common prefix scan 634/634 30fb.0 - 2ffff.2ffff          
    Run time 6834 seconds.
    
    real	0m20.720s
    user	0m5.751s
    sys	0m1.085s
    TreeBitEncode enwik9.tcom enwik9.tree
    cap encoded: 1, UTF8_compliant 1
    ERROR - symbol limit of 250000000 symbols exceeded
    
    real	0m4.531s
    user	0m2.404s
    sys	0m0.605s
    TreeDecode enwik9.tree enwik9.out
    fopen error file 'enwik9.tree' not found
    
    real	0m1.422s
    user	0m0.006s
    sys	0m0.000s
          0 -rw-rw-r-- 1 matt matt          0 Sep 30 15:59 enwik9.out
    1017104 -rw-rw-r-- 1 matt matt 1041507613 Sep 30 15:59 enwik9.tcom
    1017104 -rw-rw-r-- 1 matt matt 1041507613 Sep 30 15:59 enwik9.tpre
    e206c3450ac99950df65bf70ef61a12d  enwik9
    d41d8cd98f00b204e9800998ecf8427e  enwik9.out
    rm: cannot remove ‘enwik9.tree’: No such file or directory
    Here is the test script.

    Code:
    echo TreePreEncode $1 $1.tpre
    time TreePreEncode $1 $1.tpre
    echo TreeCompress $1.tpre $1.tcom
    time TreeCompress $1.tpre $1.tcom
    echo TreeBitEncode $1.tcom $1.tree
    time TreeBitEncode $1.tcom $1.tree
    echo TreeDecode $1.tree $1.out
    time TreeDecode $1.tree $1.out
    ls -las $1.*
    md5sum $1 $1.out
    rm $1.tpre $1.tcom $1.tree $1.out
    Also, times are not calculated correctly by the programs. When I compile I get warnings about printf format errors (long passed as "%u") which matters because in 64 bit gcc, long is 64 bits.

  4. #334
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
Let me get back to you. I just installed MinGW64 last night and had to change some variables to 64 bits to get the code running.

  5. #335
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Kennon Conrad View Post
I saw this on MinGW's site: http://www.mingw.org/node/21.
    I've seen that, too. With Cygwin you can use the POSIX APIs, so you can call mmap like in Linux. With MinGW, you'll have to find the native Win32 version and call that instead.

  6. #336
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by nburns View Post
    I've seen that, too. With Cygwin you can use the POSIX APIs, so you can call mmap like in Linux. With MinGW, you'll have to find the native Win32 version and call that instead.
    Now that I have the 64-bit compiler, I think I may have more options. But I'm not sure it would have much impact on TreeDecode since it already has a separate thread that reads the decoded symbols from a circular buffer, looks up the dictionary string and sends the data to output via ~1 MB ping-pong buffers.

  7. #337
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by Matt Mahoney View Post
    I was testing tree v0.12 in 64 bit Linux. On a small text file (enwik6, 1 MB), TreeCompress.exe under Wine 1.6 gives an out of memory error. Next I compiled with gcc 4.8.2 -O3 and I get a segmentation fault. Then I tried options -O3 -m32 and finally it runs, but enwik8 gives fast but poor compression and enwik9 gives an error (symbol limit exceeded).

    Also, times are not calculated correctly by the programs. When I compile I get warnings about printf format errors (long passed as "%u") which matters because in 64 bit gcc, long is 64 bits.
Thanks for the detailed problem report. I am sorry the attempt to build a 64-bit executable didn't go very well. The attached version of TreeCompress has updated code and builds as a 64-bit executable with MinGW64 as well as with 32-bit MinGW. I can't build 32-bit executables with MinGW64, but I think that is a setup problem. I do not know if it will work under Wine - I used %Iu in printfs. The pointers could use some clean-up work, and this version uses the same memory size for 32- or 64-bit code. The MAX_MEMORY_USAGE constant can be increased before making a 64-bit build if you want to try more memory, which does seem to work. The results are not exactly the same, probably because memory structure sizes have changed even though overall memory allocation is the same for the 32- and 64-bit executables.
Attached Files

  8. #338
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    462
    Thanks
    48
    Thanked 169 Times in 122 Posts
Thanks Sportman. A few numbers say more than 1000 words.

  9. #339
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I compiled the new TreeCompress64 in Linux and it works.
    enwik8 -> 21,974,704, 0.4+7299+3.4 sec, 1.8 sec. (2.67 GHz M620).

  10. The Following 2 Users Say Thank You to Matt Mahoney For This Useful Post:

    Kennon Conrad (2nd October 2014),surfersat (2nd October 2014)

  11. #340
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    778
    Thanks
    63
    Thanked 273 Times in 191 Posts
    Input:
    1,000,000,000 - enwik9

Output:

    Harddisk:
    269,503,577 bytes, 20.647 sec. - 9.669 sec., Lzturbo -32 -p1
    250,730,516 bytes, 1083.884 sec. - 11.688 sec., Lzturbo -29 -p1
    202,241,389 bytes, 1124.972 sec. - 12.045 sec., Lzturbo -39 -p1
    193,605,881 bytes, 1317.063 sec. - 19.420 sec., Lzturbo -49 -p1
    177,321,380 bytes, xxxx.xxx sec. - 13.581 sec., TreeComp v0.12 -t1

    SSD:
    269,503,577 bytes, 19.497 sec. - 3.622 sec., Lzturbo -32 -p1
    250,730,516 bytes, 1073.668 sec. - 3.672 sec., Lzturbo -29 -p1
    202,241,389 bytes, 1114.174 sec. - 4.169 sec., Lzturbo -39 -p1
    193,605,881 bytes, 1305.626 sec. - 11.493 sec., Lzturbo -49 -p1
    177,321,380 bytes, xxxx.xxx sec. - 11.215 sec., TreeComp v0.12 -t1

    RAM DISK:
    269,503,577 bytes, 19.235 sec. - 1.981 sec., Lzturbo -32 -p1
    250,730,516 bytes, 1069.845 sec. - 1.694 sec., Lzturbo -29 -p1
    202,241,389 bytes, 1111.677 sec. - 2.477 sec., Lzturbo -39 -p1
    193,605,881 bytes, 1302.485 sec. - 9.624 sec., Lzturbo -49 -p1
    177,321,380 bytes, xxxx.xxx sec. - 11.240 sec., TreeComp v0.12 -t1

    cleanmem before every run.

  12. The Following 3 Users Say Thank You to Sportman For This Useful Post:

    Kennon Conrad (2nd October 2014),Matt Mahoney (2nd October 2014),surfersat (2nd October 2014)

  13. #341
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    778
    Thanks
    63
    Thanked 273 Times in 191 Posts
    Input:
    1,000,000,000 bytes - enwik9

    Output:

    Harddisk:
    269,503,577 bytes, 13.549 sec. - 9.404 sec., Lzturbo -32 -p2
    250,730,516 bytes, 1083.920 sec. - 11.536 sec., Lzturbo -29 -p2
    202,241,389 bytes, 1126.518 sec. - 12.110 sec., Lzturbo -39 -p2
    193,605,881 bytes, 1316.625 sec. - 19.524 sec., Lzturbo -49 -p2
    177,321,380 bytes, xxxx.xxx sec. - 10.681 sec., TreeComp v0.12

    SSD:
    269,503,577 bytes, 11.220 sec. - 2.377 sec., Lzturbo -32 -p2
    250,730,516 bytes, 1074.534 sec. - 3.560 sec., Lzturbo -29 -p2
    202,241,389 bytes, 1114.699 sec. - 4.213 sec., Lzturbo -39 -p2
    193,605,881 bytes, 1305.269 sec. - 11.452 sec., Lzturbo -49 -p2
    177,321,380 bytes, xxxx.xxx sec. - 7.465 sec., TreeComp v0.12

    RAM DISK:
    269,503,577 bytes, 11.060 sec. - 1.269 sec., Lzturbo -32 -p2
    250,730,516 bytes, 1071.082 sec. - 1.705 sec., Lzturbo -29 -p2
    202,241,389 bytes, 1114.523 sec. - 2.398 sec., Lzturbo -39 -p2
    193,605,881 bytes, 1303.752 sec. - 9.625 sec., Lzturbo -49 -p2
    177,321,380 bytes, xxxx.xxx sec. - 7.436 sec., TreeComp v0.12

  14. The Following 2 Users Say Thank You to Sportman For This Useful Post:

    Kennon Conrad (5th October 2014),Paul W. (13th October 2014)

  15. #342
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Tested tree 0.12 compiled with gcc 4.8.2 -O3 on enwik9 in 64 bit Ubuntu. TreeCompress gave a segmentation fault after 2.6 days but the output was still OK.

    enwik9.tpre -> 1,041,507,613, 4.4 sec
    enwik9.tcom -> 371,204,373, 226246 sec, gives segmentation fault
    enwik9.tree -> 177,542,372, 29 sec (note different size)
    enwik9.out -> 16.8 sec (verifies OK)

    http://mattmahoney.net/dc/text.html#1775

  16. #343
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Tested tree 0.12 compiled with gcc 4.8.2 -O3 on enwik9 in 64 bit Ubuntu. TreeCompress gave a segmentation fault after 2.6 days but the output was still OK.

    enwik9.tpre -> 1,041,507,613, 4.4 sec
    enwik9.tcom -> 371,204,373, 226246 sec, gives segmentation fault
    enwik9.tree -> 177,542,372, 29 sec (note different size)
    enwik9.out -> 16.8 sec (verifies OK)

    http://mattmahoney.net/dc/text.html#1775
    Is this the same build that ran okay for enwik8? If so, that is odd.

I have not seen this problem on Windows. Since the output gets saved periodically during compression, there are two lines of code at the end of TreeCompress.c that get executed for the first time when the program finishes. The first is the free(char_buffer) call. The second is the fprintf for the execution time. I suspect the free call is causing the segmentation fault, but I cannot be sure without more information or installing Ubuntu. If you still have the enwik9.tcom file, it would be interesting to run it through TreeCompress again with and without the free call commented out.

    The different/larger size was caused by the program having fewer tree nodes available since pointers are now 8 bytes instead of 4. It tries compensating by building the common prefix tree in smaller pieces, but in this case, there probably weren't enough nodes available to fit some of the individual pieces (for instance, it may have not been able to fit all the common prefixes starting with two spaces).

  17. #344
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 660 Times in 354 Posts
Memory allocation problems are easily spotted with Valgrind.

  18. #345
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I didn't look at the source code but the end of output for TreeCompress looked like this:
    Code:
    Common prefix scan 16/19 4d6238.0 - 57d925.78b78b          
    Created 4791050 of 21753068 string nodes
    Common prefix scan 17/19 57d926.0 - 641ed3.78b78b          
    Created 4773095 of 21753068 string nodes
    Common prefix scan 18/19 641ed4.0 - 731f64.78b78b          
    Created 4756924 of 21753068 string nodes
    Common prefix scan 19/19 731f65.0 - 78b78b.78b78b          
    Created 1638985 of 21753068 string nodes
    Common prefix scan 19/19 731f65.0 - 78b78b.78b78b, score[30000] = 22.65
    /home/matt/bin/testtree: line 4: 19521 Segmentation fault      (core dumped) TreeCompress $1.tpre $1.tcom
    
    real	3770m46.907s
    user	3757m50.924s
    sys	2m20.234s
    I compiled the new version of TreeCompress for enwik9. I didn't test enwik8 again with it.

    It seems like since your program is designed for 32 bits you could use ints instead of pointers so the 64 bit version doesn't take more space.
    Last edited by Matt Mahoney; 7th October 2014 at 00:01.

  19. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    Kennon Conrad (7th October 2014)

  20. #346
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by Matt Mahoney View Post
    I didn't look at the source code but the end of output for TreeCompress looked like this:
    Code:
    Common prefix scan 16/19 4d6238.0 - 57d925.78b78b          
    Created 4791050 of 21753068 string nodes
    Common prefix scan 17/19 57d926.0 - 641ed3.78b78b          
    Created 4773095 of 21753068 string nodes
    Common prefix scan 18/19 641ed4.0 - 731f64.78b78b          
    Created 4756924 of 21753068 string nodes
    Common prefix scan 19/19 731f65.0 - 78b78b.78b78b          
    Created 1638985 of 21753068 string nodes
    Common prefix scan 19/19 731f65.0 - 78b78b.78b78b, score[30000] = 22.65
    /home/matt/bin/testtree: line 4: 19521 Segmentation fault      (core dumped) TreeCompress $1.tpre $1.tcom
    
    real    3770m46.907s
    user    3757m50.924s
    sys    2m20.234s
    I compiled the new version of TreeCompress for enwik9. I didn't test enwik8 again with it.

    It seems like since your program is designed for 32 bits you could use ints instead of pointers so the 64 bit version doesn't take more space.
    Thanks, that's all I needed to understand what happened. The program crashed when building the new dictionary entries data structure for one-pass substitution of new dictionary entries because the pointers in the structure are twice as big as before.

    I'm not sure that 64 bit programs are guaranteed to allocate memory with addresses <4G. I have changed most of the structure pointers to structure indexes, which gets the memory usage back down but requires an address calculation when the index is used.

    I will try to post an updated 64-bit version, but it won't be right away. I need to make a few more changes to make the process identical.

  21. #347
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Kennon Conrad View Post
    Thanks, that's all I needed to understand what happened. The program crashed when building the new dictionary entries data structure for one-pass substitution of new dictionary entries because the pointers in the structure are twice as big as before.

    I'm not sure that 64 bit programs are guaranteed to allocate memory with addresses <4G. I have changed most of the structure pointers to structure indexes, which gets the memory usage back down but requires an address calculation when the index is used.
    I don't see any way around that when building for 64-bit. Pointers will be 8 bytes.


  22. #348
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by nburns View Post
    I don't see any way around that when building for 64-bit. Pointers will be 8 bytes.
    The problem with creating a 64-bit version of TreeCompress v0.12 is that it won't produce identical results without increasing RAM usage and/or run time. The main reason for going to 64-bit code was to provide more RAM for compression. One of these days though, I need to try making a 64-bit version of TreeDecode. Using 64 bit values would make the bit stream handling much easier.
    Last edited by Kennon Conrad; 8th October 2014 at 09:15.

  23. #349
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Kennon Conrad View Post
    The problem with creating a 64-bit version of TreeCompress v0.12 is that it won't produce identical results without increasing RAM usage and/or run time. The main reason for going to 64-bit code was to provide more RAM for compression. One of these days though, I need to try making a 64-bit version of TreeDecode. Using 64 bit values would make the bit stream handling much easier.
    What's wrong with replacing pointers with 32-bit array indexes? That won't work if the array has more than 4G elements, but how often does that happen, anyway?

    Array indexing might be slower than using pointers directly, but I'd try it before assuming it will be too slow. The machine code honestly might change less than you think and come out almost equivalent. Even if your assumptions are on target, it might cost less than you expect.
    Last edited by nburns; 9th October 2014 at 14:39.

  24. #350
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by nburns View Post
    What's wrong with replacing pointers with 32-bit array indexes? That won't work if the array has more than 4G elements, but how often does that happen, anyway?

    Array indexing might be slower than using pointers directly, but I'd try it before assuming it will be too slow. The machine code honestly might change less than you think and come out almost equivalent. Even if your assumptions are on target, it might cost less than you expect.
    My only issue is that I think slowing down 32 bit code so that 64 bit code (that won't run faster) can produce the same results is silly. The time penalty is not that much.

  25. #351
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Kennon Conrad View Post
    My only issue is that I think slowing down 32 bit code so that 64 bit code (that won't run faster) can produce the same results is silly. The time penalty is not that much.
    When AMD came out with the 64-bit extensions to x86, they were supposed to have other overdue improvements, like access to additional registers. So there may be other benefits to building for 64-bits.

  26. #352
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by nburns View Post
    When AMD came out with the 64-bit extensions to x86, they were supposed to have other overdue improvements, like access to additional registers. So there may be other benefits to building for 64-bits.
    For either my A8-5500 or i7-4790K with TreeCompress and TreeDecode using minGW, execution time has been longer with 64 bit executables. There are extra calculations for addresses that I haven't been able to get around yet. For TreeDecode, all I really want is to be able to shift 64 bit registers, but I can't seem to get there without ending up with a lot more overhead from what appears to be relative addressing.

  27. #353
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Kennon Conrad View Post
    For either my A8-5500 or i7-4790K with TreeCompress and TreeDecode using minGW, execution time has been longer with 64 bit executables. There are extra calculations for addresses that I haven't been able to get around yet. For TreeDecode, all I really want is to be able to shift 64 bit registers, but I can't seem to get there without ending up with a lot more overhead from what appears to be relative addressing.
    That's possible. But it can be hard to tell where the problem really is without lots of digging and experimentation. The calculations happen on the CPU. Going out to memory is dramatically slower, so usually optimizing memory accesses is the biggest concern.

  28. #354
    Member
    Join Date
    Oct 2007
    Location
    Germany, Hamburg
    Posts
    408
    Thanks
    0
    Thanked 5 Times in 5 Posts
    Hi Kennon,

I have mainly been reading this forum for many years.
You have written one of the most exciting algorithms/programs of recent years, in my opinion.
It's something original and in its own league performance-wise in what it does.
Because of this, please focus on optimizing its strengths instead of turning it into something we already have a dozen times over.
If you could cut compression time in half (which probably isn't realistic) at the cost of up to 20% compression, this algorithm wouldn't be interesting any more.

Surely a fork of this with new ideas would be cool, but please stick with the strengths it has. More compression, faster decompression, and faster compression without losing in either of the first two categories would be welcome.

    EDIT:
    By the way why didn't anyone move this thread to general yet?
    Last edited by Simon Berger; 12th October 2014 at 20:20.

  29. #355
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by Simon Berger View Post
    Hi Kennon,

I have mainly been reading this forum for many years.
You have written one of the most exciting algorithms/programs of recent years, in my opinion.
It's something original and in its own league performance-wise in what it does.
Because of this, please focus on optimizing its strengths instead of turning it into something we already have a dozen times over.
If you could cut compression time in half (which probably isn't realistic) at the cost of up to 20% compression, this algorithm wouldn't be interesting any more.

Surely a fork of this with new ideas would be cool, but please stick with the strengths it has. More compression, faster decompression, and faster compression without losing in either of the first two categories would be welcome.

    EDIT:
    By the way why didn't anyone move this thread to general yet?
Thanks for the feedback. I do intend to continue optimizing Tree's strengths, but also to investigate improving compression speed. I am convinced that significantly faster algorithms that can produce files with nearly as much compression are possible. You are right, a 20% loss in compression would cause Tree to lose its best/most unique quality, so that is not interesting to me.

  30. #356
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
I found a bug in TreeBitEncode that caused decoding of small files to crash; it has been fixed. TreeDecode has been modified to use smaller, more appropriately sized variables, decreasing RAM usage and improving decoding speed by 3-4%.

    Instead of making TreeCompress64 run like TreeCompress, but just slower, I went a completely different direction. TreeCompress64 is a very experimental version that merely represents a sidebranch of a path toward linear decision making. The intent is to provide an indication of how much the compression ratio will be impacted by making decisions using only partial data and using a new scoring formula more tailored to linear compression.

    Main functional changes:
1. All input symbols are converted to Tree's internal 32-bit token codes when the input is read and converted back when the output is written, rather than converting symbols on the fly when the common prefix trees are built. This causes the program to use significantly more RAM to store the data but allows the code to be simpler and improves execution speed.
    2. Maximum RAM usage is approximately 6x the input file size rather than about 1.7 GB maximum.
    3. Common prefix trees are not built in sections. Instead, the program builds as much of a common prefix tree as will fit in memory and makes tokenization decisions based on only the window of data that was used to build the tree.
    4. The scoring formula has been adjusted to promote high frequency strings more/long strings less than before.
    5. Token creation management only allows an average of about 3000 unusable symbols per cycle instead of 7500.
    6. Adjustments for two instance matches that are near each other have been eliminated.

    When compressing enwik9, the first 5 symbols created represent the following strings. The window size for the first decision cycle is about 19 MB. "C" is the capital encode symbol.

    1:
    </text>
    </revision>
    </page>
    <page>
    <title>C
    2:
    &quote;
    3:
    </comment>
    <text xml:space="preserve">
    4:
    Cz</timestamp>
    <contributor>
    <
5:
    [http://www.

    One other note is that my results have not been 100% consistent. I think they may depend on the runtime libraries.

    Results for enwik8:
    TreeCompress => 21,976,316 bytes, encode 3916 sec., decode 0.78 sec.
    TreeParagraphs + TreeWords + TreeCompress => 22,075,700 bytes, encode 1906 sec., decode 0.76 sec.
    TreeCompress64 => 22,196,288 bytes, encode 931 sec., decode 0.76 sec.
    TreeParagraphs + TreeWords + TreeCompress64 => 22,304,524 bytes, encode 730 sec., decode 0.74 sec.

    Results for enwik9:
    TreeCompress => 177,340,072 bytes, encode 116,473 sec., decode 7.5 sec.
    TreeParagraphs + TreeWords + TreeCompress => 178,774,864 bytes, encode 36,652 sec., decode 7.3 sec.
    TreeCompress64 => 180,516,660 bytes, encode 30,147 sec., decode 7.6 sec.
    TreeParagraphs + TreeWords + TreeCompress64 => 180,941,504 bytes, encode 20,834 sec., decode 7.4 sec.

    TreeDecode.zip is 10,525 bytes.

Overall results are 45% - 75% faster compression at the cost of 1% - 2% of compressed file size and more RAM for enwik9, but less RAM for enwik8. This is encouraging for single-pass compression because it appears that making tokenization decisions based on a small initial window does not hurt compression too much. This is significant for the algorithm I have in mind because I don't have nearly enough RAM to fit all of enwik9's common prefix tree in memory until a lot of token substitutions have been made.

    Do not try TreeCompress64 unless your computer easily has 6x the file size available in RAM. Otherwise it will pretty much freeze everything up.
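    The 6x figure above can be turned into a quick pre-flight check before launching TreeCompress64. This is only a sketch, not part of Tree; the factor and the function names are assumptions taken from the warning above.

    ```c
    #include <stdint.h>
    #include <unistd.h>

    /* Rough estimate of TreeCompress64's memory appetite.  The factor
       of 6 comes from the forum warning, not from the Tree sources,
       so treat it as an estimate rather than a hard bound. */
    static uint64_t tree64_required_bytes(uint64_t file_size) {
        return 6 * file_size;
    }

    /* POSIX version of the check; on Windows, GlobalMemoryStatusEx
       would play the same role.  Returns 1 if physical RAM covers the
       estimated requirement. */
    static int tree64_memory_ok(uint64_t file_size) {
        uint64_t phys = (uint64_t)sysconf(_SC_PHYS_PAGES)
                      * (uint64_t)sysconf(_SC_PAGE_SIZE);
        return tree64_required_bytes(file_size) < phys;
    }
    ```

    Note this only compares against total physical RAM; a machine under memory pressure from other processes can still end up swapping.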
    Attached Files

  31. #357
    Expert
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    A 64 bit shift should be faster with a 64 bit compile. The function shift() is 10 instructions in 32 bits and 3 instructions in 64 bit. I cheated a bit by making the shift amount the first parameter so it is already in RCX (at least in Windows). Otherwise there is an extra instruction to move it there.

    Code:
    C:\tmp>type a.c
    unsigned long long shift(int x, unsigned long long a) {
      return a << x;
    }
    
    C:\tmp>gcc -O3 -S -m32 a.c
    
    C:\tmp>type a.s
            .file   "a.c"
            .text
            .p2align 4,,15
            .globl  _shift
            .def    _shift; .scl    2;      .type   32;     .endef
    _shift:
            movl    4(%esp), %ecx
            movl    8(%esp), %eax
            movl    12(%esp), %edx
            shldl   %cl, %eax, %edx
            sall    %cl, %eax
            testb   $32, %cl
            je      L2
            movl    %eax, %edx
            xorl    %eax, %eax
    L2:
            rep ret
            .ident  "GCC: (rev5, Built by MinGW-W64 project) 4.8.1"
    
    C:\tmp>gcc -O3 -S -m64 a.c
    
    C:\tmp>type a.s
            .file   "a.c"
            .text
            .p2align 4,,15
            .globl  shift
            .def    shift;  .scl    2;      .type   32;     .endef
            .seh_proc       shift
    shift:
            .seh_endprologue
            movq    %rdx, %rax
            salq    %cl, %rax
            ret
            .seh_endproc
            .ident  "GCC: (rev5, Built by MinGW-W64 project) 4.8.1"
    Also, I posted your results to LTCB. http://mattmahoney.net/dc/text.html#1773
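    The 10-instruction 32-bit sequence is easier to follow when written back out as C: the compiler is emulating one 64-bit shift on a lo/hi word pair. The sketch below mirrors the shldl/sall/testb sequence in the -m32 listing; the function names are hypothetical, and the point is only to show where the extra instructions come from.

    ```c
    #include <stdint.h>

    /* Native 64-bit shift: compiles to a single salq under -m64,
       as in the second listing above. */
    static uint64_t shift64(int x, uint64_t a) { return a << x; }

    /* What the 32-bit compiler must do instead: shift a lo/hi word
       pair, mirroring the shldl/sall/testb/je sequence. */
    static uint64_t shift32_emulated(int x, uint32_t lo, uint32_t hi) {
        if (x & 32) {                /* testb $32, %cl */
            hi = lo << (x & 31);     /* movl %eax, %edx after sall */
            lo = 0;                  /* xorl %eax, %eax */
        } else if (x) {
            hi = (hi << x) | (lo >> (32 - x)); /* shldl %cl, %eax, %edx */
            lo <<= x;                           /* sall %cl, %eax */
        }
        return ((uint64_t)hi << 32) | lo;
    }
    ```

    The branch on bit 5 of the count is why the 32-bit version can never match the single salq: the hardware shift only handles counts 0-31 per word.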

  32. #358
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by Matt Mahoney View Post
    A 64 bit shift should be faster with a 64 bit compile. The function shift() is 10 instructions in 32 bits and 3 instructions in 64 bit. I cheated a bit by making the shift amount the first parameter so it is already in RCX (at least in Windows). Otherwise there is an extra instruction to move it there.

    Thanks. I would like to use 64 bit code for shifts, but the problem is that 64-bit code is expanding the code in lots of other places. For instance, the following code

    Code:
        temp_input_word = temp_input_word >> 21;
        symbol_bits += symbol_bits_table[temp_input_word];
    produces this when compiled for 32 bits:
    Code:
        shrl    $21, %esi
        movzbl    _symbol_bits_table(%esi), %eax
        addb    %al, _symbol_bits
    and this when compiled for 64 bits:
    Code:
        leaq    symbol_bits_table(%rip), %rax
        shrl    $21, %ebx
        movzbl    (%rax,%rbx), %eax
        addb    %al, symbol_bits(%rip)

  33. #359
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Kennon Conrad View Post
    Thanks. I would like to use 64 bit code for shifts, but the problem is that 64-bit code is expanding the code in lots of other places.
    What's wrong with the 64-bit version? Does that code run slower? The difference is that the 64-bit version moves the table address into a register, and the 32-bit version uses it directly from memory. I'm not sure why, but it might reflect a deliberate change in the instruction set. The new version is more RISC-y.
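    If that extra RIP-relative leaq shows up inside a hot loop, one common workaround is to hoist the globals into locals so the compiler can keep the materialized address in a register across iterations. A sketch with hypothetical names modeled on the snippet above (the table size is a guess: a 32-bit word shifted right by 21 leaves an 11-bit index):

    ```c
    #include <stdint.h>

    /* Hypothetical globals modeled on the decoder snippet above. */
    static uint8_t symbol_bits_table[1 << 11];
    static uint8_t symbol_bits;

    /* Hoisting the table base and the accumulator into locals means
       the RIP-relative leaq and the load/store of symbol_bits are
       paid once per call instead of once per table access. */
    static void accumulate_symbol_bits(const uint32_t *words, int n) {
        const uint8_t *table = symbol_bits_table;  /* one leaq */
        uint8_t bits = symbol_bits;                /* one load */
        for (int i = 0; i < n; i++)
            bits += table[words[i] >> 21];
        symbol_bits = bits;                        /* one store */
    }
    ```

    Whether this actually changes the generated code depends on the compiler and optimization level; gcc may already hoist loop-invariant addresses, so checking the output with gcc -O3 -S, as in the earlier posts, is the way to confirm.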

  34. #360
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by nburns View Post
    What's wrong with the 64-bit version? Does that code run slower? The difference is that the 64-bit version moves the table address into a register, and the 32-bit version uses it directly from memory. I'm not sure why, but it might reflect a deliberate change in the instruction set. The new version is more RISC-y.
    The 64-bit version of TreeDecode takes 7% longer to run. I suspect it is mostly from the extra instructions like the one above. There are a lot of them. I think it would more than wipe out the savings from being able to efficiently perform shifts on 64 bit values. I wonder if I need a different compiler.
    Last edited by Kennon Conrad; 14th October 2014 at 08:36.


