
Thread: NNCP: Lossless Data Compression with Neural Networks

  1. #31 Darek (Member, Poland, Warsaw)
    I've tested enwik6 compression with NNCP using the batch_size 2 option.
    I may be doing something wrong, but the enwik6 file after the NNCP preprocessor is quite big - 923'782 bytes - and compression gives a worse score than the non-preprocessed file... maybe for bigger files like enwik8 there would be a better gain.
    However, low batch_size values give a good gain. Setting hidden_size to 512 also gives some gain, but it's only 0.15%.

    Here are scores for enwik6:

    248'571 - no preprocessing + batch_size 2 - a 4.7% gain over Mauro's default score.
    208'303 - preprocessed by DRT + batch_size 2
    206'402 - preprocessed by CMIX v17 + batch_size 2
    206'086 - preprocessed by CMIX v17 + batch_size 2 + hidden_size 512
    259'292 - preprocessed by NNCP + batch_size 2 + hidden_size 512 - ??? - worse score than non-preprocessed...
    205'047 - preprocessed by CMIX v17 + batch_size 1 + hidden_size 512
    176'411 - cmix v17

  2. #32 Shelwien (Administrator, Kharkov, Ukraine)
    You have to specify different nncp options for preprocessed and normal files.
    To be specific, use -n_symb 256 for normal files (including DRT output), and -n_symb N for preprocessed files,
    where N is the alphabet size printed by "preprocess":
    Code:
    Z:\004>.\preprocess c _tmp_word.txt enwik6 _tmp_out.bin 16384 64
    input: 1000000 bytes
    after case/space preprocessing: 1107308 symbols
       822    461688    445950
    Number of words=822 Final length=461688
    See "number of words" here?
    As to the size, its ok... its basically converted to unicode.

    You can also try changing the last parameter of nncp preprocess... to something like 1 maybe.
    Its a frequency threshold for adding words to the dictionary.
    Setting it to 1 should always pad the dictionary to specified size.

  3. Thanks:

    Darek (22nd April 2019)

  4. #33 Darek (Member, Poland, Warsaw)
    Here are scores for enwik6 with proper options:

    248'571 - no preprocessing + batch_size 2 - a 4.7% gain over Mauro's default score.
    208'303 - preprocessed by DRT + batch_size 2
    206'402 - preprocessed by CMIX v17 + batch_size 2
    206'086 - preprocessed by CMIX v17 + batch_size 2 + hidden_size 512
    205'047 - preprocessed by CMIX v17 + batch_size 1 + hidden_size 512

    259'292 - preprocessed by NNCP + batch_size 2 + hidden_size 512

    241'184 - preprocessed by NNCP + batch_size 2 + hidden_size 512 + n_symb 822 -> frequency threshold = 64, packed dictionary size = 1'842
    239'441 - preprocessed by NNCP + batch_size 2 + hidden_size 512 + n_symb 954 -> frequency threshold = 48, packed dictionary size = 2'234
    239'585 - preprocessed by NNCP + batch_size 2 + hidden_size 512 + n_symb 1'257 -> frequency threshold = 32, packed dictionary size = 3'346
    240'216 - preprocessed by NNCP + batch_size 2 + hidden_size 512 + n_symb 1'563 -> frequency threshold = 24, packed dictionary size = 4'465
    240'935 - preprocessed by NNCP + batch_size 2 + hidden_size 512 + n_symb 2'210 -> frequency threshold = 16, packed dictionary size = 6'695
    242'073 - preprocessed by NNCP + batch_size 2 + hidden_size 512 + n_symb 3'427 -> frequency threshold = 9*, packed dictionary size = 10'943
    245'220 - preprocessed by NNCP + batch_size 2 + hidden_size 512 + n_symb 6'959 -> frequency threshold = 4, packed dictionary size = 24'283
    239'377 - preprocessed by NNCP + batch_size 2 + hidden_size 512 + n_symb 15'115 -> frequency threshold = 2, packed dictionary size = 61'239
    238'429 - preprocessed by NNCP + batch_size 2 + hidden_size 512 + n_symb 16'384 -> frequency threshold = 1, packed dictionary size = 64'995

    176'411 - cmix v17

    * - for some reason, a threshold of 8 makes the preprocessor hang.

    It looks like NNCP preprocessing isn't as effective for enwik6 as for enwik8 or enwik9, or the underlying RNN algorithm is more efficient on bigger files: the NNCP and CMIX v17 scores are much closer for enwik8 (14% difference) and enwik9 (10% difference) than for my preprocessed enwik6 scores (35-39% difference).

  5. Thanks:

    Shelwien (23rd April 2019)

  6. #34 Darek (Member, Poland, Warsaw)
    Quote Originally Posted by Shelwien View Post
    pdf says "For the small LSTM model, we reused the text preprocessor of CMIX".

    I used
    .\preprocess c _tmp_word.txt enwik8 out.bin 16384 64
    .\nncp -T 4 -n_layer 5 -hidden_size 352 -n_symb 16388 -full_connect 1 -lr 6e-3 c out.bin out-nncp.bin
    as specified in the readme file
    and got 16,984,458, verified decoding too.

    .pdf says result should be "LSTM (large) 16,924,569".
    @Shelwien - what is the size of your preprocessed enwik8 file (out.bin)?
    I'll try to compress it, but I need to be sure my preprocessed file is correct (i.e. the same as yours).
    My preprocessed file is 43'650'816 bytes.

  7. #35 Shelwien (Administrator, Kharkov, Ukraine)
    Yes, same size.
    md5sum: 6f19d82a07e21bd78f09e951bf27259b out.bin

  8. Thanks:

    Darek (23rd April 2019)

  9. #36 Darek (Member, Poland, Warsaw)
    Here are some tests of different "batch_size" options on my testset. Reducing the value of this option improves compression ratio but hurts compression speed.
    For the total testset score, batch_size 1 gives a 2.5% gain; however, some small files improved by ~20%, ~30% or even 44%. My small textual files also gained about 10%.
    For all files in my testset, the best scores come from batch_size values between 1 and 3. I'm curious whether it performs similarly for bigger files like enwik8 or enwik9.
    [Attached image: nncp_batch_size.jpg]
    Last edited by Darek; 24th April 2019 at 03:25.

  10. #37 Darek (Member, Poland, Warsaw)
    A similar exercise to the one for my testset, but on the MaximumCompression testset.
    batch_size 1 and 2 give the best results.

    I haven't tested the other options yet, but it already looks like there is fine potential in NNCP.
    [Attached image: nncp_max_comp_bs.jpg]

  11. #38 fab (Member, France)
    @Darek: NNCP preprocessing:

    Your results are not meaningful if you don't report the sum of the sizes of the compressed data and of the compressed dictionary. In the case of enwik6, the CMIX preprocessing is not competitive because the size of the compressed dictionary alone is 160 KB! And with the NNCP preprocessing, only small dictionaries bring a (tiny) gain.

    Since the CMIX dictionary is large, it is worth using it only on large files such as enwik8 if you want to compare it with another preprocessor.

  12. Thanks:

    Darek (25th April 2019)

  13. #39 fab (Member, France)
    @Darek: batch size

    Smaller batch sizes are generally better because there are more learning steps so the model adapts faster (even if there is more noise on the gradients). However, the gain is significant only for the start of the file. In my tests, the gain of using a small batch size is constant, so it is large in percentage for small files but tends to zero for large files. Hence for enwik8 I did not see an interesting gain. A compromise would be to use a small batch size for the start of the file (say the first 1 MB) and then switch to a large batch size to increase the speed. Since NNCP was optimized for large files (at least enwik8), it was not worth the effort.
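
    To make the trade-off concrete, here is a minimal Python sketch of the arithmetic behind this (the helper names are made up for illustration, not nncp options); the update count follows from the backward pass happening every batch_size * time_steps symbols:
    Code:
    # Sketch of the batch-size trade-off described above (not nncp source).
    # With time_steps fixed, the number of weight updates over a file is
    #   n_updates = file_size / (batch_size * time_steps),
    # so halving batch_size doubles how often the model adapts.

    def n_updates(file_size: int, batch_size: int, time_steps: int = 20) -> int:
        return file_size // (batch_size * time_steps)

    def batch_size_schedule(bytes_done: int) -> int:
        """Hypothetical variable schedule: adapt fast on the first 1 MB,
        then switch to a large batch size for speed."""
        return 1 if bytes_done < (1 << 20) else 16

    for size in (10**6, 10**8):  # enwik6 vs. enwik8
        print(size, n_updates(size, 1), n_updates(size, 16))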
    Last edited by fab; 24th April 2019 at 21:19. Reason: typo

  14. Thanks:

    Darek (25th April 2019)

  15. #40 Darek (Member, Poland, Warsaw)
    Quote Originally Posted by fab View Post
    @Darek: NNCP preprocessing:

    Your results are not meaningful if you don't report the sum of the sizes of the compressed data and of the compressed dictionary. In the case of enwik6, the CMIX preprocessing is not competitive because the size of the compressed dictionary alone is 160 KB! And with the NNCP preprocessing, only small dictionaries bring a (tiny) gain.

    Since the CMIX dictionary is large, it is worth using it only on large files such as enwik8 if you want to compare it with another preprocessor.
    Thanks. Regarding combining the compressed data with the compressed dictionary: I posted all these data for the NNCP compressor in a second, slightly enhanced comparison for enwik6 - https://encode.su/threads/3094-NNCP-Lossless-Data-Compression-with-Neural-Networks?p=59996&viewfull=1#post59996. But yes, it lacks the dictionary sizes for the CMIX and DRT preprocessing and the sums of data and dictionary sizes, which could be misleading.

    Summing these tests shows that the best result, 241'675, was for NNCP frequency threshold = 48: data = 239'441 plus dictionary = 2'234.

    addendum 1 - there is no score for cmix without dictionary preprocessing for enwik6; that score would not need a dictionary
    addendum 2 - to be honest, for an ultimately fair comparison we should also add the de/compressor and de/preprocessor sizes (executable or source) to the packed data and dictionary sizes

    My other tests did not use the preprocessor.

    And your idea of variable batch size sounds interesting. Really.

    However, maybe it also depends on the data structure inside the file (redundancy rate? I don't know) - look at the FP.LOG score. This file is 20 MB (OK, not as big as enwik8 or enwik9, but not that small either), and changing the batch_size option from 16 to 1 improves it by about 11.6%. Maybe the reason is that this file is highly compressible and, as you wrote, most of the gain is made at the start of the file - and here it carries across the whole file length.

  16. #41 byronknoll (Member, USA)
    Hi Fabrice, great work on NNCP! Your results have already led to some improvements for cmix (changes to the Adam optimizer), and there are several other ideas used in NNCP that I plan to try out when I have time.

    I have a question about embeddings in NNCP. I haven't read through the NNCP source code, but I saw embeddings mentioned in the paper. I haven't yet investigated adding an extra input or output embedding layer in cmix. Currently the input byte (256 one-hot encoding) is used directly by the LSTM gates. An alternative approach would be to add an extra input embedding layer and use that embedding layer as input to the LSTM. I have also heard of adding an output embedding layer before the softmax, and optionally tying the weights of the two embedding layers. Have you investigated these approaches with NNCP? Does NNCP use an input embedding layer, or were you referring to something else in the table with the number of LSTM parameters (the "no embed" column)?
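
    For reference, a minimal numpy sketch of the three variants under discussion (illustrative only - not cmix or NNCP code; the sizes are made up):
    Code:
    import numpy as np

    V, E, H = 256, 64, 352        # vocabulary, embedding, hidden sizes
    rng = np.random.default_rng(0)
    W_in = rng.normal(0, 0.1, (V, E))    # input embedding table
    W_out = rng.normal(0, 0.1, (H, V))   # output projection before softmax

    def one_hot(sym: int) -> np.ndarray:
        x = np.zeros(V)
        x[sym] = 1.0
        return x

    # (a) lstm-compress/cmix style: the one-hot vector itself feeds the
    #     LSTM gates (a V-dim input).
    x_direct = one_hot(42)

    # (b) extra input embedding layer: the LSTM sees an E-dim vector,
    #     which is just row 42 of W_in.
    x_embed = W_in[42]                   # == one_hot(42) @ W_in

    # (c) output embedding before the softmax; "tying" would reuse W_in
    #     here, which requires E == H (or an extra projection).
    h = rng.normal(0, 0.1, H)            # stand-in for the LSTM output
    logits = h @ W_out
    p = np.exp(logits - logits.max())
    p /= p.sum()                         # next-symbol distribution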

  17. #42 Shelwien (Administrator, Kharkov, Ukraine)
    Ok, so I replaced the entropy coder with my rc.
    It turned out to be hard to do, since nncp uses length-prefixed blocks and normally 16 coder instances.
    So in the end I forced it to write the entropy code to another file; otherwise decoding didn't work.
    And, uh, the result: http://nishi.dreamhosters.com/u/nncp_rc_v0.rar
    Code:
    nncp -T 1 -n_layer 5 -hidden_size 352 -n_symb 256 -full_connect 1 -lr 6e-3  c book1 1
    
    768,771 BOOK1
    
    220,138 out.nncp // nncp-2019-04-13-win64.zip exe
    
         47 out.nncp
    217,916 rc_bin_0 // nncp_rc_v0
    Since it uses 8-bit probabilities, I'd also expect a huge improvement from SSE,
    which is actually why I started this.
    But no SSE here yet.
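
    For readers unfamiliar with SSE (secondary symbol estimation): the idea is to take the model's probability, quantize it together with a small context, and look up an adaptively trained refinement. A minimal sketch of the general technique (not Shelwien's mod_SSE):
    Code:
    # Minimal SSE/APM-style sketch: refine a model probability p by
    # indexing an adaptive table with (context, quantized p) and
    # interpolating between the two nearest bins.

    N_BINS = 33

    class SSE:
        def __init__(self, n_ctx: int):
            # each context row starts as the identity mapping p -> p
            self.t = [[i / (N_BINS - 1) for i in range(N_BINS)]
                      for _ in range(n_ctx)]

        def refine(self, ctx: int, p: float) -> float:
            x = p * (N_BINS - 1)
            i = min(int(x), N_BINS - 2)
            frac = x - i
            self.last = (ctx, i, frac)
            row = self.t[ctx]
            return row[i] * (1 - frac) + row[i + 1] * frac

        def update(self, bit: int, rate: float = 0.02):
            ctx, i, frac = self.last
            row = self.t[ctx]
            # move the two neighbouring bins toward the observed bit
            row[i] += (bit - row[i]) * rate * (1 - frac)
            row[i + 1] += (bit - row[i + 1]) * rate * frac

    sse = SSE(n_ctx=256)
    p2 = sse.refine(ctx=10, p=0.7)   # refined probability for this bit
    sse.update(bit=1)                # adapt after the bit is known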

  18. Thanks:

    xinix (25th April 2019)

  19. #43 byronknoll (Member, USA)
    Quote Originally Posted by Shelwien View Post
    Ok, so I replaced the entropy coder with my rc.
    It turned out to be hard to do, since nncp uses length-prefixed blocks and normally 16 coder instances.
    So in the end I forced it to write the entropy code to another file; otherwise decoding didn't work.
    And, uh, the result: http://nishi.dreamhosters.com/u/nncp_rc_v0.rar
    Code:
    nncp -T 1 -n_layer 5 -hidden_size 352 -n_symb 256 -full_connect 1 -lr 6e-3  c book1 1
    
    768,771 BOOK1
    
    220,138 out.nncp // nncp-2019-04-13-win64.zip exe
    
         47 out.nncp
    217,916 rc_bin_0 // nncp_rc_v0
    Since it uses 8-bit probabilities, I'd also expect a huge improvement from SSE,
    which is actually why I started this.
    But no SSE here yet.
    Why is there such a big gain when switching to range encoding? Is it due to not using enough bits to represent the arithmetic coder probabilities in NNCP?

  20. #44 Shelwien (Administrator, Kharkov, Ukraine)
    It may be mostly fixed overhead from streams and blocks.
    But yes, the arith.c coder also has some redundancy. I think it's an old-style 16-bit bitwise arithmetic coder.
    I started enwik8 compression, but it'd take a while with one thread... we'll see how it scales then.

    Update: So it wasn't rc precision or headers. I noticed processing being too slow on enwik8 (100 bytes/s or so),
    and retested book1 with n_streams=16. Looks like it was the "batch_size" actually:
    Code:
        107 out.nncp
     13,957 rc_bin_0
     13,670 rc_bin_1
     14,077 rc_bin_2
     13,910 rc_bin_3
     13,748 rc_bin_4
     13,947 rc_bin_5
     13,867 rc_bin_6
     13,733 rc_bin_7
     13,798 rc_bin_8
     13,656 rc_bin_9
     13,584 rc_bin_10
     13,722 rc_bin_11
     13,598 rc_bin_12
     13,689 rc_bin_13
     13,540 rc_bin_14
     13,464 rc_bin_15
    =220,067 <total>
    Also, apparently nncp still resets the coder once per ~1MB of input even with batch_size 1.

    Update2: compressed enwik8.
    Code:
    16,984,458 default
    
           991 out.nncp // compressed to 56 bytes with paq8px
    16,981,769 rc_bin_0..223

  21. #45 fab (Member, France)
    Quote Originally Posted by byronknoll View Post
    Hi Fabrice, great work on NNCP! Your results have already led to some improvements for cmix (changes to the Adam optimizer), and there are several other ideas used in NNCP that I plan to try out when I have time.

    I have a question about embeddings in NNCP. I haven't read through the NNCP source code, but I saw embeddings mentioned in the paper. I haven't yet investigated adding an extra input or output embedding layer in cmix. Currently the input byte (256 one-hot encoding) is used directly by the LSTM gates. An alternative approach would be to add an extra input embedding layer and use that embedding layer as input to the LSTM. I have also heard of adding an output embedding layer before the softmax, and optionally tying the weights of the two embedding layers. Have you investigated these approaches with NNCP? Does NNCP use an input embedding layer, or were you referring to something else in the table with the number of LSTM parameters (the "no embed" column)?
    Hi Byron. Nice to see that some ideas from NNCP can help in CMIX.

    NNCP uses the input symbols the same way as lstm-compress, i.e. they are directly used by the LSTM gates (it corresponds to having different embeddings for each LSTM gate). The corresponding parameters are not counted in the "no embed" column of the paper because they only marginally increase the number of operations.

    I tried several other architectures but none were better. In particular, I tried to have a single embedding layer and to input it to all the LSTM layers. When I did it, I also tried "tied embeddings" without success. Perhaps I made a mistake somewhere because tied embeddings are said to bring a significant gain with a large vocabulary. Another possibility is that the convergence is slower.

    Among the differences with lstm-compress, I found that layer normalization brought some gain (1%) for a small added complexity. You can disable it in NNCP with the "-layer_norm 0" option and compare.
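
    For reference, layer normalization rescales each activation vector to zero mean and unit variance, then applies a learned per-unit gain and bias. A minimal numpy sketch of the general technique (not the LibNC implementation):
    Code:
    import numpy as np

    def layer_norm(x, g, b, eps=1e-5):
        # normalize the activation vector to zero mean / unit variance,
        # then apply the learned per-unit gain g and bias b
        return g * (x - x.mean()) / np.sqrt(x.var() + eps) + b

    h = np.random.randn(352)           # e.g. a pre-activation LSTM vector
    y = layer_norm(h, np.ones(352), np.zeros(352))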

  22. Thanks:

    byronknoll (27th April 2019)

  23. #46 Shelwien (Administrator, Kharkov, Ukraine)
    Merged rc streams into main file.
    (There's still header redundancy, but it's hard to fix without removing -T and -batch_size support, and then processing speed becomes 100 bytes/s.)

    Added SSE (patched mod_SSE) to NNCP.
    (Again, it's hard to do properly because of the interleaved processing - I can only use context available to the individual entropy coder instances,
    but it should still improve compression by 40-50 KB for enwik8.)

    Btw, I needed to replace the original entropy coder because it's designed for 8-bit probabilities, while my SSE works with 15-bit precision.

    http://nishi.dreamhosters.com/u/nncp_rc_v1.rar
    Code:
    > nncp -T 3 c book1 book1.nncp
    219,955 // nncp-2019-04-13 original version
    219,880 // nncp_rc_v0 with merged rc streams
    218,959 // nncp_rc_v1 with SSE
    
    > nncp -T 4 -n_layer 5 -hidden_size 352 -n_symb 16388 -full_connect 1 -lr 6e-3 c _tmp_out.bin _tmp_out-nncp.bin
    16,984,458 default
    16,982,760 nncp_rc_v0
    16,951,916 nncp_rc_v1
    enwik8 decoder output became incorrect after 3M, but it's likely related to stream padding with 0s rather than -1...
    I'll try to fix it later, but the compressed size shouldn't be affected anyway.

  24. Thanks (2):

    Darek (28th April 2019),xinix (28th April 2019)

  25. #47 Mauro Vezzosi (Member, Italy)
    NNCP 2019/05/08 https://bellard.org/nncp/
    Compression is unchanged, from the Changelog file:
    2019-05-08:
    - LibNC: faster matmul
    - nncp: added -n_embed_out parameter
    - preprocess: suppressed "/tmp/word1.txt" file

  26. Thanks (2):

    byronknoll (10th May 2019),Shelwien (10th May 2019)

  27. #48 byronknoll (Member, USA)
    Quote Originally Posted by Mauro Vezzosi View Post
    - LibNC: faster matmul
    @Fabrice, any advice on how to get faster dot products? Besides AVX2 + FMA, I guess you can do multi-threading and cache optimization - is that where you are getting improvements? Also, it would be nice if NNCP could be added to the LTCB - are you (or anyone else) planning to create a submission?

  28. #49 Shelwien (Administrator, Kharkov, Ukraine)
    Probably manual vector optimizations.

    Also, it's not really that fast - when it works sequentially, the processing speed is <100 bytes/s on my PC.
    The trick is that it uses a 16-bit alphabet, and each bit is encoded separately (unless -batch_size 1 is specified),
    with its own entropy coder instance and output stream.
    I think that avoids the serial dependency and allows for MT. But even without MT it's still much faster than pure serial processing,
    even though compression is worse.
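
    A sketch of that interleaving as I read it (illustrative Python, not the nncp source): bit i of every 16-bit symbol goes to stream i, so the 16 per-bit coders have no serial dependency on each other.
    Code:
    # Hypothetical sketch of per-bit stream splitting for a 16-bit alphabet:
    # bit i of every symbol is appended to stream i, so each of the 16
    # streams can be handled by its own entropy coder instance in parallel.

    N_BITS = 16

    def split_streams(symbols):
        streams = [[] for _ in range(N_BITS)]
        for s in symbols:
            for i in range(N_BITS):
                streams[i].append((s >> i) & 1)
        return streams

    streams = split_streams([0x0041, 0x0100, 0x3FFF])
    print(streams[0])   # low bit of each symbol: [1, 0, 1]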

  29. #50 Mauro Vezzosi (Member, Italy)
    From page 11 of nncp.pdf 2019/05/08:
    History:
    - March 30, 2019: initial version.
    - May 4, 2019: added large2 LSTM model. Corrected large1 LSTM model results.

    On tied LSTM:
    In "2.2 Model specificities" of nncp.pdf it is written:
    We modified the cell equation (5) to use the cell output variant from [2]. This change is important because it ensures the cell state is bounded, unlike in the conventional LSTM. Moreover, we removed the unnecessary tanh function in equation (6) because the cell state is already clamped between -1 and 1.
    1) Does anyone have more info on the tied LSTM?
    The cell equation (5) is: c(t,l) = f(t,l) @ c(t-1,l) + min(1 - f(t,l), i(t,l)) @ j(t,l) (@ is the element-wise multiplication)
    [2] is "Gabor Melis, Chris Dyer, Phil Blunsom, On the State of the Art of Evaluation in Neural Language Models, arXiv preprint, arXiv:1707.05589, 2017.", where it is written "Where the equations (of tied LSTM, among others) are based on the formulation of Sak et al. (2014)", but in "Hasim Sak, Andrew W. Senior, and Franc¬łoise Beaufays. Long short-term memory based recur- rent neural network architectures for large vocabulary speech recognition. CoRR, abs/1402.1128, 2014. URL http://arxiv.org/abs/1402.1128." there no mention on the tied LSTM.
    2) Tanh may be unnecessary, however it applies a non-linear transformation: is this non-linear transformation also unnecessary?
    3) The LSTM-T (tied) variant of NNCP ties the input gate to the forget gate (min(1 - f, i)): was the reverse, the forget gate tied to the input gate (min(f, 1 - i)), also tested?
    I did a few tests with my program on i tied to f and f tied to i, using the min (with and without tanh), average and max functions; I didn't find a clear winner.
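
    A small numpy sketch of the cell update being discussed (my reading of the quoted equation (5), assuming f and i are sigmoid gate outputs in (0,1) and j is the tanh candidate in (-1,1); the second function is the hypothetical variant from question 3):
    Code:
    import numpy as np

    # Equation (5): with f, i in (0,1) and j in (-1,1), the update
    #   c = f*c + min(1-f, i)*j
    # satisfies |c| <= f + (1-f) = 1, so the cell state stays in [-1, 1]
    # and the tanh of equation (6) becomes redundant.
    def cell_update_tied(c, f, i, j):
        return f * c + np.minimum(1.0 - f, i) * j

    # Question 3 variant: tie the forget side instead, min(f, 1-i);
    # this is also bounded, since |c| <= (1-i) + i = 1.
    def cell_update_tied_alt(c, f, i, j):
        return np.minimum(f, 1.0 - i) * c + i * j

    c = np.zeros(4)
    f = np.full(4, 0.9); i = np.full(4, 0.5); j = np.full(4, 0.8)
    print(cell_update_tied(c, f, i, j))   # stays within [-1, 1]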

    Quote Originally Posted by fab View Post
    I found that layer normalization brought some gain
    I'm testing it in my program and, as far as I've seen so far, layer normalization is a must have.
    Last edited by Mauro Vezzosi; 11th May 2019 at 19:01.

  30. Thanks:

    Shelwien (11th May 2019)

  31. #51 Mauro Vezzosi (Member, Italy)
    Quote Originally Posted by byronknoll View Post
    it would be nice if NNCP could be added to the LTCB - are you (or anyone else) planning to create a submission?
    Code:
                    Compression                      Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           Options                       enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp    Mem  Alg  Note
    -------           -------                     ----------  -----------  -----------  -----------  ----- ------  ----- ---- ----
    cmix v17                                      14,877,373  116,394,271    208,263 s  116,602,534 641189 645651  25258 CM   83
    phda9 1.7                                     15,023,870  116,940,874     43,274 xd 116,984,148  83712  87596   4996 CM   83
    nncp 2019-05-08                               16,791,077  125,623,896    161,133 xd 125,785,029 420168 602409   2040 LSTM 84
    paq8pxd_v47       -s15                        16,080,717  127,404,715    139,841 s  127,544,556  75022  75611  27500 CM   81
    durilca'kingsize  -m13000 -o40 -t2            16,209,167  127,377,411    407,477 xd 127,784,888   1398   1797  13000 PPM  31
    
    ./preprocess c out.words enwik9 out.pre 16384 512
    ./nncp -n_layer 7 -hidden_size 384 -n_embed_out 5 -n_symb 16388 -full_connect 1 -lr 6e-3 c out.pre out.bin
    By default (and it seems to be the best variant), NNCP uses LSTM-C, that is, clamped LSTM: does anyone know what it clamps, and how?

  32. Thanks:

    Darek (13th May 2019)

  33. #52 Darek (Member, Poland, Warsaw)
    One remark - the paq8pxd_v47 score is the score for enwik9_1423. This split gives about 100-150 KB of gain on enwik9.

    There are two better paq8pxd scores. The first is the official Kaitz version (Matt hasn't posted my request yet), and it should be:
    paq8pxd_v61 -s15 15,968,477 126,587,796 194,704 s 126,782,500 96077 98751 27500 CM 81

    The second score is from the BWT version (Matt also hasn't posted my request yet), and it should be as follows:

    paq8pxd_v48_bwt1 -s14 16,004,759 126,183,029 153,295 s 126,336,324 599875 241815 51865 CM 81

  34. #53 fab (Member, France)
    Quote Originally Posted by byronknoll View Post
    @Fabrice, any advice on how to get faster dot products? Besides AVX2 + FMA, I guess you can do multi-threading and cache optimization - is that where you are getting improvements? Also, it would be nice if NNCP could be added to the LTCB - are you (or anyone else) planning to create a submission?
    Most of the improvements come from not using dot products but matrix products (with large enough matrices). The speed of dot products is limited by the memory load speed, while in matrix products there are a lot more multiply/adds than memory accesses, so it is possible to get close to the theoretical FPU speed (32 flops/cycle on a >= Haswell CPU core).

    With an LSTM, the minimum dimension of the matrices involved in the products is given by the batch size (= the number of separate symbol streams). Hence for maximum speed it is necessary to have a large enough batch size. You can easily benchmark this with NNCP by using the "-batch_size n" option and comparing the speed of n=1 (close to what lstm-compress does) and n=16. It is even more noticeable at decompression time.
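
    A rough way to see the effect (an illustrative numpy timing, not LibNC): process 16 input vectors as 16 matrix-vector products versus one matrix-matrix product; the weight matrix is re-read from memory for every mat-vec, while the batched product reuses it:
    Code:
    import time
    import numpy as np

    H, B = 1024, 16
    W = np.random.randn(H, H).astype(np.float32)   # LSTM-like weight matrix
    X = np.random.randn(H, B).astype(np.float32)   # batch of B input vectors

    t0 = time.perf_counter()
    for _ in range(100):
        for b in range(B):          # B separate matrix-vector products
            _ = W @ X[:, b]
    t1 = time.perf_counter()
    for _ in range(100):
        _ = W @ X                   # one matrix-matrix product
    t2 = time.perf_counter()
    print(f"mat-vec loop: {t1 - t0:.3f}s, batched matmul: {t2 - t1:.3f}s")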

  35. #54 Darek (Member, Poland, Warsaw)
    I've tested my testset with nncp to try to optimize the options. It took some time: first I tested the batch_size option, then the main proposed options (full_connect, layer normalization, hidden_size 512, n_layer 5, and their combinations), then the number of layers. Scores are in the attached table; the tests proceed from the left to the right part of the table. Dark blue scores are the best scores for NNCP overall; light blue are the best scores for the tested option.

    I got a 3.8% gain (570 KB), which is quite good. I still have the hidden_size (in progress) and learning rate options left to test.

    Then I tested the RC_1 version with optimized settings, and it adds another 0.6% gain (100 KB) over the default original NNCP version.

    There is also a table comparing against LSTM at default settings. NNCP beats LSTM for almost all files by a good margin, except 24-bit uncompressed images, where NNCP lost about 30% to LSTM...

    For two files especially, NNCP got impressive results:
    D.TGA - third place overall, just behind cmix and paq8px (which have a good parser for this file), and
    E.TIF - fourth place overall, just behind cmix, CMVe and paq8px (which also have a good parser for this file).

    For C.TIF, G.EXE, H.EXE and Q.WK3 it placed in the top 10 - also an impressive result for a compressor without any specialized models!
    [Attached images: nncp_optimization.jpg, nncp_comparison.jpg]

  36. Thanks:

    Mauro Vezzosi (26th May 2019)

  37. #55 Mauro Vezzosi (Member, Italy)
    Quote Originally Posted by Darek View Post
    I've tested RC_1 version
    Where did you find it?
    Quote Originally Posted by Darek View Post
    NNCP beats LSTM
    IMO, you are comparing an advanced LSTM implementation (NNCP) with a simpler version (lstm-compress).
    Code:
    Results for NNCP 2019-05-08 with about the same hyperparameters (= options) as lstm-compress.
    (1): NNCP, cell=LSTM-C n_layer=3 hidden_size=90 batch_size=10 time_steps=10 n_symb=256 ln=0 fc=0 sgd_opt=none lr=5.000e-002                                              n_params=508k n_params_nie=232k mem=3.63MB
    (2): NNCP, cell=LSTM-C n_layer=3 hidden_size=90 batch_size=1  time_steps=10 n_symb=256 ln=0 fc=0 sgd_opt=none lr=5.000e-002                                              n_params=508k n_params_nie=232k mem=3.63MB
    (3): NNCP, cell=LSTM-C n_layer=3 hidden_size=90 batch_size=10 time_steps=10 n_symb=256 ln=0 fc=0 sgd_opt=adam lr=4.000e-003 beta1=0.000000 beta2=0.999900 eps=1.000e-005 n_params=508k n_params_nie=232k mem=5.66MB
    (4): lstm-compress v3 2019/03/30, original version, without preprocessor.
         gradient_clipping=-/+2.0 n_layer=3 hidden_size=90  batch_size=10 time_steps=10 n_symb=256 ln=0 fc=0 sgd_opt=none lr=5.000e-002.
    (5): lstm-compress v3 2019/03/30 with cmix v17 2019/05/26 lstm*.* source files (they added adam optimizer), without preprocessor and vocabulary size fixed to 256 (it always output 256 predictions even if the file has fewer symbols, which disables an lstm-compress optimization trick).
         gradient_clipping=-/+2.0 n_layer=3 hidden_size=352 batch_size=10 time_steps=10 n_symb=256 ln=0 fc=0 sgd_opt=adam lr=5.000e-002 adam_lr=3.350e-003 adam_beta1=0.025000 adam_beta2=0.999900 adam_eps=1.000e-006 mem=62.412K.
           (1)        (2)        (3)        (4) |        (5)
     1.434.237  1.423.772  1.400.862  1.411.882 |  1.391.299 0.WAV
       366.975    357.753    348.118    356.764 |    334.957 1.BMP
     1.152.748  1.098.114  1.096.749  1.071.157 |  1.007.442 A.TIF
     1.104.139  1.050.912  1.039.702  1.038.606 |    957.338 B.TGA
       360.723    343.005    346.446    343.564 |    329.041 C.TIF
       333.875    312.336    317.526    313.724 |    293.941 D.TGA
       502.935    502.328    499.786    501.892 |    498.071 E.TIF
       111.953    111.858    111.539    111.831 |    111.706 F.JPG
     1.439.333  1.428.632  1.414.462  1.429.351 |  1.398.618 G.EXE
       691.345    643.372    653.221    647.638 |    576.596 H.EXE
       318.786    298.189    299.873    302.428 |    273.341 I.EXE
        44.865     44.415     44.707     44.364 |     44.187 J.EXE
     4.381.606  4.230.458  4.220.113  4.298.921 |  3.954.964 K.WAD
     3.477.008  3.396.006  3.372.390  3.358.974 |  3.216.273 L.PAK
       126.592    122.518    115.555    120.772 |    105.973 M.DBF
       156.401    145.991    139.289    157.563 |    133.532 N.ADX
        10.824      9.126      8.945     10.190 |      8.569 O.APR
         9.205      8.338      8.274      8.404 |      6.373 P.FM3
       405.454    351.831    367.267    359.346 |    287.976 Q.WK3
        46.719     43.080     42.745     46.377 |     40.933 R.DOC
        44.977     40.527     40.053     43.402 |     36.404 S.DOC
        31.331     28.129     28.316     30.812 |     26.422 T.DOC
        14.504     13.099     13.335     14.637 |     12.971 U.DOC
        31.727     28.240     28.482     30.584 |     26.166 V.DOC
        21.392     19.976     19.526     21.378 |     19.128 W.DOC
        17.056     15.970     15.730     17.016 |     15.266 X.DOC
           725        582        801        573 |        576 Y.CFG
           358        290        382        235 |        237 Z.MSG
    16.637.793 16.068.847 15.994.194 16.092.385 | 15.108.300 Total
    Last edited by Mauro Vezzosi; 29th May 2019 at 22:35.

  38. Thanks:

    Darek (28th May 2019)

  39. #56 fab (Member, France)
    Quote Originally Posted by Mauro Vezzosi View Post
    Where did you find it?

    IMO, you are comparing an advanced LSTM implementation (NNCP) with a simpler version (lstm-compress).
    EDIT: Oops, sorry, I forgot that the lstm-compress results may include the preprocessor! I'm retesting lstm-compress without the preprocessor to compare with NNCP.
    Code:
    Results for NNCP 2019-05-08 with about the same hyperparameters (= options) as lstm-compress.
    (1): NNCP, cell=LSTM-C n_layer=3 hidden_size=90 batch_size=10 time_steps=10 n_symb=256 ln=0 fc=0 sgd_opt=none lr=5.000e-002                                              n_params=508k n_params_nie=232k mem=3.63MB
    (2): NNCP, cell=LSTM-C n_layer=3 hidden_size=90 batch_size=10 time_steps=10 n_symb=256 ln=0 fc=0 sgd_opt=adam lr=4.000e-003 beta1=0.000000 beta2=0.999900 eps=1.000e-005 n_params=508k n_params_nie=232k mem=5.66MB
    (3): lstm-compress v3 2019/03/30, original version, without preprocessor.
         gradient_clipping=-/+2.0 n_layer=3 hidden_size=90  batch_size=10 time_steps=10 n_symb=256 ln=0 fc=0 sgd_opt=none lr=5.000e-002.
    (4): lstm-compress v3 2019/03/30 with cmix v17 2019/05/26 lstm*.* source files (they added adam optimizer), without preprocessor and vocabulary size fixed to 256 (it always output 256 predictions even if the file has fewer symbols, which disables an lstm-compress optimization trick).
         gradient_clipping=-/+2.0 n_layer=3 hidden_size=352 batch_size=10 time_steps=10 n_symb=256 ln=0 fc=0 sgd_opt=adam lr=5.000e-002 adam_lr=3.350e-003 adam_beta1=0.025000 adam_beta2=0.999900 adam_eps=1.000e-006 mem=62.412K.
    ...
    If you want to match the lstm-compress parameters more closely with NNCP, you should set batch_size = 1 and keep time_steps = 10 (time_steps corresponds to the "horizon" parameter of lstm-compress).

  40. Thanks:

    Mauro Vezzosi (27th May 2019)

  41. #57 Mauro Vezzosi (Member, Italy)
    I thought batch_size and time_steps were coupled in lstm-compress ("yes, bptt and mbs are coupled"), i.e. horizon = 10 means batch_size = 10 and time_steps = 10, and that it updates the NN every "horizon" input symbols.
    I added some printf() calls to lstm-compress, and the backward pass is executed after every "horizon" input symbols; maybe I still don't fully understand how it works.

  42. #58 Darek (Member, Poland, Warsaw)
    Quote Originally Posted by Mauro Vezzosi View Post
    Where did you find it?
    This is Shelwien's version here (nncp_rc_v1 - maybe my description was misleading): https://encode.su/threads/3094-NNCP-...ll=1#post60029

    Quote Originally Posted by Mauro Vezzosi View Post
    IMO, you are comparing an advanced LSTM implementation (NNCP) with a simpler version (lstm-compress).
    Yes, I know it.

    I was wondering why the image files A.TIF and B.TGA were compressed better by lstm-compress, but it's OK - it's the preprocessor.
    Is the lstm-compress in (5) your modified version? The gains look very good.

    Regarding NNCP settings, my best settings differ for particular files.
    The best overall settings are batch_size = 1 or 2, full_connect = 1, hidden_size = 512, n_layer = 4.

  43. #59 fab (Member, France)
    Quote Originally Posted by Darek View Post
    This is Shelwien's version here (nncp_rc_v1 - maybe my description was misleading): https://encode.su/threads/3094-NNCP-...ll=1#post60029


    Yes, I know it.

    I was wondering why the image files A.TIF and B.TGA were compressed better by lstm-compress, but it's OK - it's the preprocessor.
    Is the lstm-compress in (5) your modified version? The gains look very good.

    Regarding NNCP settings, my best settings differ for particular files.
    The best overall settings are batch_size = 1 or 2, full_connect = 1, hidden_size = 512, n_layer = 4.
    Did you tune the learning rate (-lr option)? When you change the model parameters (such as the batch_size), it is important to retune the learning rate.

    For example, with hidden_size = 96, n_layer=3, fc=1, batch_size=16, the best learning rate is 0.01. When reducing batch_size to 1, a better learning rate is 0.0032.

  44. Thanks:

    Darek (29th May 2019)

  45. #60 fab (Member, France)
    Quote Originally Posted by Mauro Vezzosi View Post
    I thought batch_size and time_steps were coupled in lstm-compress ("yes, bptt and mbs are coupled"), i.e. horizon = 10 means batch_size = 10 and time_steps = 10, and that it updates the NN every "horizon" input symbols.
    I added some printf() calls to lstm-compress, and the backward pass is executed after every "horizon" input symbols; maybe I still don't fully understand how it works.
    "batch_size" and "time_steps" are unrelated. batch_size = N indicates that the input file is cut into N separate files and that each file is compressed independently with the same neural network (to be more precise, in NNCP the split is done on blocks of N * 100k input symbols by default). lstm-compress does not support this concept so its corresponds to batch_size = 1.

    "time_steps" is the number of symbols between each backward pass as you said. With "batch_size" separate input streams, the backward pass is done every "batch_size * time_steps" symbols.

  46. Thanks (2):

    byronknoll (29th May 2019),Mauro Vezzosi (29th May 2019)
