
Thread: Paq8pxd dict

  1. #991
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    630
    Thanks
    288
    Thanked 252 Times in 128 Posts
    The zlib library cannot recompress the deflate streams in this file to match the original exactly, and the differences are too large, so the transform fails in paq8px(d). To solve this, paq8px(d) would have to integrate preflate, which hasn't been done yet. Preflate stores zlib reconstruction data to handle exactly these cases.

    Preflate has been integrated into Precomp since v0.4.7, so as a workaround you can run "precomp -cn" first and paq8pxd afterwards. The results from precomp alone suggest it won't make a big difference on this file, though the difference for paq8pxd is a bit larger:

    Code:
    Original:                                16.709.659 bytes
    Precomp 0.4.7 -cn:                       16.949.572 bytes (18 PDF streams, only about 200 KB difference, ~7 KB zlib reconstruction data)
    Precomp 0.4.7 -t+:                        2.578.516 bytes (only lzma2, don't transform the deflate streams)
    Precomp 0.4.7:                            2.542.832 bytes (difference ~36 KB)

    paq8pxd_v90_mt -s4:                       1.854.287 bytes
    Precomp 0.4.7 -cn | paq8pxd_v90_mt -s4:   1.794.806 bytes (difference ~59 KB)
    There are 18 PDF streams in the file, but they are relatively small. The rest of the file is uncompressed image data (multiple images, 400 KB to 4 MB in size). Detecting those would benefit this file more than using preflate.

    EDIT: Using qpdf, we can force the uncompressed image data to be compressed with deflate, which yields the attached, much smaller PDF file (3.8 MB):

    Code:
    qpdf --stream-data=compress 7421.pdf 7421-compressed.pdf
    Note that this transform is lossy in the sense that "--stream-data=uncompress" won't lead back to the original PDF, but it shows how inefficiently the image data was stored in the original PDF.
    Last edited by schnaader; 11th November 2020 at 21:54. Reason: added paq8pxd_v90 results
    http://schnaader.info
    Damn kids. They're all alike.

  2. Thanks:

    brispuss (12th November 2020)

  3. #992
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    630
    Thanks
    288
    Thanked 252 Times in 128 Posts
    As for why paq8px(d) doesn't seem to detect the image data in this file, perhaps some further analysis of the pdf image detection in paq8px or paq8pxd would help. The images in the file all have the same basic format (with varying width, height, length):

    Code:
    <<
    /Width 725
    /ColorSpace/DeviceCMYK
    /Height 190
    /Subtype/Image
    /Type/XObject
    /Length 551000
    /BitsPerComponent 8
    >>
    Apart from "/Type/XObject" and the CMYK colorspace, I don't see anything special at first glance, but I don't know the detection code that well, so one of the PAQ contributors will probably spot what causes the non-detection more quickly.
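One thing that can be checked from the dictionary alone: with 8 bits per component and 4 components per pixel (CMYK), 725 x 190 pixels should take 725 * 190 * 4 = 551000 bytes, which matches /Length, so the stream really is raw, uncompressed pixel data. A minimal sketch of that calculation follows. The function name `raw_image_length` is made up for this illustration and is not taken from the paq8px or paq8pxd detection code; it assumes each image row is padded to a whole byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from paq8px/paq8pxd): expected size in bytes of an
   uncompressed PDF image stream. Assumes each row is padded to a whole byte. */
static size_t raw_image_length(size_t width, size_t height,
                               size_t components, size_t bits_per_component) {
    assert(width > 0 && height > 0 && components > 0 && bits_per_component > 0);
    size_t bits_per_row = width * components * bits_per_component;
    size_t bytes_per_row = (bits_per_row + 7) / 8; /* round up to full bytes */
    return bytes_per_row * height;
}
```

For the image above, `raw_image_length(725, 190, 4, 8)` gives 551000. A detector could compare this value with /Length as one plausibility test, but whether paq8px(d) does so is not something this sketch claims.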
    http://schnaader.info
    Damn kids. They're all alike.

  4. Thanks:

    brispuss (12th November 2020)

  5. #993
    Member
    Join Date
    Aug 2008
    Location
    NZ
    Posts
    63
    Thanks
    35
    Thanked 11 Times in 7 Posts
    Thanks for your efforts schnaader!!

    I tried compressing with and without precomp v0.4.8 dev (using the -cn switch when precomp was used), but in each case paq8pxd (v90) reported "Transform fails . . .".

    Compressed file sizes were fairly similar when comparing precomp processed file to non-precomp processed file, although precomp with paq8pxd produced a slightly smaller file.

    Code:
    paq8pxd v90 -s8                                  1,833,979 bytes
    precomp v0.4.8 dev -cn & paq8pxd v90 -s8         1,773,737 bytes
    So it remains to be seen whether paq8pxd could be "improved" to recognize and (pre)process such files without these "Transform fails . . ." errors, which might in turn improve the compression ratio.

  6. #994
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,272
    Thanks
    802
    Thanked 545 Times in 415 Posts
    Originally posted by Darek:
    I'll test it. My estimate is about 100KB less.

    122'945'119 - enwik9 -x15 -w -e1,english.dic by paq8pxd_v89_60_4095, change: -0,06%
    122'838'xxx - enwik9 -x15 -w -e1,english.dic - estimate for paq8px_v90
    Scores for enwik8 and enwik9 by latest paq8pxd versions:
    15'654'151 - enwik8 -x15 -w -e1,english.dic by paq8pxd_v89_60_4095, change: 0,00%
    122'945'119 - enwik9 -x15 -w -e1,english.dic by paq8pxd_v89_60_4095, change: -0,06% - best score for paq8pxd series

    15'647'580 - enwik8 -x15 -w -e1,english.dic by paq8pxd_v90, change: -0,04% - best score for paq8pxd series but...
    123'196'527 - enwik9 -x15 -w -e1,english.dic by paq8pxd_v90, change: 0,20% - 0.2% worse score than paq8pxd_v89_60_4095
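For anyone who wants to reproduce the "change" percentages, here is a minimal sketch of the calculation. The helper name `percent_change` is just for illustration; the input values are the sizes listed above.

```c
#include <assert.h>

/* Relative change of new_size against old_size, in percent.
   A negative value means the new result is smaller (better). */
static double percent_change(double old_size, double new_size) {
    assert(old_size > 0.0);
    return (new_size - old_size) / old_size * 100.0;
}
```

With the enwik9 sizes for paq8pxd_v89_60_4095 (122945119) and paq8pxd_v90 (123196527) this gives roughly +0.20%, and with the enwik8 sizes (15654151 and 15647580) roughly -0.04%.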

  7. #995
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    514
    Thanks
    63
    Thanked 96 Times in 75 Posts
    Paq8pxd90fix2
    - improve jpeg compression
    using -s8 option on
    f.jpg (darek corpus) 112038 bytes -> 81093 bytes
    a10.jpg (maximum compression corpus) 842468 bytes -> 621370 bytes
    dsc_0001.jpg 3162196 bytes -> 2180217 bytes
    Now it beats paq8px197, paq8sk38 and paq8pxd90.
    The source code and a binary are inside the attached package.

  8. #996
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    108
    Thanks
    139
    Thanked 50 Times in 30 Posts
    Hi guys,
    starting from version 90 I get a crash if I use too much memory.
    For example:

    >paq8pxd64 -x10 testset_docx c:\compression\doc_testset\*.docx <--- RUNS FINE
    >paq8pxd64 -x11 testset_docx c:\compression\doc_testset\*.docx <--- CRASHES

    Is this behaviour normal?

    Thanks!
    Luca

  9. #997
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,272
    Thanks
    802
    Thanked 545 Times in 415 Posts
    Originally posted by LucaBiondi:
    Hi guys,
    starting from version 90 I get a crash if I use too much memory.
    For example:

    >paq8pxd64 -x10 testset_docx c:\compression\doc_testset\*.docx <--- RUNS FINE
    >paq8pxd64 -x11 testset_docx c:\compression\doc_testset\*.docx <--- CRASHES

    Is this behaviour normal?

    Thanks!
    Luca
    Yes, this is the crash I mentioned in this post: https://encode.su/threads/1464-Paq8p...ll=1#post67248
    A10.jpg only runs up to -x10
    ohs.doc (with a JPEG inside) only runs up to -x13

  10. #998
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    108
    Thanks
    139
    Thanked 50 Times in 30 Posts
    Hi Darek!
    Yes true

  11. #999
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    514
    Thanks
    63
    Thanked 96 Times in 75 Posts
    Paq8pxd90fix3
    - improve and enhance jpeg compression
    using -s8 option on:
    f.jpg 112038 bytes -> 80984 bytes
    a10.jpg 842468 bytes -> 621099 bytes
    dsc_0001.jpg 3162196 bytes -> 2178889 bytes
    The binary and source code are attached.
    [Attached images: A10.jpg, F.jpg]

  12. Thanks (2):

    Darek (3rd December 2020),DZgas (3rd December 2020)

  13. #1000
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    108
    Thanks
    139
    Thanked 50 Times in 30 Posts
    Thanks. Have you taken a look at the bug that crashes the app?
    I won't test versions that crash, because I usually use the -x12 option.
    Thank you!

  14. #1001
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    277
    Thanks
    164
    Thanked 64 Times in 49 Posts
    Originally posted by suryakandau@yahoo.co.id:
    Paq8pxd90fix3
    - improve and enhance jpeg compression
    using -s8 option on:
    f.jpg 112038 bytes -> 80984 bytes
    a10.jpg 842468 bytes -> 621099 bytes
    dsc_0001.jpg 3162196 bytes -> 2178889 bytes
    The binary and source code are attached.
    Also don't forget to submit your changes to the paq8pxd repo: https://github.com/kaitz/paq8pxd/pulls

  15. #1002
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    514
    Thanks
    63
    Thanked 96 Times in 75 Posts
    Originally posted by LucaBiondi:
    Thanks. Have you taken a look at the bug that crashes the app?
    I won't test versions that crash, because I usually use the -x12 option.
    Thank you!
    I have looked at the source code, but I am not sure which part causes that error. A workaround for the crash is to change line 1268 from 10000 down to 1000, but then the compression is worse than before.

  16. #1003
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    514
    Thanks
    63
    Thanked 96 Times in 75 Posts
    Lowering the memory option in each model lets the process complete successfully, but the compression is still worse.

  17. #1004
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    108
    Thanks
    139
    Thanked 50 Times in 30 Posts
    Ah, OK, but if you get a crash I suppose that is not the right way.
    Thank you!
    Last edited by LucaBiondi; 2nd December 2020 at 19:08.

  18. #1005
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    514
    Thanks
    63
    Thanked 96 Times in 75 Posts
    You are welcome

  19. #1006
    Member
    Join Date
    Nov 2015
    Location
    -
    Posts
    74
    Thanks
    297
    Thanked 18 Times in 14 Posts
    Hi kaitz, mpais and Gotty
    ____
    There is a problem with paq8pxd_v89_AVX2:
    it can't use my custom dictionary!
    I compiled a dictionary of 188240 lines from the file TS40.txt (GDC Competition).
    I run: paq8pxd_v89_AVX2.exe -x0 -e1,newenglish.dic TS40.txt
    But paq8pxd_v89 does not use my dictionary; it creates its own dynamic one.

  20. #1007
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    733
    Thanks
    424
    Thanked 483 Times in 257 Posts
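    From reading the code, the dictionary needs Unix line endings.

    void wfgets1(char *str, int count, FILE *fp) {
      int c, i = 0;
      // Read at most count-1 bytes; stop after the first '\n', which is kept.
      while (i < count - 1 && ((c = getc(fp)) != EOF)) {
        str[i++] = c;
        if (c == '\n')
          break;
      }
      str[i] = 0;  // terminate; a '\r' before the '\n' is not stripped
    }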

    Do you use \n or \r\n?
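If the file does use \r\n, a line read by a function like wfgets1 ends in "\r\n", so the '\r' stays attached to the word. A minimal sketch of normalizing such a line in place is shown here. `crlf_to_lf` is a hypothetical helper written for this illustration; it is not part of paq8pxd, and converting the dictionary file itself to \n endings has the same effect.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of paq8pxd: turn a trailing "\r\n" into "\n"
   in a line that still carries its newline. Returns the new line length. */
static size_t crlf_to_lf(char *line) {
    assert(line != NULL);
    size_t n = strlen(line);
    if (n >= 2 && line[n - 2] == '\r' && line[n - 1] == '\n') {
        line[n - 2] = '\n';  /* overwrite the '\r' with the newline */
        line[n - 1] = '\0';  /* and cut the string one byte shorter */
        n--;
    }
    return n;
}
```

A line that already ends in a plain "\n", or has no newline at all, is left unchanged.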

  21. #1008
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    538
    Thanks
    225
    Thanked 392 Times in 203 Posts
    Originally posted by kaitz:
    paq8pxd_v84

    Line endings in an external dict must be \n (including the last line)
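Both conditions from that note (only \n endings, and a \n after the last line) can be checked before handing a dictionary to the compressor. The following is a rough sketch on an in-memory copy of the file. `dict_has_unix_endings` is a hypothetical name and not a function in paq8pxd.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check, not part of paq8pxd: returns 1 if the dictionary text in
   buf[0..len) contains no '\r' and its last line ends with '\n'; returns 0
   otherwise, including for an empty buffer. */
static int dict_has_unix_endings(const char *buf, size_t len) {
    assert(buf != NULL);
    if (len == 0 || buf[len - 1] != '\n')
        return 0;  /* last line is not terminated */
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r')
            return 0;  /* Windows-style (or stray) carriage return */
    }
    return 1;
}
```

A file that fails this check would need converting to \n endings and, if necessary, a final newline appended.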
    KZo


  22. #1009
    Member
    Join Date
    Nov 2015
    Location
    -
    Posts
    74
    Thanks
    297
    Thanked 18 Times in 14 Posts
    Here is an example of a file.

  23. #1010
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    538
    Thanks
    225
    Thanked 392 Times in 203 Posts
    Corrected.

    Tested on enwik8: using this dict, it generates a file of about 80 MB (-s0).
    KZo


  24. Thanks:

    xinix (22nd December 2020)

  25. #1011
    Member
    Join Date
    Nov 2015
    Location
    -
    Posts
    74
    Thanks
    297
    Thanked 18 Times in 14 Posts
    Thank you! Seems to be working now!
    I'll double-check tomorrow.

  26. #1012
    Member
    Join Date
    Nov 2015
    Location
    -
    Posts
    74
    Thanks
    297
    Thanked 18 Times in 14 Posts
    Thanks, that works better!
    pxd89 now uses my dictionary, but it still tries to find words of its own and insert them at the end.
    Here's an example:
    zygnematales
    zygomycota
    zygophyllaceae
    zygophyllales
    zygoptera
    zyklon
    zyrian
    zytkow
    zyxw
    zzuuzz
    zzyzx
    zzzzzzzzzzz
    ABot
    AKMap
    ALMap
    AYref
    Acuña
    Aerolíneas
    Albéniz
    Almodóvar
    Amérique
    Anárion
    Arcángel
    ____
    In red (the lowercase entries up to "zzzzzzzzzzz") is my dictionary, and in blue (from "ABot" onward) is what pxd found.
    Is it possible to disable this with some parameter?

  27. #1013
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    538
    Thanks
    225
    Thanked 392 Times in 203 Posts
    No.

    Look at the post above about v84.
    When an external dict is used, some parameters are hard-coded in WRT:
    UTF-8 chars are added to the dict no matter what. For dynamic words, Minfq is *2 and length is >3.
    Your example has UTF-8 chars in its words, so they are added.
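To see which words in a list fall into the "has UTF-8 chars" group, it is enough to look for bytes with the high bit set, since every byte of a multi-byte UTF-8 character is >= 0x80. A minimal sketch follows. `has_non_ascii` is a made-up helper and is not the WRT code from paq8pxd, whose actual test may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the WRT code from paq8pxd: returns 1 if the word
   contains at least one byte >= 0x80 (part of a multi-byte UTF-8 character),
   0 if it is pure ASCII. */
static int has_non_ascii(const char *word) {
    assert(word != NULL);
    for (const unsigned char *p = (const unsigned char *)word; *p != 0; p++) {
        if (*p >= 0x80)
            return 1;
    }
    return 0;
}
```

Words such as "Acuña" or "Albéniz" return 1 here, while "zyklon" or "ABot" return 0. The latter would only be subject to the Minfq and length conditions mentioned above.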
    KZo


  28. #1014
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    538
    Thanks
    225
    Thanked 392 Times in 203 Posts
    Just want to confirm that lstm.inc is working.
    Tested with fp8_v5, attached.

    Tested only enwik6.

    So the problem is probably in the pxd source code (it's a mess).
    KZo


  29. Thanks (2):

    Mauro Vezzosi (24th December 2020),xinix (24th December 2020)

  30. #1015
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    277
    Thanks
    164
    Thanked 64 Times in 49 Posts
    Attached binaries for paq8pxd v91

    Code:
    - fix jpeg compression on larger levels
    - fix SZDD recompression (fail if error)

  31. Thanks:

    Darek (28th December 2020)

  32. #1016
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    538
    Thanks
    225
    Thanked 392 Times in 203 Posts
    paq8pxd_v93
    Code:
    - change jpeg model
    The source is on GitHub. You need to compile your own binary.

    paq8pxd_v92
    Code:
    - small fixes
    - small adjustments
    - detect pdf CMYK and force as IMAGE32 (https://encode.su/threads/1464-Paq8p...ll=1#post67401)
    Code:
    paq8pxd_v93  -s8 a10.jpg   842468  621739   27.20 sec, used 1979 MB
    paq8pxd_v91  -s8 a10.jpg   842468  623058   19.36 sec, used 1978 MB
    paq8px_v198  -8  a10.jpg   842468  624593   27.38 sec, used 2547 MB
    
    paq8pxd_v93  -s8 mill.jpg 7132151 4933064  215.93 sec, used 1979 MB 
    paq8pxd_v91  -s8 mill.jpg 7132151 4940881  153.20 sec, used 1978 MB 
    paq8px_v198  -8  mill.jpg 7132151 4952107  238.47 sec, used 2547 MB
    There is a speed loss, but I think it's OK compared to the px version. With larger files, the time difference between pxd and px gets larger.
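To put numbers on the trade-off, the rows above can be compared pairwise: on a10.jpg, v93 saves 1319 bytes over v91 and takes about 1.4 times as long. A small sketch of that comparison is given here. The `struct run` type and both helper names are made up for this illustration.

```c
#include <assert.h>

/* One row of the table above: compressed size in bytes, run time in seconds. */
struct run {
    long compressed;
    double seconds;
};

/* How many times longer run b takes than run a. */
static double time_factor(struct run a, struct run b) {
    assert(a.seconds > 0.0);
    return b.seconds / a.seconds;
}

/* Bytes saved by run b compared with run a (positive means b is smaller). */
static long bytes_saved(struct run a, struct run b) {
    return a.compressed - b.compressed;
}
```

For mill.jpg the same comparison gives 7817 bytes saved for v93 over v91, again at roughly 1.4 times the run time.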
    KZo


  33. Thanks (5):

    Darek (28th December 2020),Gotty (28th December 2020),Mike (29th December 2020),moisesmcardona (28th December 2020),xinix (28th December 2020)

  34. #1017
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    277
    Thanks
    164
    Thanked 64 Times in 49 Posts
    v92 and v93 binaries

  35. Thanks:

    Darek (28th December 2020)

  36. #1018
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    538
    Thanks
    225
    Thanked 392 Times in 203 Posts
    Code:
    15642246 - enwik8 -x15 -w -e1,english.dic.txt  paq8pxd_v93 14069.03 sec
    KZo


  37. Thanks:

    Darek (28th December 2020)

  38. #1019
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,272
    Thanks
    802
    Thanked 545 Times in 415 Posts
    Scores for my testset with paq8pxd_v93 (paq8pxd_v92 got almost the same results, except for the JPG file). That is a gain of 2300 bytes over paq8pxd_v91 (which is also almost exactly the same as paq8pxd_v90).
    Regarding the builds: on my laptop the "nativecpu" versions are about 8-10% faster.
    [Attached image: paq8pxd_v93.jpg (testset scores)]

  39. Thanks:

    Gotty (29th December 2020)

  40. #1020
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    277
    Thanks
    164
    Thanked 64 Times in 49 Posts
    Originally posted by Darek:
    According to builds - for my laptop "nativecpu" versions are about 8-10% faster
    The native CPU build essentially uses AVX2 instructions, whereas the regular one uses SSE2. You can confirm this by running both executables: the instruction set gets printed on screen.
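To check independently whether a machine supports AVX2 at all, GCC and Clang on x86 offer the `__builtin_cpu_supports` builtin. The sketch below uses it, and it is only an illustration: it is not how paq8pxd selects or reports its instruction set, the "Instruction set:" banner text is made up, and on other compilers or CPU architectures it simply reports "unknown".

```c
#include <assert.h>
#include <stdio.h>

/* Sketch only, not paq8pxd code. Uses the GCC/Clang x86 builtin
   __builtin_cpu_supports; everywhere else it returns "unknown". */
static const char *best_simd_level(void) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return "AVX2";
    if (__builtin_cpu_supports("sse2"))
        return "SSE2";
    return "none";
#else
    return "unknown";
#endif
}

/* Formats a made-up banner line into out; returns snprintf's result. */
static int format_simd_banner(char *out, size_t cap) {
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "Instruction set: %s", best_simd_level());
}
```

If this reports SSE2 only, a build compiled for AVX2 would not be expected to run on that machine.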

  41. Thanks:

    Darek (29th December 2020)
