Page 15 of 22
Results 421 to 450 of 644

Thread: Paq8pxd dict

  1. #421
    Member (joined May 2008, Estonia)
    Stephan, thanks for reporting.
    I hope to have time next weekend.
    KZo


  2. The Following User Says Thank You to kaitz For This Useful Post:

    Stephan Busch (26th February 2018)

  3. #422
    Member (joined May 2008, Estonia)
    Quote (originally posted by Stephan Busch):
    v46 detects 115MB of dBase data inside the .tar of the camera raw corpus, at positions 424075042 - 539457826;
    if the files are compressed separately, v46 detects 'imgunk' in all .arw and in all .orf.
    Can't reproduce. I am using 7z to create the tar archive.
    Quote (originally posted by Stephan Busch):
    The file nikon_1_v2_17.nef made v46 crash (using -s0).
    No crash, still looking.
    Quote (originally posted by Stephan Busch):
    The 'bintext' detected inside leica_m82_05.dng is nearly as large as the whole file,
    and the 'bintext' block found in fujifilm_xf1_08.raf is almost half the size of the file.
    There's no format parser for them in pxd, so it detects whatever it thinks it sees.
    KZo


  4. #423
    Member (joined Dec 2008, Warsaw, Poland)
    @Kaitz, will you publish v47, or is it only internal?

  5. #424
    Member (joined May 2008, Estonia)
    Quote (originally posted by Darek):
    @Kaitz, will you publish v47, or is it only internal?
    I need to test this new text model and some other things. It takes forever to test.
    Will try.
    KZo


  6. The Following User Says Thank You to kaitz For This Useful Post:

    Darek (2nd March 2018)

  7. #425
    Stephan Busch, Tester (joined May 2008, Bremen, Germany)
    Quote (originally posted by kaitz):
    Can't reproduce. I am using 7z to create the tar archive.

    No crash, still looking.

    There's no format parser for them in pxd, so it detects whatever it thinks it sees.

    This is my .tar of the Camera Raw test set:
    https://drive.google.com/open?id=11u...jF5EywuZjTNw53

    D:\TESTSETS>paq8pxd_v46_speed -s0 c.tar
    Creating archive c.tar.paq8pxd46 with 1 file(s)...

    File list (17 bytes)
    Compressed from 17 to 22 bytes.

    1/1 Filename: c.tar (577854464 bytes)
    Block segmentation:
    0 | default | 80496 [0 - 80495]
    1 | jpeg | 12388 [80496 - 92883]
    2 | jpeg | 2077017 [92884 - 2169900]
    3 | default | 24855167 [2169901 - 27025067]
    4 | jpeg | 18874 [27025068 - 27043941]
    5 | default | 2 [27043942 - 27043943]
    6 | jpeg | 3553878 [27043944 - 30597821]
    7 | default | 22929426 [30597822 - 53527247]
    8 | jpeg | 16966 [53527248 - 53544213]
    9 | default | 2 [53544214 - 53544215]
    10 | jpeg | 2795912 [53544216 - 56340127]
    11 | default | 22047724 [56340128 - 78387851]
    12 | jpeg | 8406 [78387852 - 78396257]
    13 | default | 19891506 [78396258 - 98287763]
    14 | jpeg | 779708 [98287764 - 99067471]
    15 | bintext | 18989124 [99067472 - 118056595]
    16 | jpeg | 627967 [118056596 - 118684562]
    17 | default | 13245477 [118684563 - 131930039]
    18 | jpeg | 2795912 [131930040 - 134725951]
    19 | default | 9477824 [134725952 - 144203775]
    20 | hdr | 3936 [144203776 - 144207711]
    21 | 24b-image | 230400 [144207712 - 144438111] (width: 960)
    22 | bintext | 10359712 [144438112 - 154797823]
    23 | jpeg | 7573 [154797824 - 154805396]
    24 | default | 18446187 [154805397 - 173251583]
    25 | jpeg | 797378 [173251584 - 174048961]
    26 | default | 59198 [174048962 - 174108159]
    27 | jpeg | 1016655 [174108160 - 175124814]
    28 | default | 10647881 [175124815 - 185772695]
    29 | jpeg | 102457 [185772696 - 185875152]
    30 | default | 122671 [185875153 - 185997823]
    31 | jpeg | 938643 [185997824 - 186936466]
    32 | default | 44397 [186936467 - 186980863]
    33 | jpeg | 1290134 [186980864 - 188270997]
    34 | default | 17769578 [188270998 - 206040575]
    35 | jpeg | 103522 [206040576 - 206144097]
    36 | default | 59294 [206144098 - 206203391]
    37 | jpeg | 932367 [206203392 - 207135758]
    38 | default | 497 [207135759 - 207136255]
    39 | jpeg | 3004310 [207136256 - 210140565]
    40 | default | 24004682 [210140566 - 234145247]
    41 | jpeg | 8194 [234145248 - 234153441]
    42 | bintext | 23070 [234153442 - 234176511]
    43 | jpeg | 990722 [234176512 - 235167233]
    44 | default | 15160158 [235167234 - 250327391]
    45 | jpeg | 9032 [250327392 - 250336423]
    46 | default | 23384 [250336424 - 250359807]
    47 | jpeg | 877947 [250359808 - 251237754]
    48 | default | 15224869 [251237755 - 266462623]
    49 | jpeg | 8366 [266462624 - 266470989]
    50 | default | 186290 [266470990 - 266657279]
    51 | jpeg | 1167939 [266657280 - 267825218]
    52 | default | 17959357 [267825219 - 285784575]
    53 | jpeg | 703899 [285784576 - 286488474]
    54 | default | 19122789 [286488475 - 305611263]
    55 | jpeg | 665830 [305611264 - 306277093]
    56 | default | 19156602 [306277094 - 325433695]
    57 | jpeg | 34005 [325433696 - 325467700]
    58 | default | 24833483 [325467701 - 350301183]
    59 | jpeg | 1325770 [350301184 - 351626953]
    60 | default | 78490 [351626954 - 351705443]
    61 | jpeg | 29399 [351705444 - 351734842]
    62 | default | 18363571 [351734843 - 370098413]
    63 | jpeg | 3188843 [370098414 - 373287256]
    64 | default | 57207 [373287257 - 373344463]
    65 | jpeg | 56994 [373344464 - 373401457]
    66 | default | 6774 [373401458 - 373408231]
    67 | jpeg | 6098520 [373408232 - 379506751]
    68 | default | 31480976 [379506752 - 410987727]
    69 | jpeg | 327074 [410987728 - 411314801]
    70 | default | 10390 [411314802 - 411325191]
    71 | jpeg | 7530480 [411325192 - 418855671]
    72 | default | 5218857 [418855672 - 424074528]
    73 | hdr | 513 [424074529 - 424075041]
    74 | dBase | 115382785 [424075042 - 539457826]
    75 | default | 13312499 [539457827 - 552770325]
    76 | jpeg | 717452 [552770326 - 553487777]
    77 | default | 24366686 [553487778 - 577854463]

    Segment data size: 1015 bytes

    TN |Type name |Count |Total size
    -----------------------------------------
    0 |default | 35 | 388244391
    1 |bintext | 3 | 29371906
    2 |dBase | 1 | 115382785
    3 |jpeg | 36 | 44620533
    4 |hdr | 2 | 4449
    10 |24b-image | 1 | 230400
    -----------------------------------------
    Total level 0 | 78 | 577854464

    default stream(0). Total 533003531
    jpeg stream(1). Total 44620533
    image24 stream(5). Total 230400
    Stream(0) compressed from 533003531 to 533003531 bytes
    Stream(1) compressed from 44620533 to 44620533 bytes
    Stream(5) compressed from 230400 to 230400 bytes
    Segment data compressed from 1015 to 1015 bytes
    Total 577854464 bytes compressed to 577855554 bytes.
    Time 769.19 sec, used 234 MB (246040919 bytes) of memory
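The block list above can be cross-checked mechanically: each printed range `[start - end]` is inclusive, so the dBase block Stephan mentions is 539457826 - 424075042 + 1 = 115382785 bytes, exactly the size shown. The per-type table is also consistent with the stream lines: default + bintext + dBase + hdr add up to 533003531 bytes, the size of stream 0, and all three streams together equal the 577854464-byte input (at -s0 the log shows every stream stored at its original size). The following is a minimal Python sketch that parses lines in the format shown in this log and aggregates them per type. The regex is an assumption based only on this output, the helper names are hypothetical and not part of paq8pxd, and the grouping of bintext/dBase/hdr into stream 0 is inferred from the totals rather than from the source code.

```python
import re
from collections import defaultdict

# Assumed line format, taken only from the log above, e.g.
#   "74 | dBase | 115382785 [424075042 - 539457826]"
# Trailing extras such as "(width: 960)" are ignored.
SEGMENT_RE = re.compile(r"^\s*(\d+)\s*\|\s*([\w-]+)\s*\|\s*(\d+)\s*\[(\d+)\s*-\s*(\d+)\]")


def parse_segments(log_text):
    """Return (index, type, size, start, end) tuples for each segmentation line."""
    segments = []
    for line in log_text.splitlines():
        m = SEGMENT_RE.match(line)
        if m:
            idx, kind, size, start, end = m.groups()
            segments.append((int(idx), kind, int(size), int(start), int(end)))
    return segments


def summarize(segments):
    """Per-type [count, total bytes], checking sizes and contiguity along the way."""
    totals = defaultdict(lambda: [0, 0])
    prev_end = None
    for idx, kind, size, start, end in segments:
        # The printed [start - end] range is inclusive: size == end - start + 1.
        if size != end - start + 1:
            raise ValueError(f"segment {idx}: size does not match its range")
        # Consecutive blocks should tile the file with no gaps or overlaps.
        if prev_end is not None and start != prev_end + 1:
            raise ValueError(f"segment {idx}: not contiguous with the previous block")
        prev_end = end
        totals[kind][0] += 1
        totals[kind][1] += size
    return dict(totals)


# Per-type totals copied from the "TN | Type name" table in the log.
TYPE_TOTALS = {"default": 388244391, "bintext": 29371906, "dBase": 115382785,
               "jpeg": 44620533, "hdr": 4449, "24b-image": 230400}

if __name__ == "__main__":
    excerpt = """\
    72 | default | 5218857 [418855672 - 424074528]
    73 | hdr | 513 [424074529 - 424075041]
    74 | dBase | 115382785 [424075042 - 539457826]
    75 | default | 13312499 [539457827 - 552770325]
    """
    # {'default': [2, 18531356], 'hdr': [1, 513], 'dBase': [1, 115382785]}
    print(summarize(parse_segments(excerpt)))
    # Stream 0 appears to hold the default, bintext, dBase and hdr blocks.
    stream0 = sum(TYPE_TOTALS[k] for k in ("default", "bintext", "dBase", "hdr"))
    print(stream0, sum(TYPE_TOTALS.values()))  # 533003531 577854464
```

Feeding the full "Block segmentation" section through `summarize` should reproduce the per-type table; a `ValueError` would point at a block whose range does not add up.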

  8. The Following User Says Thank You to Stephan Busch For This Useful Post:

    kaitz (2nd March 2018)

  9. #426
    Member (joined Feb 2016, Luxembourg)
    Quote (originally posted by kaitz):
    There's no format parser for them in pxd, so it detects whatever it thinks it sees.
    RAW formats are a nightmare, Kaido. The manufacturers believe in security by obscurity.
    In EMMA, the parsers alone are ~1.4k LOC, and then you still need specific models.
    And we'd have to rewrite the whole preprocessing stage to support them properly.

    Quote (originally posted by kaitz):
    I need to test this new text model and some other things. It takes forever to test.
    Will try.
    You should try testing changes to cmix, then; it's like watching paint dry.

    You can probably strip the stemming code from the model, or does paq8pxd sometimes skip WRT for text files?

  10. #427
    Member (joined May 2008, Estonia)
    Quote (originally posted by mpais):
    RAW formats are a nightmare, Kaido. The manufacturers believe in security by obscurity.
    In EMMA, the parsers alone are ~1.4k LOC, and then you still need specific models.
    And we'd have to rewrite the whole preprocessing stage to support them properly.

    You should try testing changes to cmix, then; it's like watching paint dry.

    You can probably strip the stemming code from the model, or does paq8pxd sometimes skip WRT for text files?
    I did test it; it's not removed.
    I do the minimum possible for RAW detection: enough to find the embedded JPEG files and, if possible, the size of the image data.

    cmix is a beast of its own.
    KZo


  11. #428
    Member (joined May 2008, Estonia)
    KZo


  12. The Following User Says Thank You to kaitz For This Useful Post:

    Darek (4th March 2018)

  13. #429
    Member (joined Dec 2008, Warsaw, Poland)
    Here are the scores for my test set. As with the paq8px line, improving the text model hurts some non-textual files; as mpais said, K.WAD doesn't like such changes.
    However, there are some quite nice gains for text files.

    There is one issue: for E.TIF there is quite a big loss because part of the file is improperly recognized as text. E.TIF is an image compressed with LZW. Most compressors can squeeze it only a bit (2.5% at most), and paq8pxd v46 scored about the same, but after the wrong recognition as text, v47 took a step back. Of course, E.TIF is probably just one example of a case that affects more files.
    Attached thumbnail: paq8pxd_v47.jpg (556.0 KB)
    Last edited by Darek; 4th March 2018 at 21:05.

  14. The Following User Says Thank You to Darek For This Useful Post:

    kaitz (4th March 2018)

  15. #430
    Member (joined Dec 2008, Warsaw, Poland)
    Scores for v47 on 4 corpora: small gains for the Calgary and Canterbury test sets, some setback for Maximum Compression, but a very good gain for Silesia; the mozilla file compressed 100'000 bytes smaller!
    Also, my scores for v47 on enwik8:
    16'080'717 - enwik8 -s15
    16'103'601 - enwik8.drt -s15
    It looks like enwik9_1423 should come in at about 127'35x'xxx bytes.
    Attached thumbnail: paq8pxd_v47_corpuses.jpg (810.9 KB)

  16. The Following User Says Thank You to Darek For This Useful Post:

    kaitz (5th March 2018)

  17. #431
    Member (joined Dec 2008, Warsaw, Poland)
    Scores for v47 on enwik8 and enwik9:
    16'080'717 - enwik8 -s15
    16'103'601 - enwik8.drt -s15
    127'404'715 - enwik9_1423 -s15. I submitted the last score to Matt privately but got no response. This record should be posted now.

  18. #432
    Member (joined Dec 2008, Warsaw, Poland)
    @Matt, could you add this submission to the LTCB page?
    Paq8pxd_v47:
    enwik8 - 16'080'717 -> option -s15 - encode time: 7'432s, decode time: 7'627s, memory used: 27'500MB
    enwik9_1423 - 127'404'715 -> option -s15 - encode time: 75'022s, decode time: in progress, memory used: 27'500MB
    System: Core i7 4900MQ at 3.8GHz, 32GB, Win7Pro 64, decompression in progress.
    The source code and the 1423 resplit are attached as a 7ZIP file of 139'841 bytes. Matt's zip could be a little bigger, but it's still 3rd place in LTCB.

    Attached files
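For context on the submission figures: LTCB ranks entries by the compressed size of enwik9 plus the size of the decompressor archive, and it reports speed in nanoseconds per input byte. Because enwik9 is exactly 10^9 bytes, the 75'022 s encode time corresponds to 75'022 ns/byte. The short Python sketch below does that arithmetic, assuming the 139'841-byte 7z archive is what gets counted as the decompressor size (as Darek notes, Matt's own zip could come out a little bigger, which would raise the total slightly). The helper names are hypothetical.

```python
ENWIK8_BYTES = 10**8   # enwik8: first 10^8 bytes of the Wikipedia dump
ENWIK9_BYTES = 10**9   # enwik9: first 10^9 bytes


def fmt(n):
    """Format an integer with apostrophe thousands separators, as in this thread."""
    return f"{n:,}".replace(",", "'")


def ltcb_total(enwik9_compressed, decompressor_size):
    """LTCB ranking size: compressed enwik9 plus the decompressor archive size."""
    return enwik9_compressed + decompressor_size


def ns_per_byte(seconds, input_bytes):
    """Wall-clock time expressed per input byte, in nanoseconds."""
    return seconds * 1e9 / input_bytes


if __name__ == "__main__":
    # Figures from the submission above; the 7z archive stands in for the decompressor.
    print(fmt(ltcb_total(127_404_715, 139_841)))  # 127'544'556
    print(ns_per_byte(75_022, ENWIK9_BYTES))      # 75022.0
    print(ns_per_byte(7_432, ENWIK8_BYTES))       # 74320.0
```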

  19. #433
    Member (joined Dec 2008, Warsaw, Poland)
    @kaitz, do you plan to add preprocessing/recompression of images that were already compressed with RLE or LZW?
    My test set has two such files: D.TGA (RLE) and E.TIF (LZW).
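For the RLE case, the core of such a preprocessor would be to decode the packets back to raw pixels, so the image model can see them, and to accept the transform only if re-encoding reproduces the original bytes exactly, because two encoders can packetize the same pixels differently. Below is a minimal Python sketch assuming the standard TGA RLE packet layout: the header's high bit selects a run packet (one pixel repeated) versus a raw packet (literal pixels), and the low 7 bits hold the pixel count minus 1. The function names are hypothetical and this is not paq8pxd code; a real implementation would also need TGA header parsing and a fallback for streams that fail the round-trip check.

```python
def tga_rle_decode(data, bpp, pixel_count):
    """Decode TGA RLE packets; returns (raw_bytes, bytes_consumed). bpp = bytes per pixel."""
    out = bytearray()
    pos = decoded = 0
    while decoded < pixel_count:
        if pos >= len(data):
            raise ValueError("truncated RLE stream")
        header = data[pos]
        pos += 1
        count = (header & 0x7F) + 1            # 1..128 pixels per packet
        if decoded + count > pixel_count:
            raise ValueError("packet overruns the image")
        if header & 0x80:                       # run packet: one pixel, repeated
            pixel = data[pos:pos + bpp]
            if len(pixel) != bpp:
                raise ValueError("truncated run packet")
            out += pixel * count
            pos += bpp
        else:                                   # raw packet: `count` literal pixels
            chunk = data[pos:pos + count * bpp]
            if len(chunk) != count * bpp:
                raise ValueError("truncated raw packet")
            out += chunk
            pos += count * bpp
        decoded += count
    return bytes(out), pos


def tga_rle_encode(raw, bpp):
    """Greedy re-encoder (an assumption): runs of >= 2 identical pixels become run packets."""
    pixels = [raw[i:i + bpp] for i in range(0, len(raw), bpp)]
    out = bytearray()
    i, n = 0, len(pixels)
    while i < n:
        run = 1
        while i + run < n and run < 128 and pixels[i + run] == pixels[i]:
            run += 1
        if run >= 2:
            out.append(0x80 | (run - 1))
            out += pixels[i]
            i += run
        else:
            start = i
            # Extend the literal packet until a repeat begins or 128 pixels are reached.
            while i < n and i - start < 128:
                if i + 1 < n and pixels[i + 1] == pixels[i]:
                    break
                i += 1
            out.append(i - start - 1)
            out += b"".join(pixels[start:i])
    return bytes(out)


def is_recompressible(data, bpp, pixel_count):
    """True if re-encoding the decoded pixels reproduces the original bytes exactly."""
    raw, used = tga_rle_decode(data, bpp, pixel_count)
    return tga_rle_encode(raw, bpp) == data[:used]
```

For example, three identical 24-bit pixels stored as one run packet pass the check, while the same pixels stored as three one-pixel raw packets decode correctly but fail it; such streams would have to be kept as-is or stored with extra information about how they were packetized.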

  20. #434
    Member (joined Aug 2015, Indonesia)
    I have made a small improvement to paq8pxd v47, and this is the result:
    xml -s14 253475
    Could anybody test it on enwik8/enwik9 with the -s15 option? I only have 16 GB of memory. I have attached the source code and the binary. Thank you.
    Attached files

  21. #435
    Member (joined Dec 2008, Warsaw, Poland)
    @suryakandau@yahoo.co.id, could you also add the image and jpg improvements from sim?

    xml -s15 253'470 bytes for paq8pxd_v47_suryakandau version
    xml -s15 253'855 bytes for original paq8pxd_v47 version

    I'll try to compress enwik8 with -s15.

  22. #436
    Member (joined Dec 2008, Warsaw, Poland)
    @suryakandau, I've tested your version on my test set, the 4 corpora, and enwik8.
    There are some gains for text files, but there are also bigger losses for others, especially bigger files, so the total scores are worse than the original v47.

    enwik8 comparison:

    16'080'717 - enwik8 -s15 by Paq8pxd_v47
    16'103'601 - enwik8.drt -s15 by Paq8pxd_v47


    16'109'814 - enwik8 -s15 by Paq8pxd_v47_suryakandau - worse score for the plain file
    16'101'169 - enwik8.drt -s15 by Paq8pxd_v47_suryakandau - a small improvement for the DRT file, but still worse than v47 on the plain file.
    Last edited by Darek; 19th March 2018 at 12:03.

  23. #437
    Member (joined Aug 2015, Indonesia)

    paq8pxdv47_1

    New improvement for xml and ooffice:
    xml 256053
    ooffice 1382609
    Maybe someone on this forum could implement LSTM in this version; I guess LSTM could improve the ratio a lot.
    Attached files

  24. #438
    Member (joined Aug 2015, Indonesia)
    Quote (originally posted by suryakandau@yahoo.co.id):
    New improvement for xml and ooffice:
    xml 256053
    ooffice 1382609
    Maybe someone on this forum could implement LSTM in this version; I guess LSTM could improve the ratio a lot.
    I used the -s6 option.
    Original Paq8pxdv47 with -s6:
    xml 257102
    ooffice 1386363

  25. #439
    Member (joined Dec 2008, Warsaw, Poland)
    @suryakandau, what about the losses on bigger files, as with your previous version? Did you check?

  26. #440
    Member (joined Aug 2015, Indonesia)
    Quote (originally posted by Darek):
    @suryakandau, what about the losses on bigger files, as with your previous version? Did you check?
    Which files?

  27. #441
    Member (joined Dec 2008, Warsaw, Poland)
    Quote (originally posted by suryakandau@yahoo.co.id):
    Which files?
    From SILESIA: dickens, nci, webster
    From Maximum Compression: english.dic
    And enwik8.

    But the newer (today's) version looks better. The 4 corpus tests are still in progress, but on my test set there is a slight step forward, especially for text files (see the attached table). I suppose the other test sets will be similar.

    16'083'438 - enwik8 -s15 by Paq8pxd_v47_suryakandau 2 - better than your previous update and very similar to the original v47!
    Attached thumbnail: paq8pxd_v47_su2.jpg (558.6 KB)

  28. #442
    Member (joined Aug 2015, Indonesia)


    Quote (originally posted by Darek):
    From SILESIA: dickens, nci, webster
    From Maximum Compression: english.dic
    And enwik8.

    But the newer (today's) version looks better. The 4 corpus tests are still in progress, but on my test set there is a slight step forward, especially for text files (see the attached table). I suppose the other test sets will be similar.

    16'083'438 - enwik8 -s15 by Paq8pxd_v47_suryakandau 2 - better than your previous update and very similar to the original v47!
    I only improved the text and DLL files in Silesia. I haven't tested bigger text files like enwik8/9 yet... Anyway, thank you for testing it.

  29. #443
    Member (joined Dec 2008, Warsaw, Poland)
    16'099'737 - enwik8.drt -s15 by Paq8pxd_v47_suryakandau 2

    What about the image model improvements? Could you add them to paq8pxd too?
    Last edited by Darek; 23rd March 2018 at 11:34.

  30. The Following User Says Thank You to Darek For This Useful Post:

    suryakandau@yahoo.co.id (23rd March 2018)

  31. #444
    Member (joined Dec 2008, Warsaw, Poland)
    And here are the scores for the 4 corpora.
    For the smaller tests (smaller files) there are some gains, but for Maximum Compression and Silesia, especially for bigger files, there are losses that add up to a worse total score...
    Attached thumbnail: paq8pxd_v47_su2_4_corpuses.jpg (775.1 KB)

  32. #445
    Member (joined Aug 2015, Indonesia)
    Quote (originally posted by Darek):
    16'099'737 - enwik8.drt -s15 by Paq8pxd_v47_suryakandau 2

    What about the image model improvements? Could you add them to paq8pxd too?
    I could try, but I can't promise the result will be the same as the latest sim version, because they have a different structure...

  33. #446
    Member (joined Apr 2018, Indonesia)
    Some improvement on enwik8 compression using -s6:
    v47 original: 16.739.918
    my version: 16.729.127
    I didn't test with the -s15 option; any volunteer is welcome to test it with -s15 on enwik8 and enwik9. Maybe it could get third place in LTCB.
    Attached files

  34. #447
    Member (joined Dec 2008, Warsaw, Poland)
    The original v47 already got third place on LTCB; however, Matt hasn't updated the LTCB page yet.
    See this post: https://encode.su/threads/1464-Paq8p...ll=1#post56169

    I'll test the -s15 option after the paq8px v141 tests.

  35. #448
    Member (joined Dec 2008, Warsaw, Poland)
    Scores for enwik8:

    16'080'717 - enwik8 -s15 by Paq8pxd_v47
    16'103'601 - enwik8.drt -s15 by Paq8pxd_v47

    16'109'814 - enwik8 -s15 by Paq8pxd_v47_suryakandau
    16'101'169 - enwik8.drt -s15 by Paq8pxd_v47_suryakandau

    16'083'438 - enwik8 -s15 by Paq8pxd_v47_suryakandau 2
    16'099'737 - enwik8.drt -s15 by Paq8pxd_v47_suryakandau 2

    16'070'898 - enwik8 -s15 by Paq8pxd_v47_bwt - looks quite promising; maybe it could be 100 KB smaller than v47 on enwik9
    16'106'855 - enwik8.drt -s15 by Paq8pxd_v47_bwt
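These results are easier to compare as byte differences from the original v47. The Python sketch below uses only the numbers listed above. It also shows that a naive 10x extrapolation of the bwt version's -9'819-byte enwik8 gain lands near the ~100 KB figure mentioned for enwik9; that linear scaling is an assumption for illustration only, since enwik9 is ten times larger but not ten copies of enwik8. The helper names are hypothetical.

```python
# enwik8 -s15 sizes in bytes, copied from the post above: (plain, .drt).
SCORES = {
    "v47": (16_080_717, 16_103_601),
    "v47_suryakandau": (16_109_814, 16_101_169),
    "v47_suryakandau 2": (16_083_438, 16_099_737),
    "v47_bwt": (16_070_898, 16_106_855),
}


def deltas(scores, baseline="v47"):
    """Byte difference of each variant vs. the baseline (negative = smaller = better)."""
    base_plain, base_drt = scores[baseline]
    return {name: (plain - base_plain, drt - base_drt)
            for name, (plain, drt) in scores.items() if name != baseline}


def naive_enwik9_gain(enwik8_delta):
    """Crude linear guess: assume the enwik8 difference scales 10x on enwik9."""
    return 10 * enwik8_delta


if __name__ == "__main__":
    for name, (d_plain, d_drt) in deltas(SCORES).items():
        print(f"{name:18} plain {d_plain:+7d}  drt {d_drt:+7d}")
    print(naive_enwik9_gain(deltas(SCORES)["v47_bwt"][0]))  # -98190
```

On the plain file, only the bwt variant beats v47; on the .drt file, both suryakandau versions are slightly smaller while the bwt variant is slightly larger.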

  36. #449
    Gotty, Member (joined Oct 2017, Hungary)
    Quote (originally posted by bwt):
    Some improvement on enwik8 compression using -s6
    Hello bwt,
    Is your improvement a general one, or did you target only enwik8/9?
    Please don't forget to include the source code, too (because of the licensing, which is GPL).

  37. #450
    Member (joined Apr 2018, Indonesia)
    So far my improvement focuses only on enwik8/9. I will include the source code in the next version.

