Thread: paq8px

  1. #2311
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    746
    Thanks
    424
    Thanked 487 Times in 261 Posts
    @Darek, you can use your own usual dictionary file - the difference is only formatting. See my earlier post here.

    @Surya, as Darek is getting seriously interested in your improvement, I looked at it. Unfortunately, it is buggy: the very same problem you have had in most of your releases. It even hits an assert. I'm going to fix it, and then it will be a v201fix1 indeed.

    @Darek:
    Give me a moment, I will post a fix and the desired exe.

    Edit: it will take more time. I verified only the lstm changes, and they didn't bring good results (just small fluctuations: some test files got slightly better, some got slightly worse). I'll need more tests.
    Last edited by Gotty; 29th January 2021 at 01:57.

  2. Thanks (2):

    Darek (28th January 2021),LucaBiondi (29th January 2021)

  3. #2312
    Member Gotty's Avatar
    Let's have a list of bugs/issues first.

    So it's about Surya's v201fix1 posted here.

    - You don't have a new changelog entry in the contribution. You should add one. Luckily we have a "list of changes" in your post. It says: "i have make a little improvement on lstm model by adding 1 mixercontextsets and 2 mixerinputs". When comparing the posted source code with the hxim repo, it turns out (after some hacks to fix the line endings) that there are actually modifications in textmodel, too. Indeed, a later post gave us that info. Please include all the changes in the changelog file and try to include it in your post.
    - paq8px.cpp was not updated with the new version number.
    - In TextModel::mix - We see a new variable "uint64_t ii = State * 64;". The original variable (i) is the context counter. Using the new variable (ii) you have just restarted it. A context counter must not be restarted. Just simply continue using "i".
    - In TextModel::mix - Don't forget to remove your comments from the code that is not intended for publishing: "//185796"
    - In SimdLstmModel.hpp - there are 3 new lines. All of them have issues. The usual ones: bits of the different context constituents overlap (apm3 and apm4) or overflow (m.set). The first two are just "issues", but the latter is a bug.
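
    To illustrate the "overlapped" and "overflown" bits mentioned in the last item, here is a minimal sketch of packing context constituents into disjoint bit fields. ContextPacker and packExample are hypothetical names made up for this example, they are not paq8px code, and the field widths are arbitrary. The asserts catch exactly the two mistakes: a constituent that does not fit its field, and a packed context that runs past the available width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not paq8px code: packs context constituents into
// disjoint bit fields and tracks how many bits are already occupied.
struct ContextPacker {
  std::uint32_t value = 0;     // packed context so far
  std::uint32_t usedBits = 0;  // number of low bits already occupied

  // Appends `v`, which must fit in `bits` bits, above the bits already used.
  void add(std::uint32_t v, std::uint32_t bits) {
    assert(bits > 0 && usedBits + bits <= 32);  // no overflow of the word
    assert(bits == 32 || (v >> bits) == 0);     // constituent fits its field
    value |= v << usedBits;
    usedBits += bits;
  }
};

// Packs three illustrative constituents (2 + 8 + 3 = 13 bits). A table
// selected by this value needs at least 1 << 13 entries.
inline std::uint32_t packExample(std::uint32_t a2, std::uint32_t b8,
                                 std::uint32_t c3) {
  ContextPacker p;
  p.add(a2, 2);  // bits 0..1
  p.add(b8, 8);  // bits 2..9
  p.add(c3, 3);  // bits 10..12
  return p.value;
}
```

    With this layout no two constituents can share a bit, and the number of bits used tells you how large the selected context set has to be.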

    It's quite easy to fix the problems, but to verify which change brings an actual improvement and which is just noise I'll need to run many tests. Especially with the lstm model improvements it will take some time, so please be patient.

  4. Thanks (2):

    LucaBiondi (29th January 2021),suryakandau@yahoo.co.id (29th January 2021)

  5. #2313
    Member CompressMaster's Avatar
    Join Date
    Jun 2018
    Location
    Lovinobana, Slovakia
    Posts
    217
    Thanks
    66
    Thanked 18 Times in 18 Posts
    @Gotty
    Well, suryakandau REALLY SHOULD follow your advice, and if he has an improvement, he really should post it in the paq8sk thread and not here.

  6. #2314
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    578
    Thanks
    220
    Thanked 833 Times in 341 Posts
    Quote Originally Posted by Gotty View Post
    That explained it. So fixing my detection routine jumped to no1 spot on my to do list. My next version will be about detections and transforms anyway - as requested. It fits perfectly.
    Sorry, I just saw this reply now. If you're planning on refactoring the detection, I have an unreleased Fairytale prototype that might be of interest. It implements a hybrid pool of memory and physical storage to use as a scratch buffer for recompression using a single allocated memory block and a single temporary file, so it fixes the main problem from the first prototype. Writing parsers for it is also much easier than for paq8px since it solves the contention problems we get, so no more of this:


    if (!gifi && !bmpi && !tgai && ...)
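
    A rough sketch of what contention-free parsing could look like, as opposed to the chained flag test above. This is not the Fairytale API: Detection, Parser, MagicParser and segment are hypothetical names, and the toy "format" (a magic string followed by a one-byte payload length) exists only to keep the example short. The point is that each parser looks at the data on its own and never checks whether another parser is busy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A detected block: [offset, offset + length) of the input, plus a type tag.
struct Detection {
  std::size_t offset;
  std::size_t length;
  std::string type;
};

// Each parser inspects the data on its own and knows nothing about the others.
struct Parser {
  virtual ~Parser() = default;
  virtual bool parse(const std::vector<std::uint8_t>& data, std::size_t pos,
                     Detection& out) const = 0;
};

// Toy parser: a magic string followed by a one-byte payload length.
struct MagicParser : Parser {
  std::string magic, type;
  MagicParser(std::string m, std::string t)
      : magic(std::move(m)), type(std::move(t)) {}
  bool parse(const std::vector<std::uint8_t>& data, std::size_t pos,
             Detection& out) const override {
    const std::size_t header = magic.size() + 1;  // magic + length byte
    if (pos + header > data.size()) return false;
    for (std::size_t i = 0; i < magic.size(); ++i)
      if (data[pos + i] != static_cast<std::uint8_t>(magic[i])) return false;
    const std::size_t length = header + data[pos + magic.size()];
    if (pos + length > data.size()) return false;
    out = Detection{pos, length, type};
    return true;
  }
};

// Driver: tries every parser at the current position; the first hit claims
// the block, and scanning resumes right after it.
std::vector<Detection> segment(
    const std::vector<std::uint8_t>& data,
    const std::vector<std::unique_ptr<Parser>>& parsers) {
  std::vector<Detection> blocks;
  std::size_t pos = 0;
  while (pos < data.size()) {
    bool hit = false;
    for (const auto& p : parsers) {
      Detection d;
      if (p->parse(data, pos, d)) {
        assert(d.length > 0);  // a parser must always make progress
        blocks.push_back(d);
        pos += d.length;
        hit = true;
        break;
      }
    }
    if (!hit) ++pos;  // nothing recognised here, move on by one byte
  }
  return blocks;
}
```

    Adding a new format then means adding one more Parser to the list, without touching the conditions of the existing ones.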

  7. Thanks:

    LucaBiondi (31st January 2021)

  8. #2315
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    538
    Thanks
    225
    Thanked 392 Times in 203 Posts
    You can also look at paq8pxv detection. Maybe it helps. There is no negative test for something detecting at the same time. Not sure if it's the same as above.
    KZo


  9. #2316
    Member Gotty's Avatar
    Quote Originally Posted by mpais View Post
    I have an unreleased Fairytale prototype that might be of interest.
    I would be very glad! I haven't started integrating any pxd transforms yet, also didn't fix any bugs yet. So feel free to twist it and turn it. It will be a big change I believe.

  10. Thanks:

    LucaBiondi (3rd February 2021)

  11. #2317
    Member mpais
    It might be overkill, since it would likely require a big rewrite.

    After calling it, you're left with the full block segmentation (including deduped blocks).

    Since it uses a user-definable scratch buffer, some content may not be recompressible within those limits, so it would be skipped (e.g., a DEFLATE stream that expands to 3 GB when we only allow 2 GB of temporary storage).

    It also requires using its "hybrid" streams, which hold the block content in memory and/or physical storage, and might have been evicted from the scratch buffer to make way for other streams, so they need to be "revived".
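
    The scratch buffer limits and the evict/revive behaviour described above can be modelled with a small accounting sketch. It is only a model under my own assumptions, not the Fairytale implementation: ScratchPool and StreamInfo are hypothetical names, no bytes are actually moved to a temporary file, and eviction simply picks the oldest resident streams first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical accounting model: a stream is either resident in memory or
// evicted to the temporary file. Only sizes are tracked here.
struct StreamInfo {
  std::uint64_t size;
  bool inMemory;
};

class ScratchPool {
 public:
  ScratchPool(std::uint64_t memoryLimit, std::uint64_t totalLimit)
      : memoryLimit_(memoryLimit), totalLimit_(totalLimit) {}

  // Returns the new stream's id, or -1 if it does not fit within the total
  // limit, in which case the caller skips recompressing that content.
  int add(std::uint64_t size) {
    if (size > totalLimit_ - totalUsed_) return -1;
    streams_.push_back({size, false});
    totalUsed_ += size;
    const int id = static_cast<int>(streams_.size()) - 1;
    revive(id);
    return id;
  }

  // Brings a stream back into memory if it can ever fit there, evicting the
  // oldest resident streams first to make room.
  bool revive(int id) {
    StreamInfo& s = streams_[static_cast<std::size_t>(id)];
    if (s.inMemory) return true;
    if (s.size > memoryLimit_) return false;  // lives in the file only
    for (std::size_t i = 0;
         i < streams_.size() && memoryUsed_ + s.size > memoryLimit_; ++i) {
      if (streams_[i].inMemory) {
        streams_[i].inMemory = false;  // evicted to the temporary file
        memoryUsed_ -= streams_[i].size;
      }
    }
    s.inMemory = true;
    memoryUsed_ += s.size;
    return true;
  }

  bool inMemory(int id) const {
    return streams_[static_cast<std::size_t>(id)].inMemory;
  }
  std::uint64_t memoryUsed() const { return memoryUsed_; }

 private:
  std::uint64_t memoryLimit_, totalLimit_;
  std::uint64_t memoryUsed_ = 0, totalUsed_ = 0;
  std::vector<StreamInfo> streams_;
};
```

    In this model the 3 GB stream from the example above would get -1 from add() with a 2 GB total limit, so it is skipped, while smaller streams push each other out of memory and have to be revived before use.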

    If you're interested I can go into more detail, though a lot of it was already discussed here and on Gitter at the time.

    My GDCC entries were prototypes of ideas I had for some codecs for it. The main goal for me was to have a framework that handled detection and clustering, so that with specialized codecs one could get compression ratios close to paq8px but at 2-3 MB/s on contemporary hardware.

  12. Thanks:

    Mike (6th February 2021)

  13. #2318
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    573
    Thanks
    245
    Thanked 98 Times in 77 Posts
    @mpais: Any chance of releasing it?

  14. #2319
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    277
    Thanks
    164
    Thanked 64 Times in 49 Posts
    Or maybe start a Fairytale fork for PAQ8PX and slowly merge it with the main branch?

  15. #2320
    Member mpais
    Quote Originally Posted by Gonzalo View Post
    @mpais: Any chance of releasing it?
    Sure, I just need to clean it up a bit. Do you guys want a little example of how to use it, like we did with the previous prototypes, so you can test it?

    Quote Originally Posted by moisesmcardona View Post
    Or maybe start a Fairytale fork for PAQ8PX and slowly merge it with the main branch?
    That's something that has crossed my mind - just skip all the extra complexity of the Fairytale project and simply use this to build a new, no-frills CM compression engine.
    I'd call it "paqx" as a way to continue the legacy of the name, and it seems like the logical next step, since paq9 already exists.

    Any thoughts?

  16. Thanks (2):

    LucaBiondi (13th February 2021),moisesmcardona (12th February 2021)

  17. #2321
    Member Gonzalo
    Quote Originally Posted by mpais View Post
    Sure, I just need to clean it up a bit. Do you guys want a little example of how to use it, like we did with the previous prototypes, so you can test it?
    That'd be great, yes! I don't think it'll be too difficult to figure out how to test it, but the author's recommendations are always appreciated.

  18. Thanks:

    LucaBiondi (13th February 2021)

  19. #2322
    Member mpais
    The code is now live on GitHub.

    Quote Originally Posted by Gonzalo View Post
    That'd be great, yes! I don't think it'll be too difficult to figure out how to test it, but the author's recommendations are always appreciated.
    It's pretty similar in usage to previous prototypes, but the repo doesn't include a usage sample, it's just the library code.
    I also didn't port all the parsers, and haven't really looked at the code in a really long time, so it may not be up to snuff.

    I'll see if I can cook up something when I have some time so it can be tested.

    Let me know what you guys think.

  20. Thanks (3):

    Gonzalo (12th February 2021),LucaBiondi (13th February 2021),Mike (12th February 2021)

  21. #2323
    Member mpais
    I made a really simple example of how to use the analysis stage. For those who tested the previous prototypes it should be very familiar.
    Attached Files

  22. Thanks (2):

    Darek (13th February 2021),LucaBiondi (13th February 2021)

  23. #2324
    Member Gotty's Avatar
    Quote Originally Posted by mpais View Post
    no-frills CM compression engine.
    I'm not sure - what does it mean? Could you describe the idea a bit deeper?

  24. #2325
    Member mpais
    Quote Originally Posted by Gotty View Post
    I'm not sure - what does it mean? Could you describe the idea a bit deeper?
    Fairytale was supposed to be an extremely complex and powerful archiver, with a lot of features that don't make much sense if you just want to make an experimental CM compressor à la paq8.

    After the initial analysis and content-based deduplication stage, it'd optionally run an additional classic deduplication stage on default blocks, followed by an optional clustering and similarity sorting stage to determine the order in which the blocks would actually get solidly compressed. All this was designed to improve compression ratio even when using simpler codecs, like zstd, brotli, lzma, etc.

    You could choose the codec sequence to apply to each block type, e.g., for 24bpp images you could chain a fast whole image decorrelation filter with zstd, which would give you better compression than even the most optimized PNGs while retaining the very fast decompression provided by zstd. And anyone here with an idea for such a filter could quickly implement it as a codec in a Fairytale fork and see the results, without having to write any parsers, entropy stages or code for archiving functionality.
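
    As a toy illustration of chaining a decorrelation filter with a back-end codec, here is a minimal sketch. Everything in it is an assumption made for the example: the Stage/runChain interface is hypothetical, the filter is the simple and well-known "subtract green" colour transform rather than the whole-image filter meant above, and storeOnly is an identity placeholder standing in for a real entropy stage such as zstd.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using Stage = std::function<Bytes(const Bytes&)>;

// Forward colour decorrelation on interleaved RGB: R -= G, B -= G (mod 256).
Bytes subtractGreen(const Bytes& rgb) {
  Bytes out(rgb);
  for (std::size_t i = 0; i + 2 < out.size(); i += 3) {
    out[i] = static_cast<std::uint8_t>(out[i] - out[i + 1]);
    out[i + 2] = static_cast<std::uint8_t>(out[i + 2] - out[i + 1]);
  }
  return out;
}

// Inverse of subtractGreen: R += G, B += G (mod 256).
Bytes addGreen(const Bytes& data) {
  Bytes out(data);
  for (std::size_t i = 0; i + 2 < out.size(); i += 3) {
    out[i] = static_cast<std::uint8_t>(out[i] + out[i + 1]);
    out[i + 2] = static_cast<std::uint8_t>(out[i + 2] + out[i + 1]);
  }
  return out;
}

// Placeholder for a real back-end codec; it just passes the bytes through.
Bytes storeOnly(const Bytes& in) { return in; }

// Applies the stages in order, feeding each stage the previous output.
Bytes runChain(const std::vector<Stage>& chain, Bytes data) {
  for (const Stage& stage : chain) data = stage(data);
  return data;
}
```

    Encoding runs the chain {subtractGreen, storeOnly}; decoding runs the inverse stages in reverse order, {storeOnly, addGreen}. A real setup would pick the chain per block type, as described above.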

    By spending more time when compressing on performing better data segmentation, you can then use highly specialized methods for each type of data to get much better compression ratios with minimal or even no extra cost in terms of decompression performance.

    Now, if your interest is just in pushing the envelope in terms of compression ratio, you might not really care for all of this, since you'll probably be using so much memory and such complex models that the gains from all this extra complexity may be residual at best.

  25. Thanks:

    Gotty (27th February 2021)

  26. #2326
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,274
    Thanks
    803
    Thanked 545 Times in 415 Posts
    enwik scores for paq8px v201:
    15'896'588 - enwik8 -12leta by Paq8px_v189, change: -0,07%
    15'490'302 - enwik8.drt -12leta by Paq8px_v189, change: -0,10%
    121'056'858 - enwik9_1423.drt -12leta by Paq8px_v189, change: -2,99%

    15'884'947 - enwik8 -12lreta by Paq8px_v193, change: -0,02%
    15'476'230 - enwik8.drt -12lreta by Paq8px_v193, change: -0,02%
    126'066'739 - enwik9_1423 -12lreta by Paq8px_v193, change: -0,09%
    121'067'259 - enwik9_1423.drt -12lreta by Paq8px_v193, change: 0,08%

    15'863'690 - enwik8 -12lreta by Paq8px_v201, change: -0,23% - time to compress: 45'986,20s
    15'462'431 - enwik8.drt -12lreta by Paq8px_v201, change: -0,12% - best score for paq8px series - time to compress: 30'951,71s
    120'921'555 - enwik9_1423.drt -12lreta by Paq8px_v201, change: -0,13% - best score for paq8px series - time to compress: 406'614,43s

  27. Thanks:

    Gotty (27th February 2021)

  28. #2327
    Member Gotty's Avatar
    Quote Originally Posted by Darek View Post
    OK, I'll try. Somehow I have some issues with my laptop recently - I've started 5 times enwik9 for paq8px v201 - and even after 3-4 days my computer crashes...
    Does it happen around the same %? How does it crash? Blue screen?


  29. #2328
    Member Darek
    Quote Originally Posted by Gotty View Post
    Does it happen around the same %? How does it crash? Blue screen?
    No, it happened at different moments.
    It's probably something with my laptop. I suppose it could be an issue of low space on disk C (system).
    I sometimes need to close the laptop and hibernate the system, and in some cases, after waking the system up, paq8px quits without any message.
    For now I have started to plan compression runs for times when I won't need to hibernate the system. For enwik9 that is about 4 days that must not be disturbed.

  30. #2329
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    112
    Thanks
    139
    Thanked 51 Times in 31 Posts
    Hi guys
    I have an Excel file that contains the results of compressing my dataset, starting from PAQ8PX_V95(!)
    I have also plotted size / time for each data type.
    It's not easy to attach all these images to the post.
    Where could I upload my Excel file?
    Maybe someone can help to plot the data in a better way.
    thank you,
    Luca

  31. #2330
    Member Gotty's Avatar
    Hi Luca!
    You can upload the excel itself (in a zip or 7zip).

  32. #2331
    Member LucaBiondi

    Excel with all paq8px results for my dataset

    Quote Originally Posted by Gotty View Post
    Hi Luca!
    You can upload the excel itself (in a zip or 7zip).
    Done! I hope you enjoy the size vs. time graphs.
    Cartel1.zip

  33. Thanks:

    Darek (5th March 2021)

  34. #2332
    Member
    Join Date
    Nov 2014
    Location
    California
    Posts
    180
    Thanks
    61
    Thanked 52 Times in 41 Posts
    This is brutal. Dividing the last row of numbers by the first shows that the few percent of compression ratio improvement cost 5x to 15x in compression time.

  35. Thanks:

    mitiko (Yesterday)

  36. #2333
    Member Darek
    Quote Originally Posted by LucaBiondi View Post
    Done! I hope you enjoy the size vs. time graphs.
    Cartel1.zip
    @LucaBiondi - maybe I'm not a master of Excel, but I could help you make some changes.

  37. #2334
    Member LucaBiondi
    Oh yes, help me @Darek please!
    Luca
    Last edited by LucaBiondi; 5th March 2021 at 12:13.

  38. #2335
    Member Gotty's Avatar
    Quote Originally Posted by hexagone View Post
    This is brutal. Dividing the last row of numbers by the first shows that the few percent of compression ratio improvement cost 5x to 15x in compression time.
    It looks like the timing is not reliable - look at the last 2 rows: (Paq8px_v201 -10 / Paq8px_v201 -11).
    In the xml column they are the same (to the hundredth), which is probably an error. Ignoring this column the ratio still varies between 1.29 and 4.29. That's a very large spread.

    Luca was probably running multiple instances at the same time or running other tasks besides paq8px. So it's best to ignore the timings. They are not really useful.

    If you still would like to compare timings, you may need to clean the results by scaling the outliers but most importantly compare the results along the same command line options. You may compare versions along the compression level (8 to 8, 9 to 9, etc), and especially if lstm was used or not used.
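
    The kind of sanity check described above (dividing one timing row by another, column by column, and looking at how far the ratios spread) can be sketched like this. Spread and ratioSpread are hypothetical names for this example, and no values from Luca's spreadsheet are reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Spread {
  double min;
  double max;
};

// Per-column ratio slower[i] / faster[i], summarised as its minimum and
// maximum. Both rows must be measured with the same command line options.
Spread ratioSpread(const std::vector<double>& faster,
                   const std::vector<double>& slower) {
  assert(!faster.empty() && faster.size() == slower.size());
  assert(faster[0] > 0);
  const double first = slower[0] / faster[0];
  Spread s{first, first};
  for (std::size_t i = 1; i < faster.size(); ++i) {
    assert(faster[i] > 0);
    const double r = slower[i] / faster[i];
    s.min = std::min(s.min, r);
    s.max = std::max(s.max, r);
  }
  return s;
}
```

    If the same two configurations give ratios as far apart as 1.29 and 4.29 across file types, that suggests the measurements were disturbed, and the rows are not a sound basis for a speed comparison.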

    I'd say the "slowdown" from v95 to v201 is probably about 3x.

    Yes, it's still a lot. But for a 3-4% compression gain it's worth it.

  39. #2336
    Member hexagone
    Quote Originally Posted by Gotty
    So it's best to ignore the timings. They are not really useful.
    You mean 'accurate', I assume. Accurate timings are useful.

    Quote Originally Posted by Gotty
    You may compare versions along the compression level (8 to 8, 9 to 9, etc), and especially if lstm was used or not used.
    Or just compare the option with the highest compression for each release.

    Quote Originally Posted by Gotty
    I'd say the "slowdown" from v95 to v201 is probably about 3x. Yes, it's still a lot. But for a 3-4% compression gain it's worth it.
    A 3% improvement for 3x the compression time is incredibly expensive.
    "Worth" is in the eye of the beholder...

  40. #2337
    Member Gotty's Avatar
    Quote Originally Posted by hexagone View Post
    A 3% improvement for 3x the compression time is incredibly expensive.
    "Worth" is in the eye of the beholder...
    Yes. In this league there is a serious "fight" for each byte gained. Each version is better by around 0.02%-0.10%, and for that tiny improvement there is a lot going on. The result is extremely "tight" and optimized already: it's not really possible to "tune" the existing models and contexts for any more gains, so we (usually) need to add more models to get better gains. That costs time and memory. The gain is usually not too great, as the existing models cover the possibilities quite well.
    We really sacrifice memory and speed for compression ratio.
    So in this league 3% improvement is a lot - and it costs a lot.

  41. #2338
    Member LucaBiondi
    Hi guys
    Thanks to all for the comments.
    I run paq8px alongside my normal work.
    My PC has 16 GB of RAM, and often that is not enough.
    This happened for 201 -10 vs. 201 -11.
    Luca
