Activity Stream

  • LucaBiondi's Avatar
    Today, 01:05
    LucaBiondi replied to a thread paq8px in Data Compression
    Thanks Gotty ... ready, just started to test! :_cool2: Luca
    1707 replies | 478691 view(s)
  • kaitz's Avatar
    Yesterday, 23:17
    kaitz replied to a thread paq8px in Data Compression
    IMG080.jpg (967711 bytes)
    paq8px_182.fix1 -8 737230 Time 23.43 sec, used 2372 MB (2487680938 bytes) of memory
    paq8px_v182fix2 -8 736736 Time 21.69 sec, used 2372 MB (2487680938 bytes) of memory
    paq8pxd_v68_AVX2 -s8 736627 Time 19.29 sec, used 2209 MB (2316655105 bytes) of memory
    1707 replies | 478691 view(s)
  • Gotty's Avatar
    Yesterday, 22:42
    Gotty replied to a thread paq8px in Data Compression
    Aham, that helps indeed (with larger jpegs), and it's logical, too! Going in the next release. Thanx so much! ;-) Luca will be happy. "Preview" attached. Luca, it's all yours.
    1707 replies | 478691 view(s)
  • kaitz's Avatar
    Yesterday, 22:16
    kaitz replied to a thread paq8px in Data Compression
    In the SSE class, like this: case JPEG: { pr = pr0; break; } In pxd I don't have the final APM, it really hurts compression. (See the sketch below.)
    1707 replies | 478691 view(s)
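    A minimal sketch of the bypass kaitz describes: in the final SSE stage, pass the mixer output straight through for JPEG data instead of running it through the last APM. The block types, APM interface and 12-bit probability range below are assumptions modeled loosely on paq8px-style coders, not the actual source.
      // Sketch only: skip the final APM for JPEG blocks, average elsewhere.
      #include <cstdint>

      enum class BlockType { DEFAULT, JPEG, TEXT };

      struct APM {
        // Maps (probability, context) to a refined probability; stub for illustration.
        int p(int pr, uint32_t /*cx*/) { return pr; /* a real APM would adapt here */ }
      };

      struct SSE {
        APM finalApm;
        // pr0: probability coming out of the mixer (0..4095)
        int apply(int pr0, BlockType bt, uint32_t cx) {
          int pr;
          switch (bt) {
            case BlockType::JPEG: {
              pr = pr0;                   // bypass the final APM, as suggested above
              break;
            }
            default: {
              int pr1 = finalApm.p(pr0, cx);
              pr = (pr0 + pr1 + 1) >> 1;  // usual averaging of mixer and APM outputs
              break;
            }
          }
          return pr;
        }
      };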
  • Gotty's Avatar
    Yesterday, 22:00
    Gotty replied to a thread paq8px in Data Compression
    Line 11105 in v182fix1? pr = (pr+pr0+1)>>1; Hmmm.. It's worse if I remove it (just tested with smaller and larger files as well). Is this the line you meant? Edit: @Luca: I tested it on your 3 large files :-) Of course. That is my large test set :-)
    1707 replies | 478691 view(s)
  • LucaBiondi's Avatar
    Yesterday, 21:51
    LucaBiondi replied to a thread paq8px in Data Compression
    If you want to add an option I will be happy to test it! Luca
    1707 replies | 478691 view(s)
  • kaitz's Avatar
    Yesterday, 21:41
    kaitz replied to a thread paq8px in Data Compression
    More ... :D JPEG -> what if you removed the final APM in the SSE class for JPEG? Would compression be better?
    1707 replies | 478691 view(s)
  • Gotty's Avatar
    Yesterday, 20:28
    Gotty replied to a thread paq8px in Data Compression
    Thanx! It has been on my to-do list for a long time - since Darek suggested it and you gave these hints.
    1707 replies | 478691 view(s)
  • Gotty's Avatar
    Yesterday, 20:22
    Gotty replied to a thread paq8px in Data Compression
    I noticed that when you posted the results last time - they matched my results exactly (I also run my tests at level -8) - except for some files where I used "-a" (adaptive learning rate). We are on the same wavelength.
    1707 replies | 478691 view(s)
  • kaitz's Avatar
    Yesterday, 20:13
    kaitz replied to a thread paq8px in Data Compression
    The nci improvement comes from the wrt filter, as for all other large files. The DEC Alpha improvement comes mostly from the byte order swap and the call filter (a generic sketch of such a call filter follows below). The osdb improvement comes from the WordModel, I think; I can't remember what context/check it was. Not sure about the others. As for testing with option -t, I always test without it on files, at least when comparing with pxd versions.
    1707 replies | 478691 view(s)
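    For readers unfamiliar with call filters, here is a generic sketch of the idea: rewrite relative call targets as absolute addresses so that repeated calls to the same function become identical byte strings that context models can exploit. It is illustrated on x86 E8 calls; paq8pxd's actual DEC Alpha filter applies the same idea to Alpha branch encodings (plus a byte-order swap), whose exact layout is not reproduced here.
      // Generic "call filter" sketch (x86 E8 form), not paq8pxd's DEC Alpha code.
      #include <cstdint>
      #include <cstring>
      #include <vector>

      void callFilterEncode(std::vector<uint8_t>& buf) {
        for (size_t i = 0; i + 5 <= buf.size(); ++i) {
          if (buf[i] == 0xE8) {                               // relative CALL rel32
            int32_t rel;
            std::memcpy(&rel, &buf[i + 1], 4);                // little-endian displacement
            int32_t abs = rel + static_cast<int32_t>(i + 5);  // make the target absolute
            std::memcpy(&buf[i + 1], &abs, 4);
            i += 4;                                           // skip the operand we rewrote
          }
        }
      }

      // The decoder does the inverse (abs - (i + 5)) to restore the original bytes.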
  • Gotty's Avatar
    Yesterday, 19:53
    Gotty replied to a thread paq8px in Data Compression
    Yes, you are absolutely right! Fix1 contains the extended text pre-training. Any pre-training helps only during the first few kilobytes (of text files, of course), when the NormalModel and WordModel of paq8px don't know anything about words and their morphology. As soon as the NormalModel and WordModel have learnt enough from the real data, the effect of pre-training fades away and the models take over. It means that the larger the file, the less text pre-training helps proportionally. I don't know exactly when that happens, but your feeling of 100K-200K seems right. The truth is: text pre-training is not advantageous. Look:
    paq8px_v182fix1 -9a : 16'456'404 (no pre-training)
    paq8px_v182fix1 -9at: 16'411'564 (with pre-training)
    The difference is 44'840 bytes.
    In order to decompress you'll need paq8px_v182fix1.exe (its size must be added to the size of both results), and for the second case, with pre-training, you'll need the pre-training files as well. So how large are they? Let's see.
    paq8px_v182fix1 -9a: 109'701 (the input file is a list file containing: english.dic, english.emb, english.exp)
    16'411'564 + 109'701 = 16'521'265
    We lost 64'861 bytes! The result without pre-training is better! I suggest that we don't use pre-training at all in any benchmarks - or when we do use pre-training, we must add the compressed size of the pre-training files to the final result (a small worked example follows below). If we don't take these files into account, the result gives us a false sense that paq8px has beaten cmix. I suppose that if you don't use pre-training for either paq8px or cmix, cmix would still beat paq8px. When you have some time, could you run a test only on the files in your testset where paq8px has beaten cmix? I wonder what the results would be.
    1707 replies | 478691 view(s)
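    A tiny worked example of the accounting Gotty describes, written in C++ just to make the arithmetic explicit (the figures are the ones quoted above):
      // Fair comparison: a pre-trained result only counts after adding the size
      // of the dictionaries it needs at decompression time.
      #include <cstdint>
      #include <iostream>

      int main() {
        const int64_t plain      = 16'456'404;  // -9a,  no pre-training
        const int64_t pretrained = 16'411'564;  // -9at, with pre-training
        const int64_t dicts      =    109'701;  // english.dic/.emb/.exp, compressed

        std::cout << "raw gain from pre-training: " << (plain - pretrained)         << "\n"; // 44'840
        std::cout << "total with dictionaries:    " << (pretrained + dicts)         << "\n"; // 16'521'265
        std::cout << "net loss vs. no pre-train:  " << (pretrained + dicts - plain) << "\n"; // 64'861
      }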
  • maadjordan's Avatar
    Yesterday, 17:27
    maadjordan replied to a thread smpdf in Data Compression
    It seems that Unicode file name support was missed during compilation. A temporary Windows compile is available through this link: http://www.coherentpdf.com/16thSeptember2019.zip
    7 replies | 2968 view(s)
  • boxerab's Avatar
    Yesterday, 15:42
    Cool, thanks @pter: let the HTJ2K vs XS battle begin!
    28 replies | 3391 view(s)
  • Darek's Avatar
    Yesterday, 12:00
    Darek replied to a thread paq8px in Data Compression
    Scores of 4 corpuses for paq8px v182fix1 - amazing improvements, especially for smaller files (Calgary and Canterbury corpuses) - almost all the best scores for paq8px and the biggest gain I've ever seen for paq8px between consecutive versions (0.8-0.9%)! For ob1, progb, progc (Calgary), fields.c, grammar.lsp, sum, xargs.1 (Canterbury) and FlashMX.pdf (MaximumCompression) this version has the best overall scores and beats the latest cmix v18! I have one insight (maybe I'm wrong): most of the changes in the fix1 version give 200-500 bytes of gain independent of file size (it's similar on R.DOC and G.EXE, or even smaller for K.WAD) - it looks like this improvement works only or mostly on the first 100-200KB, or I'm wrong... One tip to improve further on the Silesia corpus (I know it's tuned mostly for this corpus) -> there are some changes in the paq8pxd version by Kaitz which a) add a DEC Alpha parser/model - it gives about 500KB of gain on the mozilla file; b) there is a model which gives about 60KB of gain for the nci file. The files ooffice, osdb and x-ray also compress better, but maybe that's specific to this version of paq. Additionally, here are scores of enwik8 and enwik9 for paq8px v182 (w/o fix yet):
    16'838'907 - enwik8 -s7eta -Paq8px_v182
    16'435'259 - enwik8.drt -s7eta -Paq8px_v182
    16'428'290 - enwik8 -s9eta -Paq8px_v182
    16'086'695 - enwik8.drt -s9eta -Paq8px_v182
    133'672'575 - enwik9 -s9eta -Paq8px_v182
    129'948'994 - enwik9.drt -s9eta -Paq8px_v182
    133'591'653 - enwik9_1423 -s9eta -Paq8px_v182 - best score of all paq8px versions (except paq8pxd)
    129'809'666 - enwik9_1423.drt -s9eta -Paq8px_v182 - best score of all paq8px versions (except paq8pxd)
    1707 replies | 478691 view(s)
  • Krishty's Avatar
    Yesterday, 09:05
    I forgot … there is one thing you could help me with. I see that genetic filtering is implemented in lodepng’s encoder, which seems to run after Zopfli. If so, what are the reasons for running it *after* deflate optimization instead of before – wouldn’t that affect compression negatively, especially block splitting?
    415 replies | 104553 view(s)
  • pter's Avatar
    Yesterday, 06:17
    pter replied to a thread JPEG 3000 Anyone ? in Data Compression
    The HTJ2K (ISO/IEC 15444-15 | ITU T.814) specification has been published and is available free of charge at: https://www.itu.int/rec/T-REC-T.814/en
    28 replies | 3391 view(s)
  • Krishty's Avatar
    Yesterday, 00:57
    Yes, but I didn’t get to the actual tests yet because I wanted to isolate the deflate part first. I’ll let you know once I have the results! Sorry if I was unclear – with -60 and -61 I mean -10060/-20060/-30060/etc. It would be a pity to remove those as the fun starts at -xxx11 and the sweet spot for maximal compression seems to be at -xxx30 to -xxx60 :) Yes, that is absolutely right and it’s absolutely possible that my test set was just bad. However, looking at ECT’s PNG performance – where it is almost never beaten, Leanify being not even close – that could imply some sort of error (if the benchmarks turn out to be valid, again). Sorry, I should rather have expressed this as “TODO for me to check out” rather than “questions” … I’m trying not to bother you with guesses here, rather trying to find out what’s going on in my tests and documenting it for others in case it’s useful to them :)
    415 replies | 104553 view(s)
  • fhanau's Avatar
    Yesterday, 00:17
    1. -3 does not perform substantially better than -4 in my tests. Have you considered using a different test set?
    2. -60 and -61 are not supported options. In a future version ECT will reject those arguments so questions like these don't come up anymore.
    3. That depends on the settings used for the tools and the files contained in the zip. ECT was mostly tuned on PNG and text files. On the example you provided, ECT does nineteen bytes worse than Leanify; I think occasionally doing that amount worse is acceptable.
    415 replies | 104553 view(s)
  • Krishty's Avatar
    Yesterday, 00:02
    Great work, thanks a lot! Guess I’ll do some tests anyway, just out of curiosity :) This helps me a lot to get a high-level overview, thanks. So – just to establish a check point here – my open questions with ECT are:
    • Why does -3 perform substantially better than -4 or any higher levels? I know so far: it’s a filter thing (it does not show in my deflate-only benchmarks); a workaround is using the --allfilters option; it could be rooted in OptiPNG.
    • How can -61 sometimes take a thousand times longer than -60? (Not -62 vs -61, sorry for the error in my previous post!) Definitely a deflate thing; ECT-only; could be related to long runs of identical pixels (does not show with Lenna & Co., but with comics and renderings).
    • How can Leanify & advzip outperform ECT on ZIP files when my benchmarks show such a high superiority of ECT with PNGs?
    I’ll try to find answers in subsequent benchmarks …
    415 replies | 104553 view(s)
  • fhanau's Avatar
    15th September 2019, 23:10
    This is a simple heuristic that tunes the LZ cost model based on the results gained from running lazy LZ first when we only have time for one iteration. It is only enabled for PNG when using a low compression level, where it really helps in making ECT with one iteration competitive.
    415 replies | 104553 view(s)
  • fhanau's Avatar
    15th September 2019, 23:06
    I wrote most of ECT years ago, but it mostly comes down to performance improvements in pretty much every part of ECT's deflate, much better caching, a new match finder and better handling of the iterative cost model.
    415 replies | 104553 view(s)
  • MegaByte's Avatar
    15th September 2019, 21:20
    Some of ECT's filtering code was written by me -- including a genetic algorithm inspired by PNGwolf (as long as you activate it) but with better overall performance especially due to better seeding from the other filter methods. I don't expect PNGwolf to win in any cases currently. A number of the other filter algorithms were inspired by Cedric Louvier's post about TruePNG. Since that time, he wrote pingo, which does many of those filters much more efficiently than the brute-force methods included in the ECT code.
    415 replies | 104553 view(s)
  • Krishty's Avatar
    15th September 2019, 16:02
    Me as well. Unfortunately, no clue. ECT’s source code is very different, and for example in squeeze.c I see vast floating-point math on symbol costs with comments like: Sorry, but this is the first time I’ve looked into compression code; even plain zlib is still overwhelming to me, and ECT looks like a master’s or doctoral thesis to me. Maybe Felix could elaborate on that? (Also, I’m getting carried away from the original question – whether ECT’s filtering is better than PNGwolf’s :) )
    415 replies | 104553 view(s)
  • maadjordan's Avatar
    15th September 2019, 15:07
    maadjordan replied to a thread smpdf in Data Compression
    CPDF v2.3 has been released: https://coherentpdf.com/blog/?p=92 Binaries for Win, Mac & Linux: https://github.com/coherentgraphics/cpdf-binaries
    7 replies | 2968 view(s)
  • Jyrki Alakuijala's Avatar
    15th September 2019, 14:40
    Do we know why? Better block split heuristics? I'd love to see such improvements integrated into the original Zopfli, too.
    415 replies | 104553 view(s)
  • Krishty's Avatar
    15th September 2019, 13:11
    In order to make the Deflate benchmarks more fair, I downloaded all compressors I know, compiled them on Windows for x64, and ran them. All sample images had row filtering entirely disabled (filter type zero) and were compressed with the Z_STORE setting to avoid bias in case tools want to re-use compression choices from the original input. The tests typically take a day or two, so there are just a few data points so far: Lenna, Euclid, PNG transparency demonstration; all shown below. We're looking at very tight size differences here (often just a per mille of the image). First, it can definitely be stated that ECT's Zopfli blows everything else away. For little compression, it's always several times faster than the Zopfli variants. For long run times, it consistently achieves higher compression ratios - so high that often the worst run of ECT compresses better than the best run of any Zopfli-related tool. But ECT has some weird anomaly above 62 iterations, where it sometimes becomes incredibly inefficient and suddenly takes ten or a thousand(!) times longer to run than 61 or fewer iterations. This can be seen clearly on Euclid, but it is worse on transparency, where I had to omit all runs above 61 iterations because the run-time jumped from twelve seconds to 24,000 (a two-thousand-fold increase)! Second, advpng's 7-Zip seems to be broken. You don't see it in the benchmarks because it compresses so badly that it didn't make it into any of the graphs. It's constantly some percent(!) worse than Zopfli & Co and I just can't believe that. There has to be a bug in the code, but I couldn't investigate that yet. Now, Zopfli. Advpng made very minor adjustments to its Zopfli (or is it just an outdated version?) and apart from the higher constant overhead, it's basically the same. Leanify's Zopfli has had some significant changes. It sometimes compresses better, sometimes worse; on low compression levels, it often compresses better. The one problem I see with ECT is that its performance is almost unpredictable. Though better than Zopfli, the difference from -10032 to -10033 can be as large as the difference between Zopfli and ECT. This will be a problem for my upcoming filter benchmarks. I should check whether it smoothes out when I apply defluff/DeflOpt to the output. Input images are attached.
    415 replies | 104553 view(s)
  • Krishty's Avatar
    14th September 2019, 22:01
    Fixed. A few years ago, I wrote a custom PNG variation with PPMd instead of Deflate, which worked pretty well with the “expand for 7z” function in my optimizer. However, I ditched it because non-standard formats are pretty much useless. Now I’m investigating ECT’s efficiency. Nothing else comes to my mind right now. The Optimizer has a (non-critical) memory problem with GIF optimization. FlexiGIF outputs a *lot* of progress information, sometimes as much as a GiB over a few days of run-time. The Optimizer keeps all that (needlessly) in memory. I’ll fix that for the next version.
    18 replies | 864 view(s)
  • Krishty's Avatar
    14th September 2019, 21:54
    Krishty replied to a thread FileOptimizer in Data Compression
    I noticed that the specific order of operations in Papa’s often yields 18-byte savings over almost all other JPEG optimizers, but I haven’t had time yet to investigate the cause. In case anyone bothers to find out, I attached Papa’s JPG handling code. I’d be glad to learn what causes this gain because I’m sure it can be achieved more efficiently!
    652 replies | 185730 view(s)
  • CompressMaster's Avatar
    14th September 2019, 20:52
    @Krishty, 1. By attaching, I mean your 1st post. Could you repair that? Thanks. 2. What other unpublished stuff do you have? (compression field)
    18 replies | 864 view(s)
  • maadjordan's Avatar
    14th September 2019, 17:34
    maadjordan replied to a thread 7-zip plugins in Data Compression
    New Plugin Added: ExFat7z
    1 replies | 391 view(s)
  • Krishty's Avatar
    14th September 2019, 16:46
    For pixels, yes. For metadata, no.
    18 replies | 864 view(s)
  • necros's Avatar
    14th September 2019, 08:08
    necros replied to a thread FileOptimizer in Data Compression
    Why does Papa's optimizer output smaller-sized JPGs by default than FO? Not a great size difference, but still.
    652 replies | 185730 view(s)
  • necros's Avatar
    14th September 2019, 07:30
    necros replied to a thread Papa’s Optimizer in Data Compression
    Is BMP/GIF to PNG conversion lossless?
    18 replies | 864 view(s)
  • Gonzalo's Avatar
    13th September 2019, 18:10
    Hopefully this is the first step towards mass production and the reduction of costs.
    2 replies | 83 view(s)
  • pklat's Avatar
    13th September 2019, 15:00
    pklat replied to a thread 7-Zip in Data Compression
    Could it perhaps use CUDA?
    545 replies | 287504 view(s)
  • Darek's Avatar
    13th September 2019, 12:20
    Darek replied to a thread paq8px in Data Compression
    Scores for my testset for paq8px v182fix1 - very good work for smaller files -> the average gain for my textual files is on the level of 2.1%! That means my testset's textual files now lose to the best cmix (v17) by only 1.1%! It's very, very close now. This version also gets the best overall scores (and beats the best cmix scores) for O.APR, T.DOC and Y.CFG!
    1707 replies | 478691 view(s)
  • schnaader's Avatar
    13th September 2019, 10:42
    schnaader replied to a thread paq8px in Data Compression
    Good work on this! I tried to do this as a separate transformation tool recently, and it didn't work out. The problems were: the length of the strings between brackets has to be coded somehow - e.g. replacing them with spaces does this, but adds information instead of just moving the text. Also, for many files, the kerning numbers between the brackets correlate with the previous and next character in the brackets, so separating them hurts compression. So context modelling is the way to go here. Another PDF thing that would be relatively easy to implement is xrefs. This is the big table at the end of PDF files. It encodes the offsets of all "x 0 obj" markers (where x is an incrementing number), and sorting this list by x leads to the xref table (although there can be some deleted objects between entries that don't appear in the previous part) - a rough sketch follows below. Not a big saving, as xref tables compress well anyway, but some KB per PDF.
    1707 replies | 478691 view(s)
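    A rough sketch of the xref idea schnaader describes: scan the PDF for "N 0 obj" markers, record (object number, byte offset) pairs, and sort by object number to get the xref order. A real transform would also need to handle deleted/updated objects, non-zero generation numbers and cross-reference streams; the function and variable names are illustrative only.
      // Collect "<num> 0 obj" offsets; sorting by object number reproduces xref order.
      #include <algorithm>
      #include <cctype>
      #include <string>
      #include <utility>
      #include <vector>

      std::vector<std::pair<long, size_t>> collectObjectOffsets(const std::string& pdf) {
        std::vector<std::pair<long, size_t>> entries;    // (object number, offset)
        size_t pos = 0;
        while ((pos = pdf.find(" 0 obj", pos)) != std::string::npos) {
          size_t numEnd = pos;                           // digits precede " 0 obj"
          size_t numStart = numEnd;
          while (numStart > 0 && std::isdigit(static_cast<unsigned char>(pdf[numStart - 1])))
            --numStart;
          if (numStart < numEnd) {
            long objNum = std::stol(pdf.substr(numStart, numEnd - numStart));
            entries.emplace_back(objNum, numStart);      // xref stores the object's offset
          }
          pos += 6;                                      // move past " 0 obj"
        }
        std::sort(entries.begin(), entries.end());       // sorted by object number = xref order
        return entries;
      }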
  • Darek's Avatar
    13th September 2019, 10:26
    Darek replied to a thread lstm-compress in Data Compression
    Here are the lstm-compress v3b scores for my testset. In total = -1.32% of gain, but some files, especially small textual files, got two-digit gains! It's my best option now without any optimizations, which could give some additional gains. In the second table there is a comparison of lstm-compress v3b to the latest NNCP rc1 scores.
    74 replies | 8207 view(s)
  • Gotty's Avatar
    13th September 2019, 08:06
    Gotty replied to a thread paq8px in Data Compression
    Paq8px_v181 -9ta 16'446'172
    Paq8px_v182 -9ta 16'428'290
    Paq8px_v182fix1 -9ta 16'411'564
    1707 replies | 478691 view(s)
  • LucaBiondi's Avatar
    13th September 2019, 00:14
    LucaBiondi replied to a thread paq8px in Data Compression
    Hi! Gotty, this time you have done a great, great job! Wow! These are the results from my big testset, V181 vs V182: JPEG gains 10 KB! PDF gains 134 KB! TXT gains 30 KB! ISO gains 50 KB! MP3 and XML lose some. New overall record! New record for PDF, MP4, TXT, BAK, EXE and ISO files! Thank you!!! Luca
    1707 replies | 478691 view(s)
  • encode's Avatar
    12th September 2019, 23:49
    encode replied to a thread CHK Hash Tool in Data Compression
    Thank you! :-) Please note - don't bother translating the current version as of now. A new version will have new strings and an incompatible lang.txt file format.
    203 replies | 78573 view(s)
  • Gotty's Avatar
    12th September 2019, 21:31
    Gotty replied to a thread paq8px in Data Compression
    - Text pre-training is applied to WordModel
    - Fixed 32-bit compilation issue (_stat64i32 -> _stat); a sketch of such a conditional define follows below
    No change in compression except when using text pre-training (-t). So the issue with text training in v182 was that I removed a "text" context from NormalModel and merged it (along with the DistanceModel) into WordModel. But then text training was no longer applied to this context, and in most cases compression of text files degraded significantly when running with text pre-training. With this minor version I have a simple fix: I apply text pre-training not only to NormalModel, but to WordModel as well. This will not only fix the issue of v182, but will also give an additional boost when compressing text files with pre-training.
    1707 replies | 478691 view(s)
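    A hedged sketch of the kind of portability fix mentioned above: centralize the CRT stat name behind one conditional define so both MSVC and MinGW builds compile. The guard and the helper function are illustrative assumptions, not the actual paq8px code; which toolchains provide _stat64i32 varies.
      #include <sys/types.h>
      #include <sys/stat.h>

      // _stat64i32 is an MSVC CRT name; MinGW 8.1 (as noted in the thread) only has
      // plain _stat, so map the name there instead of editing call sites by hand.
      #ifdef _MSC_VER
        #define STAT _stat64i32
      #else
        #define STAT _stat
      #endif

      // Usage: the same code then compiles on both toolchains.
      long fileSize(const char* path) {
        struct STAT st;
        return STAT(path, &st) == 0 ? static_cast<long>(st.st_size) : -1;
      }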
  • Darek's Avatar
    12th September 2019, 20:35
    Darek replied to a thread paq8px in Data Compression
    Good idea. I didn't think about it. For me it looks much more reasonable.
    1707 replies | 478691 view(s)
  • Gotty's Avatar
    12th September 2019, 18:49
    Gotty replied to a thread paq8px in Data Compression
    How about measuring enwik7 runtimes for all versions, and enwik8 runtimes only for some versions (maybe for the oldest, the newest and one in between)? This way we can calculate/interpolate an approximate runtime for the missing enwik8 ones (a sketch follows below). Anyway, we don't need precise measurements, just enough to be able to plot a curve.
    1707 replies | 478691 view(s)
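    A small sketch of that interpolation idea; the version names and runtimes below are invented purely for illustration:
      // Estimate missing enwik8 runtimes by scaling each version's enwik7 time with
      // the average enwik8/enwik7 ratio observed on a few reference versions.
      #include <iostream>
      #include <map>
      #include <string>

      int main() {
        std::map<std::string, double> enwik7 = {{"v170", 950}, {"v176", 1010}, {"v182", 1100}};  // all versions (s)
        std::map<std::string, double> enwik8 = {{"v170", 9700}, {"v182", 11300}};                // references only (s)

        double ratioSum = 0; int n = 0;
        for (const auto& [ver, t8] : enwik8) { ratioSum += t8 / enwik7.at(ver); ++n; }
        const double ratio = ratioSum / n;   // average enwik8/enwik7 runtime ratio

        for (const auto& [ver, t7] : enwik7)
          if (!enwik8.count(ver))
            std::cout << ver << " estimated enwik8 runtime: " << t7 * ratio << " s\n";
      }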
  • Darek's Avatar
    12th September 2019, 12:57
    Darek replied to a thread paq8px in Data Compression
    > As far as I know Darek also runs tests in parallel. Yes, and most of the times are not reliable due to this, or the fact that I've changed my laptop, but I could test it in my free time. Now I've started testing enwik9 for paq8px v182 - after that I could test some entries - if you have some suggestions then let me know. Otherwise I'll choose some versions to test. The DRT version of enwik8 was tested by paq8px v182 in 8k sec, so it would be about 11-13k sec for one test of the pure file = 6-8 entries per day.
    1707 replies | 478691 view(s)
  • Gotty's Avatar
    12th September 2019, 09:41
    Gotty replied to a thread paq8px in Data Compression
    It's true. Although I have runtimes in the logs paq8px records during compression, unfortunately they are unreliable: I always run tests in parallel or sometimes pause one or more of the command line windows to give the others some air. My runtimes are useless. As far as I know Darek also runs tests in parallel. Most of the above results came from him. I could reproduce the results with runtimes on an idle system, but I don't have any idle system ;-) Thanx! Awaiting the results! Thanx for the info, I'll look into it. A 32-bit executable would be very slow compared to a 64-bit one due to emulated 64-bit operations (there are a lot of 64-bit multiplications in hashing, for example). Also I'm not sure if levels -8 and -9 would work as expected: they need 2GB+ RAM. I hope you don't do anything serious with a 32-bit executable.
    1707 replies | 478691 view(s)
  • Jarek's Avatar
    12th September 2019, 07:28
    JPEG XL next-generation image compression architecture and coding tools: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/11137/111370K/JPEG-XL-next-generation-image-compression-architecture-and-coding-tools/10.1117/12.2529237.full Assessment of quality of JPEG XL proposals based on subjective methodologies and objective metrics: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/11137/111370N/Assessment-of-quality-of-JPEG-XL-proposals-based-on-subjective/10.1117/12.2530196.short?SSO=1 update: PDF of the latter: https://infoscience.epfl.ch/record/270332
    43 replies | 6009 view(s)
  • Shelwien's Avatar
    12th September 2019, 03:06
    Yeah, I think this kind of hardware (TPU) has the potential to turn paq-like compression algorithms into something practical. Unfortunately the Cerebras thing specifically is too expensive - $5k just for the wafer, ~$600k estimated cost for the whole thing.
    2 replies | 83 view(s)
  • Gonzalo's Avatar
    12th September 2019, 02:22
    Quoting Forbes here: "With 400,000 programmable processor cores, 18 GB of memory, and an on-chip fabric capable of 25 petabits, the WSE comprises 1.2 trillion transistors in 46,225 mm2 of silicon real estate (for contrast, it is 56x larger than the largest GPU for AI, which is 815mm2)" "On top of these engineering innovations, the company develop new programmable Sparse Linear Algebra Cores (SLAC) optimized for AI processing. The SLAC skips any function that multiplies by zero, which can significantly speed the multiplication of matrices in the deep learning process while reducing power. The company also reduced the memory stack by eliminating cache and putting large amounts of high-speed memory (18 GB of SRAM) close to the processing cores. All this is connected by what the company calls the Swarm communication fabric, a 2D mesh fabric with 25 petabits of bandwidth that is designed to fit between the processor cores and tiles, including what would normally be die cut area on the wafer." "Because of its design, the Cerebras WSE platform has advantages in latency, bandwidth, processing efficiency, and size. According to Cerebras, the WSE is 56.7 times larger than the largest GPU, has 3,000 times more on-die memory, has 10,000 times more memory bandwidth, and fits into 1/50th of the space of a traditional data center configuration with thousands of server nodes. The company has not discussed the availability of the platform or estimated cost." https://www.cerebras.net/ Way much more info in the site. Of course I'm not related in any capacity to Cerebras, Forbes or any other company mentioned in the article.
    2 replies | 83 view(s)
  • Alexander Rhatushnyak's Avatar
    11th September 2019, 21:11
    This table would be more valuable if it included compression time for at least some of the entries. I started testing v182 on LPCB images, and as far as I can see now, compression is going to take ~200 hours... And then there's the decompression half of it. To build a 32-bit Windows executable with MinGW 8.1, I had to remove "64i32" from the line with #define STAT _stat64i32
    1707 replies | 478691 view(s)
  • CompressMaster's Avatar
    11th September 2019, 19:26
    Slovak. Btw, @encode, I sent you a PM some time ago and it's still unanswered...
    203 replies | 78573 view(s)
  • Darek's Avatar
    11th September 2019, 18:14
    Darek replied to a thread paq8px in Data Compression
    I've checked the v122 and v124 versions - I've compressed enwik8 again - and:
    paq8px v122 - enwik8 -8 = 17'481'434 -> your score is OK; my score was identical to mpais' but it could be the score w/o header (48b??)
    paq8px v124 - enwik8 -8 = 17'461'692 -> same as my previous score - maybe there were two versions of v124 or different compiles (Linux/Windows...) because there is an "e" option. But would that make such a big difference? Strange.
    paq8px v123 - mpais' score is the same as my score
    1707 replies | 478691 view(s)
  • Gotty's Avatar
    11th September 2019, 15:06
    Gotty replied to a thread paq8px in Data Compression
    My source for v122 and v124:
    1707 replies | 478691 view(s)
  • Darek's Avatar
    11th September 2019, 15:01
    Darek replied to a thread paq8px in Data Compression
    I've added some of my older scores and the chart with the enwik8 scores history. I have slightly different scores for the v122 and v124 versions - I'll check them.
    1707 replies | 478691 view(s)
  • Gotty's Avatar
    11th September 2019, 14:24
    Gotty replied to a thread paq8px in Data Compression
    Darek, again your scores helped me understand something. I don't use pre-training during my tests in order to see how the models can handle file content on their own - without the initial boost. My results show nice gains for almost all text files due to the changes in WordModel. With pre-training "on" there is almost always a significant loss - and I now know why. I'm gonna fix that in the next release. Thanx!
    1707 replies | 478691 view(s)
  • Darek's Avatar
    11th September 2019, 09:27
    Darek replied to a thread paq8px in Data Compression
    Here are scores for my testset for paq8px v182 - about 0.18% of gain = 18KB less -> that means a new overall record for my testset, and the best ever scores for 0.WAV and L.PAK! Besides good improvements for bigger files there are also some losses for smaller textual files. And for Gotty's table, my scores for paq8px v179 for enwik8:
    16'456'797 - enwik8 -s9eta -Paq8px_v179
    16'079'926 - enwik8.drt -s9eta -Paq8px_v179
    One more calculation - based on my full enwik scores for paq8px v172:
    16'471'210 - enwik8 -s9eta -Paq8px_v172
    16'081'924 - enwik8.drt -s9eta -Paq8px_v172
    133'708'688 - enwik9_1423 -s9eta -Paq8px_v172
    129'893'797 - enwik9_1423.drt -s9eta -Paq8px_v172
    I've estimated the scores of paq8px v182 for enwik9. They should be about: 133'3xx'xxx for the non-preprocessed enwik9 and 129'6xx'xxx for the preprocessed file - that's 6th place on the LTCB benchmark!
    1707 replies | 478691 view(s)
  • Gotty's Avatar
    11th September 2019, 07:53
    Gotty replied to a thread paq8px in Data Compression
    A bit of enwik8 history to see how far we have come:
    paq8px_v49 -8 17'733'057
    paq8px_v77 -8 17'629'076
    paq8px_v122 -8 17'481'434
    paq8px_v123 -8? 17'461'019
    Paq8px_v124 -s8 17'301'948
    Paq8px_v136 -s9 17'181'903
    Paq8px_v136b -s9 17'013'211
    Paq8px_v138 -s9 16'984'529
    Paq8px_v141 -s9eta 16'872'591
    paq8px_v144 -9ta 16'806'829
    paq8px_v147 -9ta 16'791'924
    Paq8px_v149 -9eta 16'742'288
    Paq8px_v153 -9eta 16'736'395
    Paq8px_v154a -9eta 16'737'358
    Paq8px_v165 -s9eta 16'618'525
    Paq8px_v166 -s9eta 16'619'686
    Paq8px_v167 -s9eta 16'625'547
    Paq8px_v170 -s9eta 16'518'653
    Paq8px_v171 -s9eta 16'520'179
    Paq8px_v172 -s9eta 16'471'210
    Paq8px_v173 -s9eta 16'461'020
    paq8px_v179 -s9eta 16'456'797
    Paq8px_v181 -9ta 16'446'172
    Paq8px_v182 -9ta 16'428'290
    1707 replies | 478691 view(s)
  • Gotty's Avatar
    10th September 2019, 23:12
    Gotty replied to a thread paq8px in Data Compression
    - Support for Unicode path names
    - Refactored WordModel
    - WordModel now extracts text from pdf files and processes it as if it were a separate text file
    - Added: expressions, content between words (gap-content)
    - Removed: number modeling, space modeling (they are merged into word modeling and gap-content modeling)
    - Introduced context skipping in StateMap and ContextMap (currently used in WordModel only)
    - Removed DistanceModel (partially merged into WordModel)
    - Reinstated probabilistic increment in JpegModel
    - Updated help screen to reflect the current default memory use per compression level
    - Other cosmetic changes
    1707 replies | 478691 view(s)
  • dougg3's Avatar
    10th September 2019, 09:41
    Looks like somebody figured it out: https://github.com/jhol/otl-lkv373a-tools/issues/1 Apparently it's a variant of UCL.
    12 replies | 2300 view(s)
  • encode's Avatar
    10th September 2019, 08:54
    encode replied to a thread CHK Hash Tool in Data Compression
    :_coffee:
    203 replies | 78573 view(s)
  • encode's Avatar
    10th September 2019, 00:39
    encode replied to a thread CHK Hash Tool in Data Compression
    + Added "Clear All" toolbar button
    + Added "Invert Selection" menu command
    + CHK can now recognize symbolic links (.symlink type)
    Still fine-tuning the GUI...
    203 replies | 78573 view(s)
  • Darek's Avatar
    9th September 2019, 20:39
    >What is it the column "RC1 maxed K&L + Mauro opt"? That's a combination of your best-option runs for 1.BMP, O.APR and R.DOC, and both K.WAD and L.PAK used with the maximum "hidden size 1024" option. Due to very slow compression with this option, my other tests use the "hidden size 640" option for these two files, and then, after finding the best other options, I've tested those best options with hidden size 1024. E.g. hidden size = 640 means 3 days of compression for the K.WAD file, but hidden size = 1024 means 5 days...
    92 replies | 9161 view(s)
  • Gotty's Avatar
    9th September 2019, 01:14
    Gotty replied to a thread Precomp 0.4.7 in Data Compression
    (Not so important but interesting) notes: Currently paq8px reserves memory for most (almost all) models at startup (without knowing the file(s) to be compressed). So it's true that the jpegmodel uses more memory at level -8 compared to level -4, but not all of the 2.4 GB is reserved for the jpegmodel. When compressing jpeg images the normalmodel and matchmodel are also in effect. They help jpeg compression slightly and slow down compression significantly. They add 73+23 mixer inputs and 3656+8 mixer contexts respectively. The jpegmodel "only" adds 70 inputs and 2058 mixer contexts to the mixer. So the jpegmodel is a "minority" when compressing jpeg images.
    36 replies | 3234 view(s)
  • Shelwien's Avatar
    9th September 2019, 00:58
    Shelwien replied to a thread Precomp 0.4.7 in Data Compression
    As I said, for "good enough" I can only suggest tweaking srep options (like -l) and not resetting stats between images (maybe precomp already does that). Significant improvements over that are possible (for example, jpeg files frequently include "thumbnails" which are scaled-down copies of the same image), but they're incompatible with the most common jpeg recompression method, which is based on data transformation. jojpeg doesn't have that problem, since it always works with the original jpeg data, but apparently it still has to be made 7x faster, which is hard.
    36 replies | 3234 view(s)
  • Gonzalo's Avatar
    9th September 2019, 00:11
    Gonzalo replied to a thread Precomp 0.4.7 in Data Compression
    Yep, I tried that and got similar results. About solid compression: maybe not a perfect method, but a good-enough one would be better than nothing, if anyone has the skillset. I personally prefer practical solutions with a nice balance between speed and ratio that can be used in the real world rather than an extreme solution. Of course, every niche has its use cases. Sometimes it's just for the sake of science and theoretical limits, and that's completely fine too.
    36 replies | 3234 view(s)
  • schnaader's Avatar
    8th September 2019, 22:48
    schnaader replied to a thread Precomp 0.4.7 in Data Compression
    Done, edited the table in my previous post, adding paq8px_v181fix1:
    jojpeg_sh2            15.3 s   15.0 s   637,228   93.6%    33.7x
    paq8px_v181fix1 -4    70.8 s   69.3 s   634,652   93.2%   155.7x
    paq8px_v181fix1 -8    67.8 s   67.4 s   628,672   92.3%   150.2x
    Indeed another 1% improvement. The second result (-8, using 2.4 GB mem) surprised me, I didn't know that the jpg model can make good use of more memory. I guess the faster speed of -8 seems to indicate that my timings are to be taken with a grain of salt.
    36 replies | 3234 view(s)
  • Mauro Vezzosi's Avatar
    8th September 2019, 22:38
    > Scores of my testset for timesteps settings on 16, 18, 20, 22, 24. What is the column "RC1 maxed K&L + Mauro opt"? Are the options written in "BEST OPTIONS"? Have I suggested some options? --> Edit: I found them: https://encode.su/threads/3094-NNCP-Lossless-Data-Compression-with-Neural-Networks?p=61290&viewfull=1#post61290
    92 replies | 9161 view(s)
  • Mauro Vezzosi's Avatar
    8th September 2019, 22:33
    I don't know what to suggest because I don't know how layer normalization affects other options. If I had to test some options again with layer normalization enabled, I would test layers and adam_alpha_lr (and cells and horizon?), only one or two smaller and larger values. Maybe someone else (Byron?) has some suggestions. I changed adam_alpha_lr from 0.0033 to 0.0030 because in my few tests it seemed to be a little better, but I'm not sure. Also the new value of init_range = 0.150 has not been tested well.
    74 replies | 8207 view(s)
  • Darek's Avatar
    8th September 2019, 20:47
    Scores of my testset for timesteps settings of 16, 18, 20, 22, 24. For some files (like K.WAD) it is hard to determine a direction - the scores go up and down. However, some optimizations were made.
    92 replies | 9161 view(s)
  • Shelwien's Avatar
    8th September 2019, 20:33
    Shelwien replied to a thread Precomp 0.4.7 in Data Compression
    I usually deal with that by using a dedup filter with a large enough minmatchlen. Jojpeg has some benefits in that sense since it outputs the jpeg metainfo to another stream which can be compressed with solid LZ. Also, jojpeg processes the data as-is, by directly computing bit probabilities rather than with a transformation, so it's easy to combine with a matchmodel or something (like it's used in paq). @mpais also added previous-image contexts in paq8px, but I think that would slow down jojpeg too much. Otherwise it's pretty hard to provide a perfect method of solid jpeg recompression. Atm we don't even really have one for plain bitmaps - it's possible to benefit from not resetting stats between images, but it's hard to find an image codec with block matching (there are some based on video codecs, but they're not open-source). I have a plan to make a steganography-based jpeg codec, which would produce bmp+diff output, like reflate. It might be a good enough solution for this problem, if we can combine it with a good bmp coder, I guess. PS. As to simple removal of entropy coding, you can test it with http://nishi.dreamhosters.com/u/uncmpjpg_sh_012.rar but in most cases it won't be better than srep+packjpg or something.
    36 replies | 3234 view(s)