Activity Stream

  • Shelwien's Avatar
    Today, 16:48
This is not about using some already available tool, but more about developing one. zpaq doesn't fit at all in that sense, because all of its algorithms are worse than other known open-source ones. Yes, it's nice that zpaq is a working integrated solution, and I really appreciate that you're trying to improve it. But this thread is about designing a compression algorithm with given constraints. These constraints are a little ahead of the current state of the art, and there are multiple ways to approach them (making a stronger/slower LZ77 or ROLZ, speed-optimizing a fast CM, finding a fitting BWT/postcoder setup, maybe some LZ/CM hybrid, etc.), so I'd like to know what other developers think about this.
    5 replies | 166 view(s)
  • Gotty's Avatar
    Today, 15:06
    Gotty replied to a thread paq8px in Data Compression
Yes, as Luca said, and Moises said, and I also said before: don't post source code here. It's quite cumbersome to find the changes in your contributions that way. Do a pull request to my branch instead if you would like to have it reviewed. Would you do that, please? I'm not an authority here, but I must act as one, as I haven't seen a modification/improvement from your side that was bug-free or issue-free. Maybe your current one is an exception, I don't know yet. But I'm willing to review it only when you use git. No source code in the forum please. That's not reviewer-friendly.
    2300 replies | 607585 view(s)
  • LucaBiondi's Avatar
    Today, 14:39
    LucaBiondi replied to a thread paq8px in Data Compression
Hi suryakandau, could you post your modifications, for example as a diff, please? I would like to learn (slowly) how paq8px works... thank you, Luca
    2300 replies | 607585 view(s)
  • fcorbelli's Avatar
    Today, 12:41
When making backups of virtual disks, even worse thick-provisioned ones, it is normal for an image to take, say, 400GB. If you have even only 2 or 3 of them, that's one terabyte just for a home server; in production, easily 10TB per DAY. There is not a single x86 CPU that can compress this amount of data with a high compression ratio. Bandwidth dominates the problem: it is I/O bound, not CPU bound. Whether the backup is 370GB or 390GB makes no difference at all; 3000GB vs 3600GB even less. For quick backups the answer is a differential (not incremental) zfs send, piped through pigz. It requires zfs, lots of RAM and fast disks. It is doable: I have done it every day for years. But restore is painful, and extensive zfs expertise is needed. I make intermediate backups with zfs (hourly) and nightly zpaqfranz, plus ZIP (yes, 7z in ZIP mode).
    5 replies | 166 view(s)
  • fcorbelli's Avatar
    Today, 12:30
Zpaq is the only answer. I have used it for the same work for years, up to today. Compression speed is decent (150-200MB/s on a modern server). Deduplication is very good. High compression is totally a waste of time for virtual disks: use m1 or even m0 (dedup only). I would prefer pigz -1, but it is too hard to merge into zpaq. It simply... WORKS, even with very big files. Decompression is slow with magnetic disks, but... who cares? Better, of course, is my zpaqfranz fork. It compiles on BSD, Linux and Windows, with a decent GUI for Windows (pakka) à la Time Machine.
    5 replies | 166 view(s)
  • kaitz's Avatar
    Today, 04:11
    kaitz replied to a thread Paq8pxd dict in Data Compression
    Silesia:
    1032 replies | 363202 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Today, 03:06
I have made a little improvement to the LSTM model by adding 1 mixer context set and 2 mixer inputs. @darek could you test it on the Silesia files using the -12lrta option? I think it can save more space. Thank you!!
    2300 replies | 607585 view(s)
  • Jyrki Alakuijala's Avatar
    Yesterday, 23:54
Funnily, this was a starting point for brunsli. In the first six weeks of Brunsli development (back in 2014) we migrated away from the clustering, because context modeling (and even prediction) was stronger for our JPEG recompression corpus. That doesn't mean it couldn't be more effective now for another domain; just that back then Zoltan and I were not able to get fantastic results with this approach on JPEG recompression modeling. We got about -14% on JPEG with this kind of idea (k-means in five dimensions to drive entropy coding) and -22% with the ideas expressed in brunsli or JPEG XL. In WebP lossless we sometimes have 700+ entropy codes for different ergodic processes -- so having 3 to 255 is not necessarily excessive.
    5 replies | 515 view(s)
  • danlock's Avatar
    Yesterday, 22:13
    New approaches to compression and image processing are exciting and invigorating! I hope they stimulate others who have image compression specialization to consider new perspectives as well! Watching this method evolve will be a lot of fun! Thanks for introducing us to your Rust image compression library!
    5 replies | 515 view(s)
  • Shelwien's Avatar
    Yesterday, 20:08
    Shelwien replied to a thread paq8px in Data Compression
    fastest method is probably https://www.facebook.com/mattmahoneyfl
    2300 replies | 607585 view(s)
  • CompressMaster's Avatar
    Yesterday, 20:00
    CompressMaster replied to a thread paq8px in Data Compression
@Darek, I highly doubt that this is the proper way to reach Mr. Matt Mahoney. Better to navigate to his website mattmahoney.net and contact him from there.
    2300 replies | 607585 view(s)
  • Shelwien's Avatar
    Yesterday, 19:55
100,000,000 mixed100.cut
27,672,053 m1.zpaq
24,873,818 m2.zpaq
19,040,766 m3.zpaq
17,735,276 mixed100.cut.zst
15,180,571 mixed100.cut.lzma
13,465,031 m5.zpaq

No, zpaq is totally useless in this case, since its LZ and BWT are subpar and its CM is too slow. In any case, even -m1 -t1 encoding is already slower than 50MB/s, and -m2 is more like 5MB/s. -m5 compression is good, but at 0.5MB/s... there are much faster CMs around.
    5 replies | 166 view(s)
  • ivan2k2's Avatar
    Yesterday, 18:50
ZPAQ maybe? It has fast modes like -m1/2/3, or you can play with a custom compression mode (-m x.......); it takes some time to find a good one. Just check the ZPAQ thread.
    5 replies | 166 view(s)
  • kaitz's Avatar
    Yesterday, 18:39
    kaitz replied to a thread Paq8pxd dict in Data Compression
paq8pxd_v99
- xml/html-like content processed separately in wordmodel
- adjust some wordmodel parameters
- attempt to detect ISO Latin text
- some changes in detection (cleanup)

1. helps on silesia webster/xml
2. helps a bit on most text files (1k loss on enwik8)
3. helps on silesia samba, maybe mozilla, and any other text file previously detected as bintext
4. in tar mode some files are treated as text by extension without confirming (.c, .h, .html, .cpp, .po, .txt), just a bit faster processing (for example, a linux kernel source tar)
    1032 replies | 363202 view(s)
  • Shelwien's Avatar
    Yesterday, 17:03
- VM image data (mostly exes and other binaries)
- compression ratio 15% higher than zstd (lzma is ~10% better, but its encoding is too slow, or there's no gain without parsing optimization)
- encoding speed = 50MB/s or higher (single thread)
- decoding speed = 20MB/s or higher (single thread)
Any ideas? Preprocessors can be used, but they are also applicable to zstd. CM actually might fit by CR and encoding speed (with context sorting etc.), but decoding speed is a problem. PPM doesn't fit because its CR is bad on this data.
    5 replies | 166 view(s)
  • Gotty's Avatar
    Yesterday, 15:49
    Gotty replied to a thread paq8px in Data Compression
    The change I made in normalmodel ("NormalModel now includes the BlockType in its contexts") in v199 is not fully compatible with text pre-training. Calgary/trans includes mostly text but is detected as "default", so it lost the boost given by text pre-training. A fix is coming in my next version.
    2300 replies | 607585 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Yesterday, 14:09
    @darek which option do you use ?
    2300 replies | 607585 view(s)
  • Darek's Avatar
    Yesterday, 13:17
    Darek replied to a thread paq8px in Data Compression
Scores of 4 Corpuses for paq8px v201.
For the Calgary corpus there is no record, however most of the files got their best results. The strange thing is that the "trans" file got a 5.6% loss between versions v198 and v199.
For the Canterbury corpus there is a record for total file compression, but no record for the tar file, due to the change mentioned between v192 and v193.
For the MaximumCompression corpus - best score for the paq8px series, about 5KB of gain, and 5.5KB of gain for the tar file.
For the Silesia corpus - there is a record score (15.5KB of gain) and THE BEST SCORE overall - that means paq8px beat the best cmix score (with precompressor)!!!!
Request for Matt Mahoney: could you add this score to the official Silesia page?
Detailed scores are as follows. Compressor version: paq8px_v201, no precompression.

FILE | score | truncated score
dickens | 1,796,777 | 1796
mozilla | 6,569,742 | 6569
mr | 1,839,331 | 1839
nci | 815,620 | 815
ooffice | 1,183,903 | 1183
osdb | 1,978,882 | 1978
reymont | 698,148 | 698
samba | 1,625,828 | 1625
sao | 3,733,172 | 3733
webster | 4,413,395 | 4413
x-ray | 3,521,326 | 3521
xml | 250,710 | 250
TOTAL | 28,426,834 | 28426
    2300 replies | 607585 view(s)
  • skal's Avatar
    Yesterday, 09:44
Glad you noticed the introduction of palette + predictors :) Yes, since right now it's doing a pretty exhaustive search for predictors and transforms to apply. Going forward, some heuristic will help. We're trying to work with premultiplied Argb as much as possible. That means the RGB can't really be preserved under alpha=0 (it's rather a niche use case, despite regular complaints about it, even for v1 where we introduced the -exact flag).
    66 replies | 4664 view(s)
  • skal's Avatar
    Yesterday, 09:25
    Looking at the slide #10 here: https://www.itu.int/en/ITU-T/Workshops-and-Seminars/20191008/Documents/Guido_Meardi_Presentation.pdf it seems the base layer is now coded at half the resolution (instead of full). The extra enhancement layer is then coded more efficiently. Hence the global gain.
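To make that mechanism concrete, here is a rough sketch of the general two-layer idea (my own illustration, not the actual LCEVC toolset): the base codec only ever sees a half-resolution frame, and the enhancement layer is the upsampling residual. `base_encode`/`base_decode` are hypothetical placeholders for whatever existing codec (H.264, VP9, ...) serves as the base layer.

```python
# Conceptual two-layer sketch (not the real LCEVC pipeline): encode a downscaled
# base layer with an existing codec, then code the upsampling residual as the
# enhancement layer. Frames are float numpy arrays to keep the arithmetic exact.
import numpy as np

def downscale2x(img):
    h, w = img.shape
    return img[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).mean(axis=(1, 3))

def upscale2x(img):
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def encode_two_layer(frame, base_encode, base_decode):
    """Return (base bitstream, enhancement residual) for one grayscale frame."""
    base_bits = base_encode(downscale2x(frame))        # base codec sees only 1/4 of the pixels
    prediction = upscale2x(base_decode(base_bits))     # what the decoder will see from the base
    residual = frame[:prediction.shape[0], :prediction.shape[1]] - prediction
    return base_bits, residual                         # residual is small -> cheap to code

def decode_two_layer(base_bits, residual, base_decode):
    return upscale2x(base_decode(base_bits)) + residual   # base + enhancement
```

With identity functions plugged in for `base_encode`/`base_decode`, the round trip reproduces the (even-sized) frame exactly; the claimed gain comes from the base codec working on a quarter of the pixels while the residual stays cheap to code.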
    1 replies | 128 view(s)
  • spaceship9876's Avatar
    Yesterday, 05:40
The future AV2 codec might be released in 2026, which is when nearly all h264 patents will have expired. Sisvel claims that hundreds of patents in their patent pool are violated by AV1. The organisation 'Unified Patents' is working on invalidating many h265 and AV1 patents in the courts at the moment. It is difficult to know whether EVC will become popular; I highly doubt it. This is because AV1 is already available and hardware decoders for it are already available. I think the creation of EVC is to try to pressure the companies involved in the creation of VVC/H.266 to choose low patent fees and fair conditions such as no streaming fees. This is what happened with Microsoft's VC-1 codec: it forced the MPEG LA to make h264 cheap to license with good conditions.
    1 replies | 128 view(s)
  • SolidComp's Avatar
    Yesterday, 03:57
    Hi all – Another next-generation video codec is Essential Video Coding (EVC). This is also an example of a dual layer approach, but it doesn't use an existing codec as the base layer like LCEVC does. Rather, it uses expired patents as the base layer, which is freaking brilliant! What I mean is that the base codec is intended to be free and unencumbered by patents or licensing. But it's a new codec, one that only uses innovations or techniques that were patented at least 20 years ago, and therefore expired. (There are also some patent grants or something thrown in, but it's mostly about expired patents.) The base codec is supposed to be something like 30% more efficient than H.264, so it's probably only slightly worse than H.265/HEVC, and might be similar to VP9 but with less complexity, more speed. The enhanced version might cost money to license, and is supposed to be about 24% more efficient than H.265/HEVC. Here's an article that talks about it. Have any other projects used the expired patents method before as a formal constraint in developing a codec? Is that how free codecs like Opus would be developed? It's fascinating, and of course more possibilities steadily emerge as each day passes and more patents expire. It seems like there might be an opportunity to apply advanced software to leveraging patents this way, but it would probably require something closer to real AI than what Silicon Valley currently labels as "AI".
    1 replies | 128 view(s)
  • SolidComp's Avatar
    Yesterday, 03:43
Hi all – Can anyone explain how LCEVC works? It's supposed to leverage an existing codec (e.g. H.264 or VP9) and add an "enhancement layer". The end result is something like a 28% data savings. I've read a few things about it, but I don't understand where the trick is, the secret sauce. How can adding a layer to an existing encode save data? Here's a recent test comparing H.264 to LCEVC using H.264 as the base layer. Thanks.
    1 replies | 128 view(s)
  • e8c's Avatar
    20th January 2021, 20:48
    e8c replied to a thread Lucky again in The Off-Topic Lounge
Well, for example, I don't recall any "bad" deeds done by Mikhail. (Just don't bring up the results of privatization: from my point of view, most people who received ownership of housing built in Soviet times have no more right to it than Misha, who bought several enterprises for pennies (relative to their real value) at the "loans-for-shares" auctions.)
First, he is not an oligarch. Oligarchs are the Rotenbergs, Usmanov, Deripaska, Abramovich. A non-oligarch is, for example, Mordashov: he doesn't run any schemes with the "Kremlin crowd", pays into the common pot whenever (and however much) he is told, and otherwise keeps his head down. Khodorkovsky and Mordashov are similar; the difference is that Mordashov is comfortable being the Kremlin crowd's lackey (which is why he didn't lose his property), while Misha didn't much like that prospect. He long ago chose for himself the role of "the guy rooting for everything good (well, mostly)" (https://youtu.be/alu-03QTJvs); then he developed political ambitions, and the guy decided to "compete" with the ex-KGB without having a stronger personal army at his disposal.
Second, why add the adjective "Jewish"? If you look at the posts by user @lz77 that mention money in any way, the adjective "Jewish" looks like envy of those who are luckier at filling their own pockets. It certainly does not look like criticism of people who put profit above everything else.
In general, Khodorkovsky is a political zero; what he does is spend money on third-rate media ("Open Media" has produced practically no high-profile investigations). At this point Khodorkovsky is effectively a nobody.
    11 replies | 673 view(s)
  • Gotty's Avatar
    20th January 2021, 18:39
    Gotty replied to a thread paq8px in Data Compression
Paq8px has a bunch of "general" models for "general data", and specialized models for each kind of special data: audio, jpg and bitmap images. If paq8px doesn't see anything "special" about the data (this also applies when it can't detect the file type (blockType) because of some data corruption), it will just use its default models to compress it. Additional note: there are a couple of special blocktypes that paq8px can transform before compression, like zip, cdrom, exe. If it fails to detect these formats then no transformation takes place, and without these special compression-helping transformations the compression ratio will be somewhat worse.
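A tiny sketch of that control flow, with made-up helper names (this is not paq8px's actual code or API, just the idea Gotty describes: a failed detection degrades to the general models instead of refusing to compress):

```python
# Conceptual only -- not paq8px's real code. Detection failure (e.g. a corrupted
# header) simply means: no type-specific transform, general models instead of
# specialized ones, somewhat worse ratio, but compression still happens.
def compress_block(data, detect_type, transforms, specialized_models, general_models):
    block_type = detect_type(data)                 # may return "default" for unknown/corrupt data
    if block_type in transforms:                   # e.g. zip, cdrom, exe get a reversible transform
        data = transforms[block_type](data)
    models = specialized_models.get(block_type, general_models)
    return models.compress(data)                   # worst case: general models on untransformed data
```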
    2300 replies | 607585 view(s)
  • CompressMaster's Avatar
    20th January 2021, 17:44
    CompressMaster replied to a thread paq8px in Data Compression
How is compression of broken files handled in paq8px? Does it refuse to compress at all, or does it at least try? By "broken" I don't mean text files with missing characters; I mean, for example, a JPG without a header, a PDF corrupted by a virus, etc.
    2300 replies | 607585 view(s)
  • fcorbelli's Avatar
    20th January 2021, 17:36
    fcorbelli replied to a thread zpaq updates in Data Compression
    Just to keep https://github.com/fcorbelli/zpaqfranz
    2655 replies | 1131493 view(s)
  • lz77's Avatar
    20th January 2021, 17:21
    lz77 replied to a thread Lucky again in The Off-Topic Lounge
Try asking Navalny to say something bad about Khodorkovsky, or just to say the phrase "Jewish oligarch." I guess it will be very difficult for him to do this. :) Although, with his revelations, he does benefit people. It's not clear who is taking down whom: Putin Navalny, or Navalny Putin. :)
    11 replies | 673 view(s)
  • Sportman's Avatar
    20th January 2021, 13:16
    Sportman replied to a thread Lucky again in The Off-Topic Lounge
    Flat upgrade: https://www.youtube.com/watch?v=ipAnwilMncI
    11 replies | 673 view(s)
  • umgefahren's Avatar
    20th January 2021, 13:15
I have my results! The whole folder with all the compressed images is 305,598,693 bytes in size. It took 405.2812011241913 seconds to compress them and 9.863522052764893 seconds to decompress them. I used this image set: RGB 8 bit. My compression ratio is 1.539966344 on the whole image set. The compression ratios of the individual images are:

| File Name | Compression Ratio |
|------------------------|--------------------|
| spider_web.ppm | 2.14235671557331 |
| deer.ppm | 1.2424318516015507 |
| fireworks.ppm | 3.642381743674327 |
| artificial.ppm | 12.25476523000428 |
| bridge.ppm | 1.2273064711294759 |
| flower_foveon.ppm | 2.4469685311217293 |
| big_tree.ppm | 1.2789847127858722 |
| cathedral.ppm | 1.5089509013690656 |
| hdr.ppm | 1.9960575653205344 |
| leaves_iso_1600.ppm | 1.203903570936856 |
| big_building.ppm | 1.3922857035699863 |
| nightshot_iso_1600.ppm | 1.501047996887146 |
| nightshot_iso_100.ppm | 2.251600481220427 |
| leaves_iso_200.ppm | 1.3158267828823695 |

The table function seems broken, I hope this is fine.
    5 replies | 515 view(s)
  • cssignet's Avatar
    20th January 2021, 11:09
- do you think there is room for improvement in encoding speed? (if yes, how far from v1 efficiency?)
- would you plan to allow straight alpha and preserve RGB if a=0? (or is it possible atm? — I did not check)
    66 replies | 4664 view(s)
  • cssignet's Avatar
    20th January 2021, 11:08
571c204:
>cwp2 -q 100 -effort 1 -mt 8 box.png
output size: 70990 (2.17 bpp)
Kernel Time = 0.000 = 00:00:00.000 = 0%
User Time = 0.265 = 00:00:00.265 = 282%
Process Time = 0.265 = 00:00:00.265 = 282%
Global Time = 0.094 = 00:00:00.094 = 100%
>cwp2 -q 100 -mt 8 box.png
output size: 60658 (1.85 bpp)
Kernel Time = 0.015 = 00:00:00.015 = 4%
User Time = 1.031 = 00:00:01.031 = 314%
Process Time = 1.046 = 00:00:01.046 = 319%
Global Time = 0.328 = 00:00:00.328 = 100%

3db306e:
>cwp2 -q 100 -effort 1 -mt 8 box.png
output size: 55907 (1.71 bpp)
Kernel Time = 0.015 = 00:00:00.015 = 10%
User Time = 0.437 = 00:00:00.437 = 280%
Process Time = 0.453 = 00:00:00.453 = 290%
Global Time = 0.156 = 00:00:00.156 = 100%
>cwp2 -q 100 -mt 8 box.png
output size: 50733 (1.55 bpp)
Kernel Time = 0.046 = 00:00:00.046 = 8%
User Time = 1.828 = 00:00:01.828 = 334%
Process Time = 1.875 = 00:00:01.875 = 342%
Global Time = 0.547 = 00:00:00.547 = 100%

:cool:
    66 replies | 4664 view(s)
  • umgefahren's Avatar
    20th January 2021, 09:45
    I would love to. Give me some time.
    5 replies | 515 view(s)
  • Kirr's Avatar
    20th January 2021, 09:05
Thanks Innar, and please take a good rest and get well! One straightforward path for compression improvement to affect biology knowledge is by improving the accuracy of alignment-free sequence comparison. Better compression means a better approximation of Kolmogorov complexity and more accurate information distances (a minimal sketch of this idea is at the end of this post). This can be used for many purposes, e.g., for comparing entire genomes, or investigating the history of repeat expansions. I'm not sure if improved compression can help in studying coronaviruses specifically, because coronaviruses can be easily aligned, which allows more accurate comparison without compression. But many other topics can greatly benefit from better compression. E.g. see Zielezinski et al. (2019) for an overview.
One other thing. I think there's too much emphasis on compression strength in this field. This is understandable, because in information science we dream about computing Kolmogorov complexity, so any step closer to approximating it must be welcome. However, compressor users often have a different balance of priorities, where compression strength is just one of several useful qualities. (This again explains the longevity of gzip in genomics.) I realized that many compressor developers mainly care about compression strength. They will spend huge effort fine-tuning their method to gain an extra 0.01% of compactness. But they are fine if their compressor works only on plain DNA sequence (no support for sequence names, N, IUPAC codes, or even end of line in some cases). Or if their compressor takes days (or weeks) to compress a genome (more problematic, but still common, is when it takes days for decompression too). Maybe it feels great to get that 0.01% of compactness, but it's often disconnected from applications. What users want in a compressor is a combination of reliability, strength, speed, compatibility and ease of use.
The funny thing is that I did not want to develop a compressor. But I wanted to use one, because I was routinely transferring huge data back and forth among computation nodes. I was shocked to realize that among the ton of available DNA compressors there was not one suitable for my purposes. (Never mind another ton of papers describing compression methods without providing any actual compressor.) Currently, personally, NAF is perfect for my needs. But if I ask myself how it could be made even better, the answer (for me as a user) is not just "20% better compactness" (even though that would be great too). Instead it may be something like:
1. Random access (without sacrificing compression strength much).
2. A library for easier integration in other tools.
3. A built-in simple substring searcher.
4. Something about FASTQ qualities. (:))
etc. E.g., FASTAFS (Hoogstrate et al. 2020) is an interesting recent development for practical uses.
Zielezinski et al. (2019) "Benchmarking of alignment-free sequence comparison methods" Genome Biology, 20:144, https://doi.org/10.1186/s13059-019-1755-7
Hoogstrate et al. (2020) "FASTAFS: file system virtualisation of random access compressed FASTA files" https://www.biorxiv.org/content/10.1101/2020.11.11.377689v1.full
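As the concrete example promised above: the normalized compression distance (NCD) plugs a real compressor into the (uncomputable) information distance. A minimal sketch with zlib standing in for the compressor; any stronger (e.g. DNA-aware) compressor can be substituted and directly improves the distance estimate.

```python
# NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the
# compressed size of s. The closer C gets to Kolmogorov complexity, the closer
# NCD gets to a true information distance.
import zlib

def csize(data: bytes) -> int:
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Toy check: a sequence and a lightly mutated copy should score lower (more
# similar) than the same sequence against an unrelated one.
a = b"ACGTACGTACGTACGTACGT" * 50
b = a.replace(b"ACGT", b"ACGA", 5)
c = b"TTAGGCATTAGGCATTAGGC" * 50
print(ncd(a, b), ncd(a, c))
```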
    38 replies | 1971 view(s)
  • Shelwien's Avatar
    20th January 2021, 02:29
    Welcome! Can you post some test results vs other lossless codecs? http://imagecompression.info/gralic/LPCB.html
    5 replies | 515 view(s)
  • innar's Avatar
    20th January 2021, 00:46
Dear Kirr, thank you so much for such deep insights. I had an unexpected health issue which took me away from the computer for a few weeks, but I will bounce back soon, hopefully by the end of this week, and work through the whole backlog of submissions, incl. yours. Your contribution is highly appreciated! Meanwhile, if someone checks this forum, I would relay a question which I got from one of the top 50 researchers in genetics: if suddenly someone got a (let's say) 20% (30%? 50%?) better result than others, how could this be turned into an insight for professionals with deep knowledge about coronaviruses? What would be the representation or visualization of results (or tools) that would enable a person knowing nothing about compression algorithms, but a lot about coronaviruses, to understand how such compression came about? I think this is an important and fundamental question for many benchmarks: how to leak the "intelligence of better compression" back to the field? Any ideas?
    38 replies | 1971 view(s)
  • Mike's Avatar
    19th January 2021, 22:14
    Mike replied to a thread 7zip update in Data Compression
    7-Zip 21.00 alpha was released: https://sourceforge.net/p/sevenzip/discussion/45797/thread/dd4edab390/
    2 replies | 830 view(s)
  • fcorbelli's Avatar
    19th January 2021, 21:18
    fcorbelli replied to a thread zpaq updates in Data Compression
This is version 50.7, with numerous bug fixes. In particular, the test-after-add is (perhaps) settled. Using the -test switch immediately after the creation of the archive, a "chunked" SHA1 verification is done (very little RAM used), together with a CRC-32 verification (HW if available). This intercepts even SHA1 collisions.

C:\zpaqfranz>zpaqfranz a z:\1.zpaq c:\dropbox\Dropbox -test
zpaqfranz v50.7-experimental journaling archiver, compiled Jan 19 2021
Creating z:/1.zpaq at offset 0 + 0
Adding 8.725.128.041 in 29.399 files at 2021-01-19 18:11:23
98.22% 0:00:00 8.569.443.981 -> 5.883.235.525 of 8.725.128.041 164.796.999/sec
34.596 +added, 0 -removed.
0.000000 + (8725.128041 -> 7400.890377 -> 6054.485111) = 6.054.485.111
Forced XLS has included 13.342.045 bytes in 116 files
zpaqfranz: doing a full (with file verify) test
Compare archive content of:z:/1.zpaq: 1 versions, 34.596 files, 122.232 fragments, 6.054.485.111 bytes (5.64 GB)
34.596 in <<c:/dropbox/Dropbox>>
Total files found 34.596
GURU SHA1 COLLISION! B3FBAB1C vs 348150FB c:/dropbox/Dropbox/libri/collisione_sha1/shattered-1.pdf
# 2020-11-06 16:00:09 422.435 c:/dropbox/Dropbox/libri/collisione_sha1/shattered-1.pdf
+ 2020-11-06 16:00:09 422.435 c:/dropbox/Dropbox/libri/collisione_sha1/shattered-1.pdf
Block checking ( 119.742.900) done ( 7.92 GB) of ( 8.12 GB)
00034595 =same
00000001 #different
00000001 +external (file missing in ZPAQ)
Total different file size 844.870
79.547 seconds (with errors)

This (quick check) function can be invoked simply by using l instead of a:
zpaqfranz a z:\1.zpaq c:\pippo
zpaqfranz l z:\1.zpaq c:\pippo

Win32 and Win64 builds at
http://www.francocorbelli.it/zpaqfranz.exe
http://www.francocorbelli.it/zpaqfranz32.exe
Any comments are very welcome.
    2655 replies | 1131493 view(s)
  • e8c's Avatar
    19th January 2021, 16:54
    e8c replied to a thread Lucky again in The Off-Topic Lounge
https://tjournal.ru/internet/325612-instagram-blogerov-zapodozrili-v-kritike-vozvrashcheniya-navalnogo-po-metodichke-tezisy-u-nekotoryh-i-pravda-sovpadayut
One of Zhirinovsky's best pieces of clownery: https://tjournal.ru/news/325627-on-odin-s-nami-boretsya-kto-eto-takoy-v-gosdume-zagovorili-o-navalnom-zhirinovskiy-posvyatil-emu-celuyu-rech
    11 replies | 673 view(s)
  • Lithium Flower's Avatar
    19th January 2021, 14:43
@Jyrki Alakuijala A little curious: is there a plan or patch to improve non-photographic fidelity (quality) in the next JPEG XL public release (jpeg xl 0.3)? I'm looking forward to using .jxl to replace my .jpg, .png and .webp files. :)
    39 replies | 3365 view(s)
  • Lithium Flower's Avatar
    19th January 2021, 14:41
@Jyrki Alakuijala About butteraugli and tiny ringing artefacts: in the sample image from my previous post ("eyes_have tiny artifacts2") there are tiny ringing artefacts on the character's eyes. Those artefacts aren't easy to see, but if I compare the modular mode file (-m -Q 90, speed tortoise) with the VarDCT mode file, they make the visual experience a little uncomfortable. The modular mode file doesn't produce the tiny artefacts (though it probably has other tiny errors), while the VarDCT mode file compresses very well (file size) but produces tiny artefacts in some areas. For that sample image I need to use jpeg xl 0.2 -d 0.5 (Speed: kitten) to avoid the ringing artefacts.
I guess tiny ringing artefacts in photographic images are probably very hard to see, so butteraugli will assess the image as fine or as having only a tiny error; but in non-photographic images, if some area has tiny ringing artefacts, they are very easy to see and a little uncomfortable visually. It's like chroma subsampling: photographic and non-photographic images are different situations. Some photographic images with chroma subsampling still give a good visual experience, but for non-photographic images chroma subsampling is always a bad idea.
    39 replies | 3365 view(s)
  • Lithium Flower's Avatar
    19th January 2021, 14:37
@Jyrki Alakuijala Thank you for your reply. It looks like jpeg xl 0.2 -d 1.0 (speed kitten) is still in a small risk zone: in my tests some images at -d 1.0 or 0.9 get maxButteraugli 2.0+, while -d 0.8 can keep maxButteraugli below 1.6 (1.55). As in the previous post, maxButteraugli below 1.3~1.4 seems to stay in the safe zone. Could you tell me about the target distances -d 0.9 (q91), 0.8 (q92), 0.7 (q93), 0.6 (q94): do those distances have a special meaning, like 1.0 and 0.5 do?
    39 replies | 3365 view(s)
  • umgefahren's Avatar
    19th January 2021, 11:36
EDIT: The explanation here is a bit outdated. For the newer and more complete version, please consider visiting the GitHub. It's quite hard to keep up with the changes.
EDIT: Due to the latest switch from deflate in favor of Zlib, the algorithm manages to SURPASS PNG in some cases.

Hello, I'm a newbie to compression and this forum, so please be patient. I recently wrote an image compression algorithm and put a prototype on GitHub. I've written a long explanation there, but to make it more comfortable, here's the explanation:

Image Compression Algorithm
A new image compression algorithm. In this release version, the algorithm performs worse than PNG in most cases. In fact, the only image where the algorithm outperforms PNG is the white void of img_3.png. However, the algorithm produces only slightly larger files than PNG. For example, img_2.png is about 12.8 MB, and the resulting binary is 12.9 MB.

How the system works

Clustering
The first step in the system is clustering the pixels. This happens in 5 dimensions, with R, G, B, x and y of every pixel. X & Y are normed over 255 in order to have a balance between the color values and the pixel position. This might offer a possible improvement. In the current settings, k-means is used to define 3 dominant clusters. More clusters are possible, but the calculation time increases rapidly with an increasing number of clusters. The encoding supports up to 255 clusters, but this is probably overkill. After defining the clusters, we calculate a cluster map that drops the color values and just records which cluster each pixel belongs to. (A code sketch of this step is at the end of this post.)

Grid
In the next step we lay a grid on top of the cluster map. The chunks of the grid are not of fixed size; they vary in size near the edges. For every chunk, we check whether all pixels in it belong to the same cluster. If so, the pixels are encoded relative; otherwise absolute. The grid contains, for every chunk, a value that determines the cluster, or that the chunk has to be encoded absolute. In an illustration of this grid map, every white pixel symbolizes an absolute chunk.

Calculating Lists
In this step we finally calculate the pixel values that are later written into the file. Every chunk is encoded according to the grid's classification as absolute or relative. Every chunk's pixel values are added to a super list of relative or absolute pixel values. The pixel values are traversed in wiggly lines. Every cluster has a minimum pixel value, taken from the minimum R, G, B value in that chunk. The resulting pixel value is the sum of this chunk value and the encoded pixel value.

Flatten and Byte conversion
The grid, the cluster colors and the lines are converted into vectors of u8 and then into bytes.

Zstd
The byte representations of the grid and the lines are compressed with the zstd algorithm. This achieves the actual compression and provides an opportunity for optimization.

Write File
The resulting binary is just a list of the relevant compressed objects.

Advantages compared to PNG
Because of the grid, it's possible to load specific chunks without loading the entire image. With further improvements it might be possible to surpass PNG in compression rate, but I can't prove that.

Disadvantages compared to PNG
Because of the clusterisation it takes quite long to calculate a result. It might be possible to improve that, although this would probably require abolishing k-means for another clustering algorithm. One solution to that could be a neural net.

Help and ideas are much appreciated, especially contributions on GitHub. Thanks for your time! :)
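As promised above, here is a small illustration of the clustering and grid-classification steps under the stated assumptions (5-D features R, G, B, x, y with coordinates normed over 255, plain k-means into 3 clusters, then a per-chunk uniformity check). This is my own sketch in Python, not the actual Rust implementation:

```python
# Illustration only -- not the Rust prototype. Cluster pixels in (R, G, B, x, y)
# space with plain k-means, then classify grid chunks as "relative" (uniform
# cluster) or "absolute" (mixed clusters).
import numpy as np

def cluster_map(img, k=3, iters=20, seed=0):
    """img: (H, W, 3) uint8 array. Returns an (H, W) array of cluster labels."""
    h, w, _ = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.column_stack([
        img.reshape(-1, 3).astype(np.float64),          # R, G, B
        (xs.ravel() / max(w - 1, 1)) * 255.0,           # x normed over 255
        (ys.ravel() / max(h - 1, 1)) * 255.0,           # y normed over 255
    ])
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):                              # Lloyd's k-means (fine for small images)
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = feats[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels.reshape(h, w)

def grid_classify(labels, chunk=8):
    """Per chunk: the cluster id if the whole chunk is uniform, else -1 (absolute)."""
    h, w = labels.shape
    grid = []
    for y in range(0, h, chunk):
        row = []
        for x in range(0, w, chunk):                    # edge chunks are simply smaller
            block = labels[y:y+chunk, x:x+chunk]
            row.append(int(block.flat[0]) if (block == block.flat[0]).all() else -1)
        grid.append(row)
    return grid
```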
    5 replies | 515 view(s)
  • Kirr's Avatar
    19th January 2021, 05:52
Regarding the use of 2bit data: first of all, fa2twobit itself is lossy. I found that it did not preserve IUPAC codes, line lengths or even sequence names (it truncates them). Also, using the 2bit data still requires decompression (other than with BLAT), while FASTA is a universal sequence exchange format. So I would rather remove the 2bit representation from the contest. Anyone interested can trivially convert that DNA into 2-bit by themselves if they need it, potentially avoiding fa2twobit's limitations.
But this raises a bigger question. Many (most, actually) of the existing sequence compressors have compatibility issues. Some compress just plain DNA (no headers), some don't support N, IUPAC codes, mask (upper/lower case), etc. Some have their own idea about what sequence names should look like (e.g. max length). Some compress only FASTQ, and not FASTA. How can these various tools be compared, when each is doing its own thing? When designing my benchmark last year, I decided to try my best to adapt each of the broken/incomplete tools to still perform a useful task. So I made a wrapper for each tool, which takes the input data (a huge FASTA file) and transforms it into a format acceptable to that tool. E.g., if some compressor tool does not know about N, my wrapper will pick out all N from the input, store it separately (compressed), and present the N-less sequence to the tool. Then another wrapper works in reverse during decompression, reconstructing the exact original FASTA file. The wrapped compressors therefore all perform the same task and can be compared on it. All my wrappers and the tools used by them are available (see the Wrappers page linked below). This should make it relatively easy to adapt any existing compressor to work on FASTA files. In a similar way, a general-purpose compressor can be wrapped using those tools, to allow stronger compression of FASTA files. It could be an interesting experiment to try wrapping various general-purpose compressors and adding them to the benchmark, along with the non-wrapped ones.
http://kirr.dyndns.org/sequence-compression-benchmark/
http://kirr.dyndns.org/sequence-compression-benchmark/?page=Wrappers
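A toy version of the core trick (my own illustration, not the actual benchmark wrappers): pull the N runs out of the sequence, store their positions separately, and hand the compressor a clean N-less stream; on decompression the runs are reinserted. A real wrapper does the same for headers, line lengths and the case mask.

```python
# Illustration of an "N-aware" wrapper: strip runs of N, remember where they
# were, and restore them exactly after the wrapped compressor has done its job.
import re

def strip_n(seq: str):
    """Return (runs, clean): runs are (start, length) of each N run, clean is seq without N."""
    runs = [(m.start(), len(m.group())) for m in re.finditer(r"N+", seq)]
    return runs, seq.replace("N", "")

def restore_n(runs, clean: str) -> str:
    out, src, pos = [], 0, 0                  # pos tracks position in the original sequence
    for start, length in runs:
        take = start - pos                    # copy the N-less stretch before this run
        out.append(clean[src:src + take]); src += take
        out.append("N" * length)
        pos = start + length
    out.append(clean[src:])
    return "".join(out)

runs, clean = strip_n("ACGTNNNNACGTNACGT")
assert restore_n(runs, clean) == "ACGTNNNNACGTNACGT"   # exact round trip
```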
    38 replies | 1971 view(s)
  • Kirr's Avatar
    19th January 2021, 05:19
I basically agree that there's no point doing elaborate work compressing raw data that will be analyzed anyway. When dealing with FASTQ files the usual tasks are: 1. Get them off the sequencer to the analysis machine. 2. Transfer to computation nodes. 3. Archive. 4. Send to collaborators. The compressor for these purposes has to be quick, reasonably strong, and reliable (robust, portable). Robustness is perhaps the most important quality, which is not at all apparent from benchmarks. (This could be why gzip is still widely used.) Among the dozens of compressors and methods, few seem to be designed for practical (industrial) use, namely DSRC, Alapy, gtz, and NAF. DSRC unfortunately seems unmaintained (bugs are not being fixed). Alapy and gtz are closed source and non-free (also gtz phones home). So I currently use NAF for managing FASTQ data (no surprise). NAF's "-1" works well for one-time transfers (where you just need to get the data from machine A to machine B as quickly as possible), and "-22" works for archiving and distributing FASTQ data.
One recent nice development in the field is the transition to reduced resolution of base qualities. In typical FASTQ data the sequence is easy to compress, but the qualities occupy the main bulk of space in the archive. Therefore some compressors have an option of rounding the qualities to reduce resolution. Now recent instruments can produce binned qualities from the beginning, making compression much easier.
CRAM and other reference-based methods work nicely in cases where they are applicable. However, there are fields like metagenomics (or compressing the reference genome itself) where we don't have a reference. In such cases we still need general reference-free compression. The interesting thing is that these days data volumes are so large that a specialized tool optimized for specific data or a specific workflow can make a meaningful difference. And yet most sequence databases still use gzip.
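To make "reduced resolution of base qualities" concrete, a tiny illustration (the bin boundaries here are invented for the example, not any instrument's official scheme): collapsing the ~40 Phred levels to a few representative values leaves far fewer distinct symbols in the quality string, which is exactly what makes it compress so much better.

```python
# Toy quality binning for FASTQ: map each Phred+33 quality character to its
# bin's representative value. Bin edges below are illustrative only.
BINS = [(0, 1, 0), (2, 14, 8), (15, 29, 22), (30, 60, 37)]   # (lo, hi, representative)

def bin_qualities(qual: str) -> str:
    out = []
    for ch in qual:
        q = ord(ch) - 33                                     # decode Phred+33
        rep = next(r for lo, hi, r in BINS if lo <= q <= hi)
        out.append(chr(rep + 33))                            # re-encode the bin representative
    return "".join(out)

print(bin_qualities("IIIIFFF##AA<<<"))                       # many symbols in -> only a handful out
```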
    38 replies | 1971 view(s)
  • Kirr's Avatar
    19th January 2021, 04:36
Hi Innar and all! I've just tried NAF on the FASTA file. Among the different options, "-15" worked best on this data, producing a 1,590,505-byte archive. NAF is a compressor for archiving FASTA/Q files. It basically divides the input into headers, mask and sequence (and quality for FASTQ), and compresses each stream with zstd. This allows for good compactness and very fast decompression. I use NAF to store and work with terabytes of sequence data. (BTW, NAF supports IUPAC codes.) Many other sequence compressors exist; some of them are compared in my benchmark (links below) and might be interesting to try on this data. That benchmark includes a 1.2 GB influenza dataset, which should produce similar results to the coronavirus one. Also note that the "Compressors" page has useful notes about various compressors.
https://github.com/KirillKryukov/naf
http://kirr.dyndns.org/sequence-compression-benchmark/
http://kirr.dyndns.org/sequence-compression-benchmark/?page=Compressors
Cheers!
    38 replies | 1971 view(s)
  • Jyrki Alakuijala's Avatar
    19th January 2021, 02:03
Max butteraugli 1.0 and below is good quality. Butteraugli 1.1 is more 'okayish' rather than 'solid ok'. At a max butteraugli of 0.6 I have never yet been able to see a difference. Butteraugli scores are calibrated for a viewing distance of 900 pixels; if you zoom a lot, you will see more. If you obviously disagree with butteraugli (when using max brightness of 200 lumen or less and a viewing distance of 900 pixels or more), file a bug in the jpeg xl repository and I'll consider adding such cases to the butteraugli calibration corpus. There is some consensus that butteraugli has possibly been insufficiently sensitive to ringing artefacts. 2-3 years ago I made some changes for it to be less worried about blurring in comparison to ringing artefacts, but those adjustments were somewhat conservative. Please keep writing about your experiences; this is very useful for me in deciding where to invest effort in jpeg xl and butteraugli.
    39 replies | 3365 view(s)
  • e8c's Avatar
    18th January 2021, 23:24
Special processing if the tile is completely grayscale, and the same algorithm as with "-1" if not.

$ ./gray -2 Bennu_Grayscale.ppm Bennu_Grayscale.gray
encode, 2 threads: 118 MPx/s
$ ./gray -d Bennu_Grayscale.gray tmp.ppm
decode, 2 threads: 293 MPx/s
$ ls -l
-rwxrwx--- 1 root vboxsf 101907455 янв 17 00:08 Bennu_Grayscale.emma
-rwxrwx--- 1 root vboxsf 126415072 янв 18 21:37 Bennu_Grayscale.gray
-rwxrwx--- 1 root vboxsf 105285138 янв 16 23:46 Bennu_Grayscale.lea
-rwxrwx--- 1 root vboxsf 128446880 янв 16 19:26 Bennu_Grayscale.png
-rwxrwx--- 1 root vboxsf 732538945 янв 16 19:24 Bennu_Grayscale.ppm

New total for MCIS (Moon Citizen's Image Set) v1.20: 12'745'684'464
    19 replies | 5829 view(s)
  • Gotty's Avatar
    18th January 2021, 20:27
    Gotty replied to a thread paq8px in Data Compression
That explains it. So fixing my detection routine jumped to the no. 1 spot on my to-do list. My next version will be about detections and transforms anyway - as requested. It fits perfectly.
    2300 replies | 607585 view(s)
  • mpais's Avatar
    18th January 2021, 19:32
    mpais replied to a thread paq8px in Data Compression
    2300 replies | 607585 view(s)
  • Lithium Flower's Avatar
    18th January 2021, 16:40
I have a question about butteraugli_jxl. Using cjxl -d 1.0 -j (kitten) and -m -Q 90 -j (speed tortoise) on a non-photographic (more natural-synthetic) image, JPEG q99 yuv444: -d 1.0 still has some tiny artifacts in this image, like the issue in my previous post. I used butteraugli_jxl to check the compressed image, and it looks like butteraugli_jxl didn't find those tiny artifacts (maxButteraugli 1.13). Are those tiny artifacts an issue, or an expected error in VarDCT mode?

{
"01_m_Q90_s9_280k.png": {
"maxButteraugli": "1.5102199316",
"6Norm": " 0.640975",
"12Norm": " 0.862382",
},
"01_vardct_d1.0_s8_234k.png": {
"maxButteraugli": "1.1368366480",
"6Norm": " 0.610878",
"12Norm": " 0.734107",
}
}

Also, VarDCT mode (-d and -q) can mix two codecs for a single image; can lossy modular mode (-m -Q) also use two modes for a single image, or does it, like FUIF lossy, only use reversible transforms (YCoCg, reversible Haar-like squeezing)? In my non-photographic (more natural-synthetic) set, lossy modular mode works very well, but VarDCT mode can produce smaller files.
    39 replies | 3365 view(s)
  • lz77's Avatar
    18th January 2021, 15:20
    lz77 replied to a thread Lucky again in The Off-Topic Lounge
    Ah, so you haven't given up on your godless "Nord Stream" yet? Then Navalny flies to you! :) (Russian: ах, так вы ещё не отказались от своего богомерзкого "северного потока"? Тогда Навальный летит к вам! :) )
    11 replies | 673 view(s)
  • suryakandau@yahoo.co.id's Avatar
    18th January 2021, 14:27
If the command line breaks when MT=OFF, then use MT=ON.
    219 replies | 20476 view(s)
  • Darek's Avatar
    18th January 2021, 13:34
    Darek replied to a thread paq8px in Data Compression
Scores for paq8px v201 for my testset - a nice gain for K.WAD, and smaller gains for the other files. Some image files and the 0.WAV audio file also got small losses; however, in total this is about 3.7KB of gain.
    2300 replies | 607585 view(s)
  • Darek's Avatar
    18th January 2021, 12:11
    Darek replied to a thread paq8px in Data Compression
I've tested it again and got a slightly better option now (but the effect/mechanism is the same), and the scores are:
Calgary Corpus => option "-8lrta"
541'975 - for paq8px v191, v191a, v192
556'856 - for paq8px v193 and higher (with some little changes of course, but it's the baseline)
Canterbury Corpus => option "-8lrta" - the same as for the Calgary Corpus
290'599 - for paq8px v191, v191a, v192
298'652 - for paq8px v193 and higher (with some little changes of course, but it's the baseline)
    2300 replies | 607585 view(s)
  • kaitz's Avatar
    18th January 2021, 05:17
    kaitz replied to a thread paq8px in Data Compression
MC

file | size | paq8px_v200 -8 | paq8px_v201 -8 | diff
A10.jpg | 842468 | 624597 | 624587 | 10
AcroRd32.exe | 3870784 | 823707 | 823468 | 239
english.dic | 4067439 | 346422 | 345366 | 1056
FlashMX.pdf | 4526946 | 1315382 | 1314782 | 600
FP.LOG | 20617071 | 215399 | 213420 | 1979 *
MSO97.DLL | 3782416 | 1175358 | 1175162 | 196
ohs.doc | 4168192 | 454753 | 454784 | -31
rafale.bmp | 4149414 | 468156 | 468095 | 61
vcfiu.hlp | 4121418 | 372048 | 371119 | 929
world95.txt | 2988578 | 313915 | 313828 | 87
Total | | 6109737 | 6104611 | 5126
    2300 replies | 607585 view(s)
  • kaitz's Avatar
    18th January 2021, 04:57
    kaitz replied to a thread paq8px in Data Compression
@Gotty If you do, please add: mbr, base85, uuencode :)
    2300 replies | 607585 view(s)
  • moisesmcardona's Avatar
    18th January 2021, 03:39
    moisesmcardona replied to a thread paq8px in Data Compression
    @Gotty, can the BZip2 transform that @kaitz added in paq8pxd be ported to paq8px?
    2300 replies | 607585 view(s)
  • moisesmcardona's Avatar
    18th January 2021, 02:33
@kaitz, I opened another MR: https://github.com/kaitz/paq8pxd/pull/15. Since you added a BZip2 transform, the BZip2 library needed to be added to the CMakeLists so that it can be detected and your latest versions can be compiled :-)
    1032 replies | 363202 view(s)
  • Darek's Avatar
    18th January 2021, 01:09
    Darek replied to a thread Paq8pxd dict in Data Compression
Scores of paq8pxd v95 and previous versions for enwik8 and enwik9:
15'654'151 - enwik8 -x15 -w -e1,english.dic by paq8pxd_v89_60_4095, change: 0,00%, time 10422,07s
122'945'119 - enwik9 -x15 -w -e1,english.dic by paq8pxd_v89_60_4095, change: -0,06%, time 100755,31s - best score for paq8pxd versions
15'647'580 - enwik8 -x15 -w -e1,english.dic by paq8pxd_v90, change: -0,04%, time 9670,5s
123'196'527 - enwik9 -x15 -w -e1,english.dic by paq8pxd_v90, change: 0,20%, time 110200,16s
15'642'246 - enwik8 -x15 -w -e1,english.dic by paq8pxd_v95, change: 0,00%, time 10130s - best score for paq8pxd versions (the same as paq8pxd v93 and paq8pxd v94)
123'151'008 - enwik9 -x15 -w -e1,english.dic by paq8pxd_v95, change: 0,00%, time 102009,55s
    1032 replies | 363202 view(s)
  • Darek's Avatar
    17th January 2021, 18:53
    Darek replied to a thread paq8px in Data Compression
    Yes, of course. I'll check it again and give you switches used for these versions.
    2300 replies | 607585 view(s)
  • moisesmcardona's Avatar
    17th January 2021, 17:35
    moisesmcardona replied to a thread Paq8sk in Data Compression
    Hey Surya, I noticed some of your releases are compiled with MT=ON and some with MT=OFF. Can you please decide to stick with either one? The command line breaks when MT=OFF. Thanks!
    219 replies | 20476 view(s)
  • Jon Sneyers's Avatar
    17th January 2021, 11:00
    Yes, further reducing the memory footprint would be nice. For large images like this, it would also be useful to have an encoder that does not try to globally optimize the entire image, and a cropped decoder. These are all possible, but quite some implementation effort, and it is not the main priority right now — getting the software ready for web browser integration is a bigger priority.
    39 replies | 3365 view(s)
  • e8c's Avatar
    17th January 2021, 07:32
>djxl.exe Bennu_Grayscale_s4.jxl tmp.ppm --num_threads=4
Read 109071829 compressed bytes
Done. 20237 x 12066, 19.71 MP/s, 1 reps, 4 threads.
Allocations: 7750 (max bytes in use: 7.050512E+09)

>djxl.exe Bennu_Grayscale_s4.jxl tmp.ppm --num_threads=2
Read 109071829 compressed bytes
Done. 20237 x 12066, 10.50 MP/s, 1 reps, 2 threads.
Allocations: 7744 (max bytes in use: 7.041389E+09)

>djxl.exe Bennu_Grayscale_s4.jxl tmp.ppm --num_threads=1
Read 109071829 compressed bytes
Done. 20237 x 12066, 8.80 MP/s, 1 reps, 1 threads.
Allocations: 7741 (max bytes in use: 7.036826E+09)

>djxl.exe Bennu_Grayscale_s3.jxl tmp.ppm --num_threads=4
Read 112310632 compressed bytes
Done. 20237 x 12066, 35.77 MP/s, 1 reps, 4 threads.
Allocations: 7749 (max bytes in use: 7.053651E+09)

>djxl.exe Bennu_Grayscale_s3.jxl tmp.ppm --num_threads=2
Read 112310632 compressed bytes
Done. 20237 x 12066, 19.57 MP/s, 1 reps, 2 threads.
Allocations: 7743 (max bytes in use: 7.044529E+09)

>djxl.exe Bennu_Grayscale_s3.jxl tmp.ppm --num_threads=1
Read 112310632 compressed bytes
Done. 20237 x 12066, 17.63 MP/s, 1 reps, 1 threads.
Allocations: 7740 (max bytes in use: 7.039963E+09)
    39 replies | 3365 view(s)
  • suryakandau@yahoo.co.id's Avatar
    17th January 2021, 04:10
I think it's difficult to go below 550000 bytes on the A10 jpeg file..
    219 replies | 20476 view(s)
  • Gotty's Avatar
    17th January 2021, 01:46
    Gotty replied to a thread paq8px in Data Compression
    I need your help investigating it (I can't reproduce). Could you tell me the command line switches you used?
    2300 replies | 607585 view(s)
  • Darek's Avatar
    16th January 2021, 23:13
    Darek replied to a thread paq8px in Data Compression
If we assume about 20 versions yearly, that means it could be the year 2263... unless there is some breakthrough, or more than one...
    2300 replies | 607585 view(s)
  • Gotty's Avatar
    16th January 2021, 23:03
    Gotty replied to a thread paq8px in Data Compression
150'000? Are you sure you wanted to ask about 150'000? OK. Let's do the math ;-)
Open Darek's MaximumCompression results a couple of posts above.
Look at the first result he recorded (for paq8px_v75): 637110.
Look at the last result (for paq8px_v200): 624578.
Gaining those 12532 bytes took roughly 125 versions.
Doing a simple linear extrapolation (which is totally incorrect, but fits your question very well): 150'000 bytes will be reached at around paq8px_v4858.
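Spelled out as code, the back-of-the-envelope extrapolation looks like this (it just reproduces the arithmetic above; the linearity assumption is, of course, the joke):

```python
# Linear extrapolation of the A10.jpg result vs. paq8px version number.
first, last = 637110, 624578        # bytes at paq8px_v75 and paq8px_v200
versions = 125                      # roughly how many versions that gain took
rate = (first - last) / versions    # ~100 bytes gained per version
target = 150_000
print((first - target) / rate)      # ~4858.7 -> "around paq8px_v4858"
```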
    2300 replies | 607585 view(s)
  • Darek's Avatar
    16th January 2021, 23:01
    Darek replied to a thread paq8px in Data Compression
In my opinion - in 2077 :) By the way, paq8pxd v95 got a score of 618'527 bytes for the A10.jpg file with option -x9.
    2300 replies | 607585 view(s)
  • Darek's Avatar
    16th January 2021, 22:58
    Darek replied to a thread Paq8pxd dict in Data Compression
Scores of my testset and 4 Corpuses for paq8pxd v94 and paq8pxd v95. Good improvements on JPG files and files containing such structures. The A10.JPG file got 618'527 bytes!
    1032 replies | 363202 view(s)