Activity Stream

  • Jarek's Avatar
    Today, 10:42
    There has recently been a lot of talk about the dangers of deepfakes and, more generally, tampering with electronic media (e.g. https://www.wired.com/story/facebook-microsoft-contest-better-detect-deepfakes/ ) - which raises the question of whether something could be done from the (lossy) data compression perspective to improve this situation and help detect tampering. Here is a paper claiming "the proposed approach increased image manipulation detection accuracy from 45% to over 90%": http://openaccess.thecvf.com/content_CVPR_2019/papers/Korus_Content_Authentication_for_Neural_Imaging_Pipelines_End-To-End_Optimization_of_Photo_CVPR_2019_paper.pdf https://github.com/pkorus/neural-imaging Maybe this kind of optimization should be considered as an (optional) feature for new image (/video) formats like JPEG XL?
    0 replies | 14 view(s)
  • Martin24's Avatar
    Today, 06:23
    Martin24 replied to a thread CHK Hash Tool in Data Compression
    Hi @encode. I have just recently found your program and I like it a lot. I have only one feature suggestion: can you please add an option to send the calculated hash value for a file to https://www.virustotal.com ? I previously used https://www.binaryfortress.com/HashTools and that program has VirusTotal integration, but only when right-clicking files. If you could add VirusTotal integration to CHK as a button in the toolbar, it would be amazing (a sketch of what such a lookup could look like follows below). I also like the GUI in your program much better compared to HashTools; it's very clean and beautiful.
    213 replies | 79009 view(s)
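    For illustration only - a minimal Python sketch of the kind of hash lookup being suggested, assuming the public VirusTotal API v2 file/report endpoint and a user-supplied API key (the key is a placeholder, and CHK itself would of course do this natively, not in Python):
    import requests  # third-party HTTP library

    API_KEY = "your-api-key"  # hypothetical placeholder

    def virustotal_report(file_hash: str) -> dict:
        # Ask VirusTotal for an existing scan report by MD5/SHA-1/SHA-256.
        resp = requests.get(
            "https://www.virustotal.com/vtapi/v2/file/report",
            params={"apikey": API_KEY, "resource": file_hash},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()

    # Example: look up the well-known EICAR test file by its MD5.
    report = virustotal_report("44d88612fea8a8f36de82e1278abb02f")
    print(report.get("positives"), "/", report.get("total"))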
  • Jyrki Alakuijala's Avatar
    Today, 03:09
    Hopefully one day. Spiking neural nets may eventually allow a 2-3 order-of-magnitude energy saving in neural computing (decoding), so they might be able to provide the energy budget reduction that neural compression methods need. There are not many technical connections between Ihmehimmeli and JPEG XL. One connection is that both were optimized using external hyperparameter optimization methods.
    2 replies | 198 view(s)
  • SolidComp's Avatar
    Today, 01:26
    Can this be fruitfully applied to image compression? What's the connection to JPEG XL, if any? I wonder which approach would be best for multiple sizes or zooms of the same image. Like on Amazon when you roll over a product image and it zooms.
    2 replies | 198 view(s)
  • encode's Avatar
    Yesterday, 00:41
    encode replied to a thread CHK Hash Tool in Data Compression
    Thanks! 8) BTW, CHK v3.21 is here: https://compressme.net/ It's simply a more optimized & smaller compile!
    213 replies | 79009 view(s)
  • deus-ex's Avatar
    19th September 2019, 23:09
    deus-ex replied to a thread CHK Hash Tool in Data Compression
    Oh dang, I didn't notice that. Simple yet effective. :) Just in case, here's an updated German translation including the space character for "Selected: " = "Ausgewählt: ". I actually missed the ":" in that string previously, too. That's what you get when trying to deliver too quickly.
    213 replies | 79009 view(s)
  • encode's Avatar
    19th September 2019, 22:12
    encode replied to a thread CHK Hash Tool in Data Compression
    Thank you for your feedback! And big thanks for your translation! :_superman2: 1. It's not about high resolution, it's about high DPI settings (text size bigger than 100%). Toolbar buttons have 16x16 icons - currently I cannot easily map the same toolbar buttons to different glyph sizes (24x24, 32x32, etc.). Will think about this later... 2. I will add the space character after "Selected: ". 3. It's a string from the system; it should be the same as in Windows Explorer. "lang.txt" is a sort of workaround for now. Yep, I'll put some work into these in the future - especially keeping older Language Packs compatible with upcoming releases - changing the format to INI: MenuItemFile=File, MenuItemAddFiles=Add Files... etc. (a sketch follows below) :_coffee:
    213 replies | 79009 view(s)
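    For illustration only - a minimal sketch of how such an INI-style language pack could look and be loaded; apart from MenuItemFile/MenuItemAddFiles, which come from the post above, the section and key names are hypothetical (and CHK itself is not written in Python):
    import configparser
    from textwrap import dedent

    # A hypothetical "German.txt" language pack in INI format.
    SAMPLE = dedent("""\
        [Menu]
        MenuItemFile=Datei
        MenuItemAddFiles=Dateien hinzufügen

        [Status]
        Selected=Ausgewählt:
        """)

    cfg = configparser.ConfigParser()
    cfg.read_string(SAMPLE)
    print(cfg["Menu"]["MenuItemFile"])  # -> Datei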
  • deus-ex's Avatar
    19th September 2019, 20:42
    deus-ex replied to a thread CHK Hash Tool in Data Compression
    Hi encode, here's my updated German language file for CHK v3.20. Apparently the format of the language file did not change at all, contrary to your announcement above on September 12th. I noticed a couple of issues with the interface, which I marked with a red circle in the attached screenshot: 1. The icons are quite small and thus difficult to see on high-resolution desktops like mine. 2. The German string for "Selected" = "Ausgewählt" appears to be longer than what you expected, so currently there is no gap before the actual counters. 3. The "Type" column does not appear to make any use of the available translation strings, like "Text files" = "Textdatei". Suggestion: wouldn't it be much more convenient to name each language file after the language it represents and store them all together in a single "Language" subfolder, like Russian.txt, English.txt, German.txt, etc.? This would spare the user from having to first copy the desired language file to the location of CHK.exe, perhaps overwriting a previously existing translation file.
    213 replies | 79009 view(s)
  • pacalovasjurijus's Avatar
    19th September 2019, 20:14
    I have 4 cores. I have made it 8 times faster, but we can make it 10 times faster. The speed is now 80 bytes per second. Please help me with the process.
    207 replies | 75566 view(s)
  • rarkyan's Avatar
    19th September 2019, 04:43
    I don't know how to read this. Maybe I can ask my friend to "translate" it. I'll see what I can do to help.
    207 replies | 75566 view(s)
  • AiZ's Avatar
    18th September 2019, 22:55
    AiZ replied to a thread CHK Hash Tool in Data Compression
    French.
    213 replies | 79009 view(s)
  • Jyrki Alakuijala's Avatar
    18th September 2019, 19:28
    This might be of some interest here, as AI is related to compression, and the people behind this have worked on Pik and JPEG XL. Ihmehimmeli is a brain-inspired way to do artificial neural nets (recurrent/dynamic/temporal/spiking instead of function/tensor/static). https://ai.googleblog.com/2019/09/project-ihmehimmeli-temporal-coding-in.html code at https://github.com/google/ihmehimmeli
    2 replies | 198 view(s)
  • kaitz's Avatar
    18th September 2019, 17:18
    kaitz replied to a thread paq8px in Data Compression
    paq8px_v182fix2 -8: 20,321,247 bytes, 687.30 sec, 2372 MB of memory
    1717 replies | 479542 view(s)
  • pacalovasjurijus's Avatar
    18th September 2019, 16:34
    # Note: square-bracketed expressions in the original post were swallowed by the
    # forum's BBCode, and the line breaks were lost. The empty-list initializers
    # ("= []") are restored below; spots where an index expression was lost are
    # marked "# [index lost]". The layout is reconstructed and may not match the source.
    from multiprocessing import Pool, Value

    def f(x):
        return (x+1)**16383

    if __name__ == '__main__':
        pool = Pool(processes=4)            # start 4 worker processes
        result = pool.apply_async(f, [10])  # evaluate "f(10)" asynchronously
        #print(result.get(timeout=1))       # prints "100" unless your computer is *very* slow
        #print(pool.map(f, range(147456)))  # prints ""
        bnkw = pool.map(f, range(147456))

    import binascii
    import json

    block = 147456
    blockw = 147455
    blockw1 = 16384
    virationc = 16383
    bitc = 14
    lenf1 = 0
    a = 0
    qfl = 0
    h = 0
    byteb = ""
    notexist = ""
    lenf = 0
    dd = 0
    numberschangenotexistq = []
    qwa = 0
    z = 0
    m = []
    p = 0
    asd = ""
    b = 0
    szx = ""
    asf2 = "0b"
    while b < blockw1:
        m += [0]  # [index lost] - a zero placeholder is a guess
        b = b+1
    k = []
    wer = ""
    numberschangenotexist = []
    numbers = []
    name = input("What is name of file? ")
    namea = "file.Spring"
    namem = name+"/"
    s = ""
    qwt = ""
    sda = ""
    ert = 0
    aqwer = 0
    aqwq = 0
    aqwers = 0
    qwaw = ""
    with open(namea, "w") as f4:
        f4.write(s)
    with open(namea, "a") as f3:
        f3.write(namem)
    with open(name, "rb") as binary_file:
        data = binary_file.read()
    lenf1 = len(data)
    if lenf1 < 900000:
        print("This file is too small")
        raise SystemExit
    s = str(data)
    lenf = len(data)
    while dd < 3000:
        a = 0
        qfl = 0
        h = 0
        byteb = ""
        notexist = ""
        lenf = 0
        numberschangenotexistq = []
        qwa = 0
        z = 0
        m = []
        p = 0
        asd = ""
        b = 0
        szx = ""
        asf2 = "0b"
        while b < blockw1:
            m += [0]  # [index lost]
            b = b+1
        k = []
        wer = ""
        numberschangenotexist = []
        numbers = []
        s = ""
        qwt = ""
        ert = 0
        aqwer = 0
        aqwq = 0
        aqwers = 0
        qwaw = ""
        dd = dd+1
        if dd == 1:
            sda = bin(int(binascii.hexlify(data), 16))
        szx = ""
        lenf = len(sda)
        xc = 8-lenf % 8
        z = 0
        if xc != 0:
            if xc != 8:
                while z < xc:
                    szx = "0"+szx
                    z = z+1
        sda = szx+sda
        lenf = len(sda)
        szx = ""
        for byte in sda:
            aqwer = aqwer+1
            aqwers = aqwers+1
            qwaw = qwaw+byte
            if aqwer <= bitc:
                qwt = qwt+byte
            if aqwer == bitc:
                aqwq = int(qwt, 2)
                qwt = ""
                a = a+1
                h = h+1
                av = bin(aqwq)
            if a <= block and aqwer == bitc:
                aqwer = 0
                m = aqwq  # [index lost] - likely an assignment into m
                numbers.append(aqwq)
            if a == block:
                qwaw = ""
                p = 0
                while p < blockw1:
                    if p != m:  # [index lost]
                        k.append(p)
                    p = p+1
                lenfg = len(k)
                if lenfg > 0:
                    acvb = lenfg-1
                    notexist = k  # [index lost] - probably k[acvb]
                    if notexist < 8192:
                        raise SystemExit
                    notexist = notexist-8192
                    szx = bin(notexist)
                    lenf = len(szx)
                    xc = 13-lenf
                    notexist = notexist+8192
                    z = 0
                    if xc != 0:
                        while z < xc:
                            szx = "0"+szx
                            z = z+1
                    wer = wer+szx
                    lenf = len(szx)
                    szx = ""
                if lenfg == 0:
                    raise SystemExit
                b = -1
                kl = blockw
                cb = 0
                er = -1
                ghj = 0
                ghjd = 1
                bnk = 1
                p = 0
                cvz = 0
                for p in range(blockw):
                    if lenfg > 0:
                        if virationc != numbers:  # [index lost]
                            byteb = numbers       # [index lost]
                            numberschangenotexist.append(byteb)
                        if virationc == numbers:  # [index lost]
                            numberschangenotexist.append(notexist)
                        ghj = numberschangenotexist  # [index lost]
                        qfl = qfl+1
                        ghjd = ghj
                        bnk = 1
                        bnkd = 1
                        kl = kl-1
                        qwa = qwa+1
                    if lenfg > 0:
                        bnk = bnkw  # [index lost]
                        ghjd = 0
                        ghjd = ghj*bnk
                        cvz = cvz+ghjd
                szx = bin(cvz)
                cvz = 0
                lenf = len(szx)
                if lenfg > 0:
                    xc = 2064370-lenf
                    z = 0
                    if xc != 0:
                        while z < xc:
                            szx = "0"+szx
                            z = z+1
                    wer = wer+szx
                    lenf = len(szx)
                    szx = ""
                a = 0
                numberschangenotexist = []
                del k
                del numbers
                m = []
                b = 0
                while b < blockw1:
                    m += [0]  # [index lost]
                    b = b+1
                b = 0
                a = 0
        wer = wer+qwaw
        qwaw = ""
        wer = "1"+wer+"1"
        if dd == 3000:
            lenf = len(wer)
            xc = 8-lenf % 8
            z = 0
            if xc != 0:
                if xc != 8:
                    while z < xc:
                        szx = "0"+szx
                        z = z+1
            wer = wer+szx
            lenf = len(szx)
            szx = ""
            wer = "0b"+wer
            n = int(wer, 2)
            jl = binascii.unhexlify('%x' % n)
            sda = wer
            with open(namea, "ab") as f2ww:
                f2ww.write(jl)
    207 replies | 75566 view(s)
  • Mauro Vezzosi's Avatar
    18th September 2019, 16:07
    Italian.
    213 replies | 79009 view(s)
  • maadjordan's Avatar
    18th September 2019, 14:50
    maadjordan replied to a thread smpdf in Data Compression
    The conclusion from my post is that PSO can get a better result than running cpdf several times, and the gain is much better. The difference in cpdf is mainly the result of reordering streams.
    11 replies | 3143 view(s)
  • encode's Avatar
    18th September 2019, 14:34
    encode replied to a thread CHK Hash Tool in Data Compression
    Okay, I think it's time to test the CHK v3.20: https://compressme.net/ :_superman2:
    213 replies | 79009 view(s)
  • Jyrki Alakuijala's Avatar
    18th September 2019, 12:37
    Have more fun and productivity with low-level platform-independent SIMD programming in C++ with https://github.com/google/highway. Highway is designed by Dr. Jan Wassenberg. We used a similar approach for HighwayHash, Randen, Pik, and JPEG XL, and now we have decided to split the SIMD code out into a separate library to make it more appealing for others to use. The project name Highway is a reference to multiple lanes of computation.
    0 replies | 126 view(s)
  • necros's Avatar
    18th September 2019, 11:51
    necros replied to a thread smpdf in Data Compression
    This doesn't answer the question of why every iteration gives a different size.
    11 replies | 3143 view(s)
  • Darek's Avatar
    18th September 2019, 10:45
    Darek replied to a thread paq8px in Data Compression
    Looks like the v182fix change is the biggest improvement for JPG files in many versions... @LucaBiondi - could you format your second table's numbers to 0 decimal places and use a thousands separator? It would be slightly easier to read.
    1717 replies | 479542 view(s)
  • Shelwien's Avatar
    18th September 2019, 00:52
    I think it's mainly targeted at NN/TPU (8-bit floats etc.).
    2 replies | 112 view(s)
  • LucaBiondi's Avatar
    17th September 2019, 23:51
    LucaBiondi replied to a thread paq8px in Data Compression
    Hi guys! We got big improvements from V182 to V182fix2: JPEG files gain 225 KB (WOW!), PDF files gain 11 KB, DOC files gain 400 bytes, ISO files gain 1 KB, other files gain 0 bytes. NEW OVERALL RECORD! NEW JPEG RECORD! NEW PDF RECORD! NEW DOC RECORD! NEW ISO RECORD! Well done Gotty & Kaitz, you are pros! I love it when a plan comes together! Luca - follow my cutting-edge blog @ https://sqlserverperformace.blogspot.com/
    1717 replies | 479542 view(s)
  • pacalovasjurijus's Avatar
    17th September 2019, 22:17
    Please help me with Python. I want to add processes here; I have an 8-core computer. I can't find what I need on this page: https://docs.python.org/3.1/library/multiprocessing.html What do I need to do with my code (a minimal sketch follows below)? Here is my code in Python:
    block=147456
    bnk=1
    virationc=16383
    kl=block
    bnk=pow(virationc,kl)
    Here is my whole code:
    207 replies | 75566 view(s)
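    For illustration only - a minimal sketch of parallelizing independent pow() calls with multiprocessing.Pool, under the assumption that each value can be computed independently; the worker count, chunk size, and the reduced demo range are illustrative, not tuned:
    from multiprocessing import Pool

    VIRATIONC = 16383

    def power(x: int) -> int:
        # One independent unit of work per input value.
        return (x + 1) ** VIRATIONC

    if __name__ == "__main__":
        # 8 workers to match an 8-core machine; chunksize batches the
        # inputs so workers spend their time computing, not communicating.
        with Pool(processes=8) as pool:
            results = pool.map(power, range(1024), chunksize=64)
        print(len(results), "values computed")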
  • fhanau's Avatar
    17th September 2019, 20:36
    It is not run after ECT's deflate. There is some usage of deflate in the brute force and incremental strategies that may have confused you.
    416 replies | 104766 view(s)
  • encode's Avatar
    17th September 2019, 20:19
    encode replied to a thread CHK Hash Tool in Data Compression
    + Fixed Status Bar flicker + Added CSV export (Excel format - semicolon separated values) + Added "Set Font..." option (Font Name + Font Size) :_coffee:
    213 replies | 79009 view(s)
  • Gotty's Avatar
    17th September 2019, 19:32
    Gotty replied to a thread paq8px in Data Compression
    Nice! Well done! The explanation for the v153 speedup is here (exemodel is not applied to text blocks anymore). The explanation for the v179fix1 and v179fix2 speedup is here (removed stuff) and here (divisions were eliminated).
    1717 replies | 479542 view(s)
  • Gotty's Avatar
    17th September 2019, 18:53
    Gotty replied to a thread paq8px in Data Compression
    Wow, I had no idea what that 1 and 2 could have meant. Now I know. A mystery is resolved in my head. Thanx.
    1717 replies | 479542 view(s)
  • kaitz's Avatar
    17th September 2019, 18:16
    kaitz replied to a thread paq8px in Data Compression
    px was worse because the old wordmodel (v181) had: if ((c>='a' && c<='z') || c==1 || c==2 || (c>=128 &&(c!=3))) { - it affected word0. These are the WRT FirstUpper and UpperWord codes. In pxd it's part of text0 but not word0, etc. Now it's almost the same compression as pxd.
    1717 replies | 479542 view(s)
  • Darek's Avatar
    17th September 2019, 17:53
    Darek replied to a thread paq8px in Data Compression
    Some of my scores for FlashMX.pdf (difference to v182fix1 with option -9et = -8 bytes):
    paq8px_v182fix2 -9    1,320,362
    paq8px_v182fix2 -9a   1,327,992
    paq8px_v182fix2 -9et  1,319,236
    paq8px_v182fix2 -9eta 1,326,909
    Looks like the "et" option gives some gain and the "a" option hurts compression this time.
    1717 replies | 479542 view(s)
  • schnaader's Avatar
    17th September 2019, 15:18
    Sounds interesting, nice to see that there is research in that direction. Though it will be even harder for a new format to replace IEEE floats than it is for new image formats like APNG or FLIF to replace the old ones. The old IEEE floats have many flaws/pitfalls (a small demonstration follows below), but they have been used and researched for a long time. After a quick glance, it looks like a useful experimental new format that seems to focus on precision instead of performance (although it tries to address this, too).
    2 replies | 112 view(s)
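    For illustration only - two classic IEEE-754 pitfalls of the kind alluded to above, shown in Python (whose float is an IEEE-754 double); these are standard textbook examples, not taken from the posit material:
    import math

    # Binary floats cannot represent most decimal fractions exactly:
    print(0.1 + 0.2 == 0.3)              # False (0.30000000000000004)

    # Adding numbers of very different magnitude silently loses precision:
    print(1e16 + 1.0 == 1e16)            # True - the 1.0 is absorbed entirely

    # Naive summation drifts; compensated summation does not:
    print(sum([0.1] * 10) == 1.0)        # False (0.9999999999999999)
    print(math.fsum([0.1] * 10) == 1.0)  # True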
  • maadjordan's Avatar
    17th September 2019, 12:50
    maadjordan replied to a thread smpdf in Data Compression
    I tried before to replicate your case on my files, but with no success. Currently the new compile gives the following:
    original 350,140
    1st  347,923
    2nd  347,921
    3rd  347,921
    4th  347,918
    5th  347,915
    6th  347,916
    7th  347,918
    8th  347,918
    9th  347,915
    10th 347,922
    but with PDFsizeopt I get 344,569, and CPDF after PSO gives 344,674. CPDF can be used for quick optimizations and to repair some damaged files, but PSO can gain far more reduction in most cases. The file still has metadata after optimization and no tool can remove it safely. Still, multiple-run support could be added to FileOptimizer to automate the file-size comparison process, or you can write a batch script to do so. (I think such a script has been made in this forum before.)
    11 replies | 3143 view(s)
  • necros's Avatar
    17th September 2019, 12:26
    necros replied to a thread smpdf in Data Compression
    Please support my bug report; the author closed it and ignores it, but it still exists in 2.3 (multiple iterations of optimizing the same file give random results): https://github.com/coherentgraphics/cpdf-binaries/issues/12
    11 replies | 3143 view(s)
  • encode's Avatar
    17th September 2019, 12:14
    encode replied to a thread CHK Hash Tool in Data Compression
    8)
    213 replies | 79009 view(s)
  • schnaader's Avatar
    17th September 2019, 11:45
    schnaader replied to a thread paq8px in Data Compression
    Results for the reymont and FlashMX.pdf PDF files on a Hetzner cloud server (2 vCPU, 4 GB RAM, 64-bit Ubuntu) to test the PDF part from v182. Columns: comp. size (bytes), time (s), memory (MiB).
    reymont (6,627,202 bytes), Polish text, uncompressed PDF:
    paq8px_v181fix1 -8    771,345  1795.53  2204
    paq8px_v181fix1 -9    770,107  1770.64  4028
    paq8px_v181fix1 -9a   765,737  1804.49  4028
    paq8px_v182fix1 -9    758,969  1802.48  4018
    paq8px_v182fix1 -9a   755,188  1836.77  4018
    FlashMX.pdf (4,526,946 bytes), English text, compressed PDF, many images:
    paq8px_v181fix1 -8    1,329,386  1898.80  2529
    paq8px_v181fix1 -8a   1,336,691  1920.17  2529
    paq8px_v181fix1 -9    out of memory
    paq8px_v182fix1 -8    1,321,460  1844.51  2523
    paq8px_v182fix1 -8a   1,328,893  1868.74  2523
    1717 replies | 479542 view(s)
  • Darek's Avatar
    17th September 2019, 11:03
    Darek replied to a thread paq8px in Data Compression
    And my test set scores for paq8px v182fix2 - for the F.JPG file this version got the best overall score! That means the paq8px variant now holds records for 9 files! For A10.JPG from MaximumCompression, paq8px v182fix2 also got the best overall score = 628'405 bytes! Additionally, here are all the enwik8 scores for v182fix1:
    16'832'420 - enwik8 -s7eta -Paq8px_v182fix1
    16'437'368 - enwik8.drt -s7eta -Paq8px_v182fix1
    16'411'564 - enwik8 -s9eta -Paq8px_v182fix1 - best score for the paq8px series
    16'086'836 - enwik8.drt -s9eta -Paq8px_v182fix1
    1717 replies | 479542 view(s)
  • CompressMaster's Avatar
    17th September 2019, 10:14
    As for randomness, random data is incompressible, that's true. But since randomness does not exist, all data is compressible - we only need to find the correct patterns. We still have 256 byte values even in a text document where each character is unique. It depends only on the selected interpretation of your data. Mr. Matt Mahoney tries to persuade you that you have a method that simply does not work at all. I DISAGREE WITH THAT! Well, randomness DOES NOT EXIST AT ALL; it's all about finding better and better ways to minimize randomness and improve compression. As for BARF, it's a fake piece of software that does not compress anything. It was written as a joke to debunk claims of random compression, because some people claimed that they were able to compress random data recursively. Again, it's not possible to compress random data, because truly random data does not exist at all - we still have some (and lots of, yet randomly occurring) patterns in it. Of course, infinite compression is impossible - some info MUST be stored. But recursive random data compression might be possible some day. It's all about finding better and better ways to do something. I believe... maybe I have overestimated expectations, but never say never...
    207 replies | 75566 view(s)
  • Shelwien's Avatar
    17th September 2019, 10:06
    https://en.wikipedia.org/wiki/Unum_(number_format)#Type_III_Unum_%E2%80%93_Posit https://gitlab.com/cerlane/SoftPosit https://github.com/milankl/Sonums.jl
    2 replies | 112 view(s)
  • Darek's Avatar
    17th September 2019, 08:54
    Darek replied to a thread paq8px in Data Compression
    enwik7 test scores with timings - in the attached Excel file. I've made it for all versions that I have on my laptop. And two charts - scores and timings by version, and scores over time (based on file dates). I think it could also be a good input for Jarek's database.
    1717 replies | 479542 view(s)
  • rarkyan's Avatar
    17th September 2019, 05:16
    I have a question; maybe someone can answer it. On this page: https://encode.su/threads/1176-loseless-data-compression-method-for-all-digital-data-type?p=31483&viewfull=1#post31483 Mr. Matt Mahoney gave me code to generate 2^75000000, which, by his statement, might run for a very long time. I don't know whether it is possible to get the result or not; I tested it a long, long time ago, but since it doesn't have a pause option, I can't continue it. But on this page: https://encode.su/threads/1176-loseless-data-compression-method-for-all-digital-data-type?p=60768&viewfull=1#post60768 schnaader states that: I once asked on some math forum about a computer's ability to generate 2^75000000. I forget the link, but as I remember, one of their representatives answered that it would run for an "infinite" time. ------------------------------------------- And the question is: 2^75000000 - is it possible? How long would the process take, if it is possible? Infinite or not (see the sketch below). ------------------------------------------- I'm continuing my experiment.
    207 replies | 75566 view(s)
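    For illustration only - a small Python sketch separating the two questions hidden in this post: computing the single integer 2^75000000 is cheap, while enumerating all 2^75000000 possible bit strings is what would take "infinite" time in practice:
    import time

    t0 = time.time()
    n = 1 << 75_000_000  # the same value as 2**75000000, built with one shift
    t1 = time.time()

    print("seconds to compute:", round(t1 - t0, 3))     # fractions of a second
    print("size in bits:", n.bit_length())              # 75,000,001 bits
    print("size in bytes:", (n.bit_length() + 7) // 8)  # about 9.4 MB

    # Enumerating 2**75000000 distinct outputs is another matter entirely:
    # even the *count of seconds* required at any conceivable rate is a number
    # with over 22 million decimal digits - "infinite" for all practical purposes.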
  • LucaBiondi's Avatar
    17th September 2019, 01:05
    LucaBiondi replied to a thread paq8px in Data Compression
    Thanks Gotty ...ready, just started to test!:_cool2: Luca
    1717 replies | 479542 view(s)
  • kaitz's Avatar
    16th September 2019, 23:17
    kaitz replied to a thread paq8px in Data Compression
    IMG080.jpg (967711 bytes):
    paq8px_182.fix1 -8    737230  Time 23.43 sec, used 2372 MB (2487680938 bytes) of memory
    paq8px_v182fix2 -8    736736  Time 21.69 sec, used 2372 MB (2487680938 bytes) of memory
    paq8pxd_v68_AVX2 -s8  736627  Time 19.29 sec, used 2209 MB (2316655105 bytes) of memory
    1717 replies | 479542 view(s)
  • Gotty's Avatar
    16th September 2019, 22:42
    Gotty replied to a thread paq8px in Data Compression
    Aham, that helps indeed (with larger jpegs), and it's logical, too! Going in the next release. Thanx so much! ;-) Luca will be happy. "Preview" attached. Luca, it's all yours.
    1717 replies | 479542 view(s)
  • kaitz's Avatar
    16th September 2019, 22:16
    kaitz replied to a thread paq8px in Data Compression
    In the SSE class, like this: case JPEG: { pr = pr0; break; } In pxd I don't have the final APM; it really hurts compression.
    1717 replies | 479542 view(s)
  • Gotty's Avatar
    16th September 2019, 22:00
    Gotty replied to a thread paq8px in Data Compression
    Line 11105 in v182fix1? pr = (pr+pr0+1)>>1; Hmmm... It's worse if I remove it (just tested with smaller and larger files as well). Is this the line you meant? Edit: @Luca: I tested it on your 3 large files :-) Of course. That is my large test set :-)
    1717 replies | 479542 view(s)
  • LucaBiondi's Avatar
    16th September 2019, 21:51
    LucaBiondi replied to a thread paq8px in Data Compression
    If you want to add an option, I will be happy to test it! Luca
    1717 replies | 479542 view(s)
  • kaitz's Avatar
    16th September 2019, 21:41
    kaitz replied to a thread paq8px in Data Compression
    More ... :D JPEG -> what if you removed the final APM in the SSE class for JPEG? Would compression be better?
    1717 replies | 479542 view(s)
  • Gotty's Avatar
    16th September 2019, 20:28
    Gotty replied to a thread paq8px in Data Compression
    Thanx! It has been on my to-do list for a long time - since Darek suggested it, and you gave these hints.
    1717 replies | 479542 view(s)
  • Gotty's Avatar
    16th September 2019, 20:22
    Gotty replied to a thread paq8px in Data Compression
    I noticed that when you posted the results last time, they matched my results exactly (I also run tests at level -8 ) - except for some files where I used "-a" (adaptive learning rate). We are on the same wavelength.
    1717 replies | 479542 view(s)
  • kaitz's Avatar
    16th September 2019, 20:13
    kaitz replied to a thread paq8px in Data Compression
    The nci improvement comes from the WRT filter, as with all other large files. The DEC Alpha improvement comes mostly from the byte-order swap and the call filter. The osdb improvement comes from the WordModel, I think; I can't remember what context/check it was. Not sure about the others. As for testing with option -t: I always test without it, at least when comparing with pxd versions.
    1717 replies | 479542 view(s)
  • Gotty's Avatar
    16th September 2019, 19:53
    Gotty replied to a thread paq8px in Data Compression
    Yes, you are absolutely right! Fix1 contains the extended text-pretraining. Any pre-training helps only during the first kilobytes (of text files, of course), when the NormalModel and WordModel of paq8px don't know anything about words and their morphology. As soon as the NormalModel and WordModel have learnt enough from the real data, the effect of pre-training fades away and the models take over. It means that the larger the file, the less text-pretraining helps proportionally. I don't know exactly when that happens, but your feeling of 100K-200K seems right. The truth is: text-pretraining is not advantageous. Look:
    paq8px_v182fix1 -9a : 16'456'404 (no pre-training)
    paq8px_v182fix1 -9at: 16'411'564 (with pre-training)
    The difference is 44'840 bytes. In order to decompress, you'll need paq8px_v182fix1.exe (its size must be added to the size of both results), and for the second case, with pre-training, you'll need the pre-training files as well. So how large are they? Let's see.
    paq8px_v182fix1 -9a: 109'701 (input file is a list file containing: english.dic, english.emb, english.exp)
    16'411'564+109'701 = 16'521'265
    We lost 64'861 bytes! The result without pre-training is better! I suggest that we don't use pre-training at all in any benchmarks - or when we do use pre-training, we must add the compressed size of the pre-training files to the final result. If we don't take these files into account, the result gives us a false sense that paq8px has beaten cmix. I suppose that if you used pre-training neither for paq8px nor for cmix, cmix would still beat paq8px. When you have some time, could you run a test only on the files in your testset where paq8px has beaten cmix? I wonder what the results would be.
    1717 replies | 479542 view(s)
  • maadjordan's Avatar
    16th September 2019, 17:27
    maadjordan replied to a thread smpdf in Data Compression
    It seems that Unicode file name support was missed during compilation. A temporary Windows compile is available through this link: http://www.coherentpdf.com/16thSeptember2019.zip
    11 replies | 3143 view(s)
  • boxerab's Avatar
    16th September 2019, 15:42
    Cool, thanks @pter: Let the HTJ2K vs XS battle begin!
    28 replies | 3471 view(s)
  • Darek's Avatar
    16th September 2019, 12:00
    Darek replied to a thread paq8px in Data Compression
    Scores of 4 corpora for paq8px v182fix1 - amazing improvements, especially for smaller files (the Calgary and Canterbury corpora) - almost all the best scores for paq8px, and the biggest gain between versions I've ever seen for paq8px (0.8-0.9%)! For obj1, progb, progc (Calgary), fields.c, grammar.lsp, sum, xargs.1 (Canterbury), and FlashMX.pdf (MaximumCompression) this version has the best overall scores and beats the latest cmix v18! One insight (maybe I'm wrong): most of the fix1 changes give 200-500 bytes of gain independent of file size (it's similar on R.DOC and G.EXE, or even smaller for K.WAD) - it looks like this improvement works only, or mostly, on the first 100-200 KB, or I'm wrong... One tip for further improvement on the Silesia corpus (I know it's tuned mostly for this corpus) -> there are some changes in the paq8pxd version by Kaitz: a) it adds a DEC Alpha parser/model, which gives about 500 KB of gain on the mozilla file; b) there is a model which gives about 60 KB of gain on the NCI file. The files ooffice, osdb and x-ray also compress better, but maybe that's specific to this version of paq. Additionally, here are the enwik8 and enwik9 scores for paq8px v182 (w/o fix yet):
    16'838'907 - enwik8 -s7eta -Paq8px_v182
    16'435'259 - enwik8.drt -s7eta -Paq8px_v182
    16'428'290 - enwik8 -s9eta -Paq8px_v182
    16'086'695 - enwik8.drt -s9eta -Paq8px_v182
    133'672'575 - enwik9 -s9eta -Paq8px_v182
    129'948'994 - enwik9.drt -s9eta -Paq8px_v182
    133'591'653 - enwik9_1423 -s9eta -Paq8px_v182 - best score for all paq8px versions (except paq8pxd)
    129'809'666 - enwik9_1423.drt -s9eta -Paq8px_v182 - best score for all paq8px versions (except paq8pxd)
    1717 replies | 479542 view(s)
  • Krishty's Avatar
    16th September 2019, 09:05
    I forgot … there is one thing you could help me with. I see that genetic filtering is implemented in lodepng’s encoder, which seems to run after Zopfli. If so, what are the reasons for running it *after* deflate optimization instead of before – wouldn’t that affect compression negatively, especially block splitting?
    416 replies | 104766 view(s)
  • pter's Avatar
    16th September 2019, 06:17
    pter replied to a thread JPEG 3000 Anyone ? in Data Compression
    The HTJ2K (ISO/IEC 15444-15 | ITU T.814) specification has been published and is available free of charge at: https://www.itu.int/rec/T-REC-T.814/en
    28 replies | 3471 view(s)
  • Krishty's Avatar
    16th September 2019, 00:57
    Yes, but I didn’t get to the actual tests yet because I wanted to isolate the deflate part first. I’ll let you know once I have the results! Sorry if I was unclear – with -60 and -61 I mean -10060/-20060/-30060/etc. It would be a pity to remove those as the fun starts at -xxx11 and the sweet spot for maximal compression seems to be at -xxx30 to -xxx60 :) Yes, that is absolutely right and it’s absolutely possible that my test set was just bad. However, looking at ECT’s PNG performance – where it is almost never beaten, Leanify being not even close – that could imply some sort of error (if the benchmarks turn out to be valid, again). Sorry, I should rather have expressed this as “TODO for me to check out” rather than “questions” … I’m trying not to bother you with guesses here, rather trying to find out what’s going on in my tests and documenting it for others in case it’s useful to them :)
    416 replies | 104766 view(s)
  • fhanau's Avatar
    16th September 2019, 00:17
    1. -3 does not perform substantially better than -4 in my tests. Have you considered using a different test set?
    2. -60 and -61 are not supported options. In a future version, ECT will reject those arguments so that questions like these don't come up anymore.
    3. That depends on the settings used for the tools and the files contained in the zip. ECT was mostly tuned on PNG and text files. On the example you provided, ECT does nineteen bytes worse than Leanify; I think occasionally doing that amount worse is acceptable.
    416 replies | 104766 view(s)
  • Krishty's Avatar
    16th September 2019, 00:02
    Great work, thanks a lot! Guess I'll do some tests anyway, just out of curiosity :) This helps me a lot to get a high-level overview, thanks. So - just to establish a checkpoint here - my open questions with ECT are:
    Why does -3 perform substantially better than -4 or any higher levels? What I know so far: it's a filter thing (it does not show in my deflate-only benchmarks); a workaround is using the --allfilters option; it could be rooted in OptiPNG.
    How can -61 sometimes take a thousand times longer than -60? (Not -62 vs -61, sorry for the error in my previous post!) This is definitely a deflate thing, and ECT-only; it could be related to long runs of identical pixels (it does not show with Lenna & Co., but does with comics and renderings).
    How can Leanify & advzip outperform ECT on ZIP files when my benchmarks show such a high superiority of ECT with PNGs?
    I'll try to find answers in subsequent benchmarks ...
    416 replies | 104766 view(s)
  • fhanau's Avatar
    15th September 2019, 23:10
    This is a simple heuristic that tunes the LZ cost model based on the results gained from running lazy LZ first when we only have time for one iteration. It is only enabled for PNG when using a low compression level, where it really helps in making ECT with one iteration competitive.
    416 replies | 104766 view(s)
  • fhanau's Avatar
    15th September 2019, 23:06
    I wrote most of ECT years ago, but it mostly comes down to performance improvements in pretty much every part of ECT's deflate, much better caching, a new match finder and better handling of the iterative cost model.
    416 replies | 104766 view(s)
  • MegaByte's Avatar
    15th September 2019, 21:20
    Some of ECT's filtering code was written by me -- including a genetic algorithm inspired by PNGwolf (as long as you activate it) but with better overall performance especially due to better seeding from the other filter methods. I don't expect PNGwolf to win in any cases currently. A number of the other filter algorithms were inspired by Cedric Louvier's post about TruePNG. Since that time, he wrote pingo, which does many of those filters much more efficiently than the brute-force methods included in the ECT code.
    416 replies | 104766 view(s)
  • Krishty's Avatar
    15th September 2019, 16:02
    Me as well. Unfortunately, no clue. ECT's source code is very different, and for example in squeeze.c I see vast floating-point math on symbol costs with comments like: Sorry, but this is the first time I've looked into compression code; even plain zlib is still overwhelming to me, and ECT looks like a master's or doctoral thesis to me. Maybe Felix could elaborate on that? (Also, I'm getting carried away from the original question - whether ECT's filtering is better than PNGwolf's :) )
    416 replies | 104766 view(s)
  • maadjordan's Avatar
    15th September 2019, 15:07
    maadjordan replied to a thread smpdf in Data Compression
    CPDF v2.3 has been released. https://coherentpdf.com/blog/?p=92 bin for Win,Mac & Linux : https://github.com/coherentgraphics/cpdf-binaries
    11 replies | 3143 view(s)
  • Jyrki Alakuijala's Avatar
    15th September 2019, 14:40
    Do we know why? Better block split heuristics? I'd love to see such improvements integrated into the original Zopfli, too.
    416 replies | 104766 view(s)
  • Krishty's Avatar
    15th September 2019, 13:11
    In order to make the Deflate benchmarks fairer, I downloaded all the compressors I know, compiled them on Windows for x64, and ran them. All sample images had row filtering entirely disabled (filter type zero) and were compressed with the Z_STORE setting to avoid bias in case tools want to re-use compression choices from the original input (a sketch of how such neutral PNGs can be prepared follows below). The tests typically take a day or two, so there are just a few data points so far: Lenna, Euclid, PNG transparency demonstration; all shown below. We're looking at very tight size differences here (often just a per mille of the image size).
    First, it can definitely be stated that ECT's Zopfli blows everything else away. For little compression, it's always several times faster than the Zopfli variants. For long run times, it consistently achieves higher compression ratios - so high that often the worst run of ECT compresses better than the best run of any Zopfli-related tool. But ECT has a weird anomaly above 62 iterations, where it sometimes becomes incredibly inefficient and suddenly takes ten or even a thousand(!) times longer to run than 61 or fewer iterations. This can be seen clearly on Euclid, but it is worse on transparency, where I had to omit all runs above 61 iterations because the run time jumped from twelve seconds to 24,000 (a two-thousand-fold increase)!
    Second, advpng's 7-Zip seems to be broken. You don't see it in the benchmarks because it compresses so badly that it didn't make it into any of the graphs. It's consistently some percent(!) worse than Zopfli & Co. and I just can't believe that. There has to be a bug in the code, but I couldn't investigate that yet.
    Now, Zopfli. Advpng made very minor adjustments to its Zopfli (or is it just an outdated version?) and, apart from the higher constant overhead, it's basically the same. Leanify's Zopfli has had some significant changes. It sometimes compresses better, sometimes worse - but on low compression levels, it often compresses better.
    The one problem I see with ECT is that its performance is almost unpredictable. Though better than Zopfli, the difference from -10032 to -10033 can be as large as the difference between Zopfli and ECT. This will be a problem with my upcoming filter benchmarks. I should check whether it smoothes out when I apply defluff/DeflOpt to the output. Input images are attached.
    416 replies | 104766 view(s)
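    For illustration only - a minimal Python sketch of how such a "neutral" PNG can be produced: every scanline gets filter type zero and the IDAT stream uses deflate level 0, i.e. stored blocks (the zlib analogue of Z_STORE). The helper names are made up; this is not the tooling actually used for the benchmarks:
    import struct, zlib

    def chunk(tag: bytes, payload: bytes) -> bytes:
        # PNG chunk: length, tag, payload, CRC over tag+payload.
        return (struct.pack(">I", len(payload)) + tag + payload
                + struct.pack(">I", zlib.crc32(tag + payload) & 0xFFFFFFFF))

    def write_neutral_png(path: str, width: int, height: int, rgb_rows) -> None:
        # Filter byte 0 in front of every raw scanline; deflate level 0
        # (stored blocks only) so no encoder choices leak into the input.
        raw = b"".join(b"\x00" + row for row in rgb_rows)
        ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)  # 8-bit RGB
        with open(path, "wb") as f:
            f.write(b"\x89PNG\r\n\x1a\n")
            f.write(chunk(b"IHDR", ihdr))
            f.write(chunk(b"IDAT", zlib.compress(raw, 0)))
            f.write(chunk(b"IEND", b""))

    # Example: a 2x2 mid-gray image (3 bytes per pixel, 6 bytes per row).
    write_neutral_png("neutral.png", 2, 2, [b"\x80" * 6, b"\x80" * 6])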
  • Krishty's Avatar
    14th September 2019, 22:01
    Fixed. A few years ago, I wrote a custom PNG variation with PPMd instead of Deflate, which worked pretty well with the "expand for 7z" function in my optimizer. However, I ditched it because non-standard formats are pretty much useless. Now I'm investigating ECT's efficiency. Nothing else comes to my mind right now. The Optimizer has a (non-critical) memory problem with GIF optimization: FlexiGIF outputs a *lot* of progress information, sometimes as much as a GiB over a few days of run time, and Optimizer keeps all of that (needlessly) in memory. I'll fix that in the next version.
    18 replies | 919 view(s)
  • Krishty's Avatar
    14th September 2019, 21:54
    Krishty replied to a thread FileOptimizer in Data Compression
    I noticed that the specific order of operations in Papa's often yields 18-byte savings over almost all other JPEG optimizers, but I haven't had time yet to investigate the cause. In case anyone bothers to find out, I attached Papa's JPG handling code. I'd be glad to learn what causes this gain because I'm sure it can be reached more efficiently!
    652 replies | 185787 view(s)
  • CompressMaster's Avatar
    14th September 2019, 20:52
    @Krishty, 1. By attaching, I mean your 1st post. Could you repair that? Thanks. 2. What other unpublished stuff do you have? (in the compression field)
    18 replies | 919 view(s)
  • maadjordan's Avatar
    14th September 2019, 17:34
    maadjordan replied to a thread 7-zip plugins in Data Compression
    New Plugin Added: ExFat7z
    1 replies | 422 view(s)
  • Krishty's Avatar
    14th September 2019, 16:46
    For pixels, yes. For metadata, no.
    18 replies | 919 view(s)