Activity Stream

  • Shelwien's Avatar
    Yesterday, 22:40
    AI/ML/NN are totally useless for cryptography (too slow and inefficient).
    10 replies | 303 view(s)
  • LawCounsels's Avatar
    Yesterday, 22:11
With a few deliberately crafted specific inputs, each differing from the next as little as possible (e.g. by just 1 single bit, etc.), might an AI neural network extract some meaningful underlying salient patterns (even if in a manner not wholly intelligible to us humans) despite encryption? Isn't that what AI neural networks are most adept at?
    10 replies | 303 view(s)
  • Shelwien's Avatar
    Yesterday, 21:27
> With AI this can be much less than a billion?

No, unless we're talking about detecting a specific version of a known algorithm, or its parameters. We basically need to extract the algorithm description from sample comparison... at the very least we need to test every path through the algorithm (every unique combination of taken branches). I guess you can find some complexity estimations for https://en.wikipedia.org/wiki/Fuzzing since it does something similar.

> Again with quantum AI can reverse engineer despite encryption of compressed output?

As I said, quantum computing is not magical, it's just computing based on elementary particles and physical laws. But even if we could test 2^100 keys in parallel, it's still only equivalent to reducing the key size by 100 bits... which doesn't change anything if the full key has 1024 bits (see the arithmetic sketch below). Also, the actually existing "quantum" hardware is still slower than modern electronics.
    10 replies | 303 view(s)
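A quick sanity check of that key-size argument (a minimal sketch; the 2^100 parallelism is the hypothetical figure from the post, not a property of any real hardware):

```python
# Testing 2^p keys in parallel only subtracts p bits from the effective key size.
key_bits = 1024        # full key size from the post
parallel_bits = 100    # hypothetical: 2^100 keys tested at once
effective_bits = key_bits - parallel_bits   # 924
print(f"remaining search space: 2^{effective_bits} keys")
# Even at 10^18 guesses per second, the average-case time is astronomical:
avg_seconds = 2**effective_bits / 1e18 / 2
print(f"~{avg_seconds:.2e} seconds")
```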
  • LawCounsels's Avatar
    Yesterday, 20:46
>> You're talking about this: https://en.wikipedia.org/wiki/Chosen-plaintext_attack With billions of samples it should be possible to reverse-engineer normal compression algorithms

With AI, could this be done with much fewer than a billion?

>> but it's a known case in cryptography, so adding encryption after compression would still defeat this type of attack.

Again, with quantum computing, could AI reverse engineer it despite encryption of the compressed output?
    10 replies | 303 view(s)
  • pacalovasjurijus's Avatar
    Yesterday, 20:24
In version 1.0.0.1.7 of the White Hole software I used the paq algorithm for c1 and u1; for c and u I use my own algorithm, and for c2 and u3 I use my algorithm, Calculus. Now I am working on version 1.0.0.1.8, in which I use my new algorithm for c3 and u3: it sorts information of 1 and 0, like yes and no.
    29 replies | 1684 view(s)
  • User's Avatar
    Yesterday, 17:48
    User replied to a thread paq8px in Data Compression
Test file: PIF.mht
PAQ8pxd (all versions) - ok
PAQ8px up to v71 - ok
FP8 up to v4 - ok
PAQ8px after v72 (up to v132) - error (file not created)
FP8 after v5 (up to v6) - error (file not created)
i3-4130, 16 GB, Win 8.1
    1815 replies | 518379 view(s)
  • Marco_B's Avatar
    Yesterday, 17:06
I finished elaborating Lens with standard context machinery: now every node in the trie has a presence in all the recency lists associated with the characters of an order-1 set. In case of an actually encountered context the node is placed at the head of the list, for the other ones at the tail (a minimal sketch of this bookkeeping is below). This is necessary to keep the consistency of the specific system of the Lens series (see above) for transmitting a symbol. I kept a single AC statistic for the fatherhood instead of replicating it for every context, because on average the differences should be zero. I was forced to decouple the LRU for the full dictionary from the list for zero children, because it would be difficult to make a choice regarding the various ranks of the symbols with respect to their contexts. Unfortunately, though the compression ratio is better than that of Lens3, it remains far worse than simply emitting the appropriate bit index as in classical LZW. So I must admit this is a sterile path, and I stop it here.
    1 replies | 351 view(s)
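A minimal sketch of the move-to-head/move-to-tail bookkeeping described above, with hypothetical names (a real trie would keep per-node back-links instead of the O(n) list scans used here):

```python
from collections import deque

# One recency list per character of the order-1 set (hypothetical layout).
recency = {c: deque() for c in range(256)}

def register(node, contexts):
    """Give `node` a presence in every recency list of its order-1 set."""
    for c in contexts:
        recency[c].append(node)

def touch(node, contexts, seen):
    """Head of the list for the context actually encountered, tail for the others."""
    for c in contexts:
        recency[c].remove(node)          # O(n) scan; fine for a sketch
        if c == seen:
            recency[c].appendleft(node)  # most recent for the seen context
        else:
            recency[c].append(node)      # demoted in the other lists
```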
  • Sportman's Avatar
    Yesterday, 13:24
    Sportman replied to a thread 2019-nCoV in The Off-Topic Lounge
Today an Iranian news agency interview with a member of parliament confirmed that as of Feb 13, 2020 there were already 50 deaths in Iran. Patient zero was a trader who flew regularly to China for work. Last week's elections were not canceled; 42.6% of eligible voters voted.
    8 replies | 204 view(s)
  • Sportman's Avatar
    Yesterday, 04:03
    Sportman replied to a thread 2019-nCoV in The Off-Topic Lounge
Italy passed Singapore and Japan (patient zero still not found): South Korea 602, Italy 157, Japan 146, Singapore 89, Hong Kong 74. Real CN, IR and HK counts are probably 10-60 times higher than reported.
Serious or critical: 20-22% (each hospital can handle only a limited number of cases)
Deaths: 2-15% (flu 0.1-0.2%)
A (finger) pulse oximeter costs around 15-20 euro, for early detection.
    8 replies | 204 view(s)
  • Shelwien's Avatar
    Yesterday, 03:05
You're talking about this: https://en.wikipedia.org/wiki/Chosen-plaintext_attack With billions of samples it should be possible to reverse-engineer normal compression algorithms, but it's a known case in cryptography, so adding encryption after compression would still defeat this type of attack.
    10 replies | 303 view(s)
  • LawCounsels's Avatar
    Yesterday, 02:52
How many input/compressed file pairs would you need to be able to reverse engineer it? You may also want to design your own special set of input files!
    10 replies | 303 view(s)
  • Shelwien's Avatar
    Yesterday, 02:32
> it has 0-bits of advantages over ordinary bruteforce

Well, the main idea of quantum computing is to use elementary particles and laws of physics for computing. But even normal electronics have some non-zero probability of errors, and require more and more compensation in circuit design (parity checks, ECC, etc). And quantum logic has tens of percent of error probability per operation, so it was necessary to invent a whole new type of algorithms as a workaround. Still, there's a potential for higher density and parallelism than what we can have with further evolution of semiconductor electronics.
    10 replies | 303 view(s)
  • well's Avatar
    Yesterday, 01:16
Quantum computing is a term for selling more oil and gas, as usual ;) An ordinary computer has one pseudo-random number generator (PRNG); a quantum computer has n true random number generators, where n is the number of qubits... Under a Gaussian distribution it has 0 bits of advantage over ordinary brute force... but in a human associative style of actions it may be useful, maybe not :p It is nice to play Cossacks with 8,000 units acting independently; that's all quantum computing was invented for :D For your task, in the common case, the set of functions mapping one file to a smaller file through instruction sets is countable but very big; that's why this task, like the game of chess, will stay unsolved for a long time, maybe till the end of humanity, for IA-32 and AMD64! I'm sorry, it cannot be solved in one defined way, but you can get many algorithms and many programs with a Hex-Rays C-style decompiler :rolleyes:
    10 replies | 303 view(s)
  • Shelwien's Avatar
    Yesterday, 00:32
1) It may be possible to guess a known compression method from compressed data. There are even "recompression" programs (precomp etc.) which make use of this to undo existing compression and apply something better (a toy detection example is sketched below). But even simple detection can be pretty hard (for example, lzham streams are encoded for a specific window-size value - basically it's necessary to try multiple codec versions and all possible window-size-log values to attempt decoding).

2) For some data samples (where the structure is easy to understand and predict from small substrings of data) and static bitcode algorithms it may be possible to reverse-engineer an unknown compression method - though even this requires luck and a lot of work.

3) A universal automatic solution to this task would be equivalent to Kolmogorov compression, and also to breaking all cryptography, etc. According to my estimations in some previous related threads here (based on Planck length and time), even impossibly perfect quantum computers would just add ~150 bits to the key size that can be realistically bruteforced.

So I'd say that a _new_ adaptive AC-based compression method can't be reverse-engineered from the data.
    10 replies | 303 view(s)
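A toy illustration of point 1 - detecting a known method by trial decoding (zlib is used here purely as an example; real recompressors such as precomp must also recover the exact encoder settings to rebuild the original stream bit-for-bit):

```python
import zlib

def looks_like_zlib(data: bytes) -> bool:
    """Try to decode `data` as a zlib stream; success suggests (but does
    not prove) that it came from a zlib-compatible deflate encoder."""
    try:
        zlib.decompress(data)
        return True
    except zlib.error:
        return False

print(looks_like_zlib(zlib.compress(b"hello world")))  # True
print(looks_like_zlib(b"\x00\x01\x02\x03"))            # False
```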
  • Sportman's Avatar
    23rd February 2020, 22:19
    Sportman replied to a thread 2019-nCoV in The Off-Topic Lounge
    No comment. https://www.youtube.com/watch?v=F_TPjbu4FAE
    8 replies | 204 view(s)
  • Bulat Ziganshin's Avatar
    23rd February 2020, 22:19
It's so easy that no one bothers to implement it. You may be the first.
    10 replies | 303 view(s)
  • Matt Mahoney's Avatar
    23rd February 2020, 21:29
    The new contest was entirely Marcus Hutter's idea. His money, his rules, although we reviewed them before the announcement. To me, the compressor is irrelevant to Kolmogorov complexity, but I think it does give an interesting twist to the contest. It will be interesting to see how you could take advantage of the shared code between the compressor and decompressor.
    39 replies | 1278 view(s)
  • well's Avatar
    23rd February 2020, 19:23
    well replied to a thread Hutter Prize update in Data Compression
If I'm just playing with bits... I don't like C++, I rather prefer pure C or assembler... Btw, cmix v18 can compress to ~115,900,000 bytes with an SSD swap file and 10 GiB of RAM within 100 hours; the task is to squeeze out another ~400k and to code a proper memory manager for cmix. I'm too lazy to take part; my price begins with... it depends on many factors, but maybe someone wants to win the hill ;)
    39 replies | 1278 view(s)
  • LawCounsels's Avatar
    23rd February 2020, 19:18
How easy would this be with quantum computation? Can it be prevented?
    10 replies | 303 view(s)
  • Jarek's Avatar
    23rd February 2020, 18:59
    Jarek replied to a thread Hutter Prize update in Data Compression
enwik10, being built of 10 languages, concerns not only knowledge extraction, but also the ability to find correspondences between languages, a kind of automatic translation. It is an interesting question if/which compressors can do it. It can be tested as in the "Hilberg conjecture" work on finding long-range dependencies. For example, at http://www.byronknoll.com/cmix.html we can read that enwik9, 8 and 6 are compressed into 115714367, 14838332 and 176377 bytes respectively. We see it is sublinear behavior - larger files can be compressed better thanks to exploiting long-range dependencies (the bits-per-character check is sketched below). The question is whether we have something similar in enwik10: what is the size difference between compressing its 10 language files together and separately? An improvement for compressing them together can be interpreted as a kind of automatic translation ability (probably only for the best compressors). OK, not exactly translation - they probably contain different texts, so it is rather the ability to exploit similarities between different languages (their models).
    39 replies | 1278 view(s)
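The sublinearity is easy to check from the cmix numbers quoted above (enwik6/8/9 are 10^6, 10^8 and 10^9 bytes):

```python
# cmix compressed sizes quoted above (http://www.byronknoll.com/cmix.html)
sizes = {10**6: 176_377, 10**8: 14_838_332, 10**9: 115_714_367}
for n, c in sizes.items():
    print(f"enwik {n:>13,} -> {c:>12,} bytes ({8*c/n:.3f} bits/char)")
# 1.411, 1.187 and 0.926 bits/char: the larger the file, the better the
# ratio, i.e. long-range dependencies are being exploited.
```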
  • bwt's Avatar
    23rd February 2020, 18:20
    bwt replied to a thread Hutter Prize update in Data Compression
Maybe it is more interesting to compress enwik10 in <48 hours, like Sportman did before. I think it is more practical than compressing enwik9. It takes so long to compress a 1 GB file, up to 4-5 days... :eek:
    39 replies | 1278 view(s)
  • bwt's Avatar
    23rd February 2020, 18:12
    bwt replied to a thread paq8lab 1.0 archiver in Data Compression
@Darek, have you saved the paq8pxd48_bwt1 source code? My HDD crashed a long time ago, so I don't have a copy of the source.
    35 replies | 2501 view(s)
  • Jarek's Avatar
    23rd February 2020, 17:58
    Jarek replied to a thread Hutter Prize update in Data Compression
It got some interest at
https://old.reddit.com/r/MachineLearning/comments/f7z5sa/news_500000_prize_for_distilling_wikipedia_to_its/
https://news.slashdot.org/story/20/02/22/0434243/hutter-prize-for-lossless-compression-of-human-knowledge-increased-to-500000
There are compressors based on these huge BERT, GPT etc. models, but for them the requirements are disqualifying. Also, such advanced ML methods are deadly slow on a CPU - the "no GPU usage" rule can discourage some interested people.
    39 replies | 1278 view(s)
  • Shelwien's Avatar
    23rd February 2020, 16:39
    This one actually compiles, but has the same problem with TextModel::p() not returning a value.
    35 replies | 2501 view(s)
  • Shelwien's Avatar
    23rd February 2020, 16:20
I want to try writing an "Anti-LZ" coder - one that would encode some bitstrings that don't appear in the data, then the enumeration index of the data given the known exclusions (a sketch of the absent-string search is below). As to Nelson's file though, it's produced with a known process: https://www.rand.org/pubs/monograph_reports/MR1418/index2.html
    21 replies | 1023 view(s)
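A minimal sketch of the first half of the "Anti-LZ" idea - finding short bitstrings that never occur in the data (the enumeration-index step is the hard part and is not shown):

```python
def absent_bitstrings(data: bytes, k: int):
    """Return every k-bit string that never occurs (at any bit offset) in `data`."""
    bits = "".join(f"{b:08b}" for b in data)
    present = {bits[i:i + k] for i in range(len(bits) - k + 1)}
    return [s for s in (f"{v:0{k}b}" for v in range(2 ** k)) if s not in present]

# The shortest absent strings are what an "Anti-LZ" coder would declare as
# exclusions before coding the enumeration index of the data.
print(absent_bitstrings(b"ABABAB", 4))
```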
  • bwt's Avatar
    23rd February 2020, 16:19
    bwt replied to a thread paq8lab 1.0 archiver in Data Compression
How about this source, can it be compiled? Thank you.
    35 replies | 2501 view(s)
  • Shelwien's Avatar
    23rd February 2020, 16:05
    Maybe it was a different source version. This source is broken, it won't compile.
    35 replies | 2501 view(s)
  • Shelwien's Avatar
    23rd February 2020, 16:03
The problem is that it's basically the same contest, just with better prizes. New participants would still have to compete with all the time that Alex spent on tweaking and testing all kinds of things. So we can't expect sudden 10% breakthroughs here, at least not while using the same paq framework (cmix is also paq-based). Even the 1% required for a prize would not be that easy to reach.

> students who earn part-time 4 euro an hour

It's very unlikely in this case. The potential winner has to be a C++ programmer (which is out of fashion) with a good knowledge of state-of-the-art compression methods, and with access to hardware for cmix testing.
    39 replies | 1278 view(s)
  • bwt's Avatar
    23rd February 2020, 15:55
    bwt replied to a thread paq8lab 1.0 archiver in Data Compression
Could you compile it using GCC 7.0, please? Because before, your compile with GCC 7.0 was successful. Thank you.
    35 replies | 2501 view(s)
  • Kaw's Avatar
    23rd February 2020, 15:40
50 bits of redundancy is not a very big deal if you talk about randomness. For compression you have basically 2 options:
1. On-the-fly statistics, trying to compress the data with partial knowledge (time series prediction).
2. Use knowledge of the entire file, but then you have to include a description of this knowledge in the output file.
If you look at time series prediction and randomness, it's really hard to find working patterns that will help you compress the file. In a 100% random file, 50% of the patterns will have a bias to 0 and 50% a bias to 1. Half of those patterns will end up having a bias the other way around by the end of the file. There might be a pattern finding 50 bits of redundancy, but which is it? And will it hold up for the second half of the file? If you are able to make an algorithm that finds those 50 bits of redundancy in this file, it will be very strong AI. If you look at prior knowledge: how do you describe this redundancy or bias within 50 bits? (Some entropy arithmetic on this is sketched below.) Also, it would be a major improvement if you were able to describe fairly complex patterns within an amount of bits that is below the advantage you get from describing them.
    21 replies | 1023 view(s)
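To put those numbers in perspective, here is the standard order-0 entropy arithmetic: a global bias p over n random-looking bits saves at most n*(1 - H(p)) bits. (A sketch; the 415,241-byte file size is an assumption about the file under discussion, adjust as needed.)

```python
from math import log2

def H(p):                     # binary entropy, in bits per bit
    return -p * log2(p) - (1 - p) * log2(1 - p)

n = 8 * 415_241               # assumed file size in bits
for p in (0.501, 0.502, 0.505):
    print(f"bias {p}: order-0 coding saves ~{n * (1 - H(p)):.1f} bits")
# ~9.6, ~38.3 and ~239.6 bits: even a 0.2% global bias over ~3.3M bits buys
# only ~38 bits, and describing *which* pattern is biased costs bits too.
```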
  • Sportman's Avatar
    23rd February 2020, 14:05
Agree, but because you need to publish source code it can be strategic to submit the best possible version as the first version, and who knows how many % improvement that is. I think about outsiders: students who earn 4 euro an hour part-time in a restaurant or supermarket, or have no income at all except parents' money or a government/bank loan.
    39 replies | 1278 view(s)
  • Shelwien's Avatar
    23rd February 2020, 13:08
    I tried, but TextModel::p() seems to be missing the return statement, and there's a bracket mismatch somewhere. So it doesn't compile.
    35 replies | 2501 view(s)
  • Shelwien's Avatar
    23rd February 2020, 12:07
    You have to understand that both 50k and 500k values are just for advertisement. Realistically new people would have to invest months of work even to make $5k.
    39 replies | 1278 view(s)
  • bwt's Avatar
    23rd February 2020, 11:56
    bwt replied to a thread paq8lab 1.0 archiver in Data Compression
paq8pxdv48_bwt2 - @Shelwien, could you compile this source code please? Or maybe could you remove the zlib and GIF functions so I can compile it myself? Thank you.
    35 replies | 2501 view(s)
  • Sportman's Avatar
    23rd February 2020, 11:51
Is the source of (Marcus) Hutter's prize money grants, or his salary as a Google Senior Researcher at DeepMind (maybe with employee stock/options)?
Grants:
2019 - 2023 A$ 7'500'000,- ANU Grand Challenge. 10 CIs. Human Machine Intelligence (HMI).
2019 - 2021 US$ 276'000,- Future of Life Project grant. Sole CI. The Control Problem for Universal AI: A Formal Investigation (CPUAI).
2015 - 2019 A$ 421'500,- Australian Research Council DP grant. Sole CI. Unifying Foundations for Intelligent Agents (UFIA).
    39 replies | 1278 view(s)
  • bwt's Avatar
    23rd February 2020, 11:37
    bwt replied to a thread Hutter Prize update in Data Compression
:eek:
    39 replies | 1278 view(s)
  • Sportman's Avatar
    23rd February 2020, 11:30
This CPU and motherboard support only 64GB max, and the CPU has 8 cores (16 threads), so in theory you can run 16 instances with 4GB memory each (with 64GB installed).
For 64GB:
Memory: Crucial Ballistix Sport LT 64GB (2666MHz), 285 euro.
Total: 660 euro.
For 128GB:
Motherboard: ASRock X570M Pro4 (micro-ATX), 195 euro.
CPU: AMD Ryzen 9 3900X (12 cores, 3.8-4.6GHz), 470 euro.
Memory: HyperX Fury black 128GB (3200MHz), 675 euro.
Total: 1515 euro.
    39 replies | 1278 view(s)
  • bwt's Avatar
    23rd February 2020, 05:56
    bwt replied to a thread Hutter Prize update in Data Compression
Looking at the LTCB site, cmix v17 beats phda9, but it uses 25 GB of RAM and more time. How about reducing the memory settings? Would it still be better than phda9?
    39 replies | 1278 view(s)
  • well's Avatar
    23rd February 2020, 05:39
OK, I believe you, since I have a vision of who you are IRL, and it is best not to go beyond the pale in my search for truth... Thanks Evgeniy and Sportman for answering!
    7 replies | 1382 view(s)
  • Shelwien's Avatar
    23rd February 2020, 04:15
    @CompressMaster: I don't provide that kind of service. You can see mcm results here: http://mattmahoney.net/dc/text.html#1449 and download enwik9.pmd here: http://mattmahoney.net/dc/textdata.html
    39 replies | 1278 view(s)
  • Shelwien's Avatar
    23rd February 2020, 03:57
> What about building your own PC

Keep in mind that a single run takes 5-7 days... I suppose it would be better to install 256GB of RAM (so 490+345=835?), then run 8 instances at once.

Actually, for some tasks - like submodel memory-usage tweaking, or submodel contribution estimation - it would be better to run the individual submodels, write their predictions to files, then do the final mix/SSE pass separately: it would require recomputing only the results of the modified submodel (a sketch of this workflow is below). But unfortunately many other tasks - like testing new preprocessing ideas, or article reordering, or WRT dictionary optimization - would still require complete runs.

Btw, I actually did experiment with article reordering for enwik8... but I tried to speed-optimize it by compressing the reordered files with ppmd instead of paq. Unfortunately, actual testing showed that an article order which improves ppmd compression hurts it for paq.
    39 replies | 1278 view(s)
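A sketch of that cache-and-remix workflow, with hypothetical file names (the logistic mixing here is paq-style in spirit, not cmix's actual mixer):

```python
import numpy as np

# Hypothetical setup: during one full (slow) run, each submodel's per-bit
# predictions p(bit=1) were dumped to a .npy file. The mix stage can then be
# re-tuned cheaply, re-running only a submodel that actually changed.
def mix(prob_files, weights):
    p = np.stack([np.load(f) for f in prob_files])     # shape (models, bits)
    st = np.log(p / (1 - p))                           # logistic stretch
    return 1 / (1 + np.exp(-(weights[:, None] * st).sum(axis=0)))

def codelength_bits(p, bits):
    """Cost of arithmetic-coding the 0/1 array `bits` under predictions `p`."""
    return float(-(bits * np.log2(p) + (1 - bits) * np.log2(1 - p)).sum())
```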
  • Shelwien's Avatar
    23rd February 2020, 03:35
@Self_Recursive_Data:

> Matt says "The task is now to compress (instead of decompress) enwik9 to a self extracting archive"
> Why the change to compress it and not decompress?

I think it's an attempt to buy the phda sources from Alex (and then to prevent anybody else monopolizing the contest), since they require the compressor's sources now. It was a decompressor before, but with only a decompressor source it might still be hard to reproduce the result if the algorithm is asymmetric (i.e. the compressor does some data optimization, then encodes the results), which might be the case for phda. Also it looks to me that some zpaq ideas affected the new rules: I think Matt expects the compressor to generate a custom decompressor based on file analysis; that's probably why both compressor and decompressor size are counted as part of the result. Not sure why he decided not to support the more common case where enc/dec are symmetric; maybe it's an attempt to promote asymmetry?

> Shouldn't the time measured measure the total time to compress+decompress?

The time doesn't affect the results. Based on "Each program must run on 1 core in less than 100 hours" we can say that "compress+decompress" is allowed 200 hours.

> Strong AI would cycle/recurse through finding and digesting new information,
> then extracting new insights, repeat.
> Kennon's algorithm for example compresses very slowly but extracts super fast.

Sure, but 100 hours is more than 4 days. It should be enough time to do multiple passes or whatever, if necessary. Matt doesn't have a dedicated server farm for testing contest entries, so they can't really run for too long.
    39 replies | 1278 view(s)
  • Sportman's Avatar
    23rd February 2020, 02:25
    Sportman replied to a thread 2019-nCoV in The Off-Topic Lounge
Looks like it's still not found; hope for Italy that it was a tourist and not somebody who still walks around in Italy. Italy passed Hong Kong: South Korea 433, Japan 134, Singapore 89, Italy 79, Hong Kong 70. Improve your health while there is still time.
    8 replies | 204 view(s)
  • Sportman's Avatar
    23rd February 2020, 01:26
That's the chicken-and-egg problem. Best is to start working on something, and if you win, ask them to send the prize (in multiple smaller transactions, once a month) to somebody you trust (via bank wire transfer); this way you stay anonymous and the bank asks no questions.
    7 replies | 1382 view(s)
  • Sportman's Avatar
    23rd February 2020, 00:57
What about building your own PC:
Case: Cooler Master MasterBox Q300L (micro-ATX), 45 euro
Power supply: Seasonic Focus Gold 450W (80 Plus Gold), 60 euro
Motherboard: Gigabyte B450M DS3H (micro-ATX), 70 euro
CPU: AMD Ryzen 7 1700 + cooler (8 cores, 3.0-3.7GHz), 130 euro
Memory: Crucial Ballistix Sport LT 32GB (2666MHz), 115 euro
Storage: Adata XPG SX6000 Lite 512GB (NVMe M.2), 70 euro
Total: 490 euro
Costs excl. energy, 1 year: 41 euro p/m
Costs excl. energy, 2 years: 20 euro p/m
Costs excl. energy, 3 years: 14 euro p/m
Grabbed the parts, did not check if they all match.
    39 replies | 1278 view(s)
  • Self_Recursive_Data's Avatar
    23rd February 2020, 00:50
Can someone answer my post #9 above?
    39 replies | 1278 view(s)
  • CompressMaster's Avatar
    23rd February 2020, 00:44
@Shelwien, could I ask you to upload enwik9 compressed alongside the decompressor (mcm 0.84 with options -x3 and -x10) to the MEGA cloud, as you did with enwik10? As always, I'm stuck with low HDD space... Thank you very much.
    39 replies | 1278 view(s)
  • well's Avatar
    23rd February 2020, 00:33
    the problem is "they" do not want to talk about money but rather about only compression:confused: random data is not compressible:) i'm only interesting now in hutter prize, it seems to me rather profitable task:D and yes, i have very few knowledges about technics of compression, but i do not give this field too much my time, just basics... my goal is money, if this goal breaks spirit of compression competition then everything will stay in-place and in-time, just let me know!:_shuffle2:
    7 replies | 1382 view(s)
  • Sportman's Avatar
    23rd February 2020, 00:22
Ask if they can pay you out in a top-10 cryptocurrency if you win; they can exchange dollars to crypto and send it to you. I had a random data contest with a prize, but it was not claimed before it ended last year; still, you can always send what you have to me to verify your claim. If you have managed to create a working random data compressor, you had indeed better keep a very low profile.
    7 replies | 1382 view(s)
  • Shelwien's Avatar
    23rd February 2020, 00:05
I actually can run 3-4 instances of cmix at home. I just wanted to point out that running one costs considerable money: it'd be $5 per run on my home PC (which is probably much faster than a hetzner VM) just in electricity bills. It's just that some optimizations like what Alex suggests (optimizing article reordering etc.) would really benefit from having access to hundreds of free instances. Anyway, Intel devcloud would likely be a better choice atm, since they provide a few months of free trial... but I think they don't allow a single task to run for more than 24 hours, so it would be necessary to implement some kind of save/load feature. Btw: https://encode.su/threads/3242-google-colab-compression-testing
    39 replies | 1278 view(s)
  • well's Avatar
    22nd February 2020, 23:59
Sorry, I can't go to Australia in person, but I can call by Skype and drop an e-mail, and I already did this: no answer by phone, no answer by e-mail... I can also receive money on my credit card, but the Ukrainian bank system is too restricted... Just treat me right; I'm living in a political and social ghetto and just want to be sure that I'll get the prize money and that there will be no media hype and so on connected with my person... I want to stay in as cool a shadow as I can...
    7 replies | 1382 view(s)
  • schnaader's Avatar
    22nd February 2020, 23:41
Hetzner has better pricing, 0.06 €/h or 36 €/month (CX51: 32 GB RAM, 240 GB disk). There's a "dedicated vCPU" alternative for 83 €/month, but I couldn't see a big performance difference last time I tried. Apart from that, Byron's offer for Google credit might still be available.
    39 replies | 1278 view(s)
  • CompressMaster's Avatar
    22nd February 2020, 22:52
So, enwik8 is now without any prize, like enwik9 was before? Oh, it's *good* that I haven't developed my custom compressor targeted at enwik8 so far... but I'm working on it!
    39 replies | 1278 view(s)
  • Shelwien's Avatar
    22nd February 2020, 22:21
> The number in the yellow table on the front page is 115'518'496.

Award = 500000*(L-S)/L, where S = new record (size of comp9.exe+archive9.exe+opt, or the alternative above) and L = 116'673'681. The minimum award is 1% of 500000.
500000*(1-115506944/116673681) = 5000.00081
500000*(1-115518496/116673681) = 4950.49522
I guess Matt has a buggy calculator. (The formula is reproduced in the sketch below.)

> With the size of compressor, according to the new rules:
> length(comp9.exe/zip)+length(archive9.exe)

Yes, but it's dumb and I hope it will be changed to only include the sfx size, like before. Now it counts the decoder twice for no reason. But if it's not changed, I guess the actual target is 115,100,000.

> If this is true, then very likely the rules will be adjusted to reflect this.

I don't think it's a good idea to encourage this trick. Also I suspect that Matt simply won't be able to test entries with 100GB memory usage and 100+1000 hours runtime (that's 45 days).
    39 replies | 1278 view(s)
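For reference, the award arithmetic from the post, reproduced directly (constants are the ones quoted above):

```python
L = 116_673_681                        # previous record size, from the rules
def award(S, prize=500_000):           # Award = prize * (L - S) / L
    return prize * (L - S) / L

print(award(115_506_944))              # 5000.00... -> exactly the 1% minimum
print(award(115_518_496))              # 4950.49... -> just short of the minimum
```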
  • Shelwien's Avatar
    22nd February 2020, 22:07
> neither myself nor phda9 derivatives will compete this year

Well, thanks for this statement - it might motivate some people to actually participate.

> So as usual you seemingly believe that
> (1) completely new approaches won't be competitive, and

I have one such "new approach" myself - I still believe that contrepl can be used for this; it's just a matter of writing and testing a script for it (which is very time-consuming in the enwik9 case). But it's still not really profitable - even with the 10x prize I'd estimate it at $20 per hour. Also note that compressing enwik9 with cmix once on AWS (c5.4xlarge/linux/spot) would cost something like 0.374*150=$56... basically, to make any profit at all one needs access to lots of free computing resources. Otherwise, NNCP and durilca are still blocked by the speed and memory limits, so it's very hard for something totally unrelated to paq to win. And it's not like enwik9 is a completely new target which nobody tried to compress before.

> (2) no big improvement is possible within the established framework.

It's quite possible - for example, automated parameter tuning should have a significant effect. Also parsing optimization, speculative probability estimation (e.g. we can compute a byte probability distribution with bitwise models, we just need a way to undo their updates), context generation and right contexts, etc. There are lots of ideas really. But paq requires a lot of rather dumb refactoring work to become compatible with new features. And the developer would need a lot of computing resources for testing it. And I don't think it would be fair for, e.g., me to tweak cmix and get a prize - while cmix actually has an author who invested a lot of time in its development.

> I'm sure an accelerated version of cmix can win, because

I also think that it can win, at least the first time, if you don't participate. It's more about who has the computing resources to tweak its memory usage.

> But in the decompressor you need probabilities from all the models
> for decompressing every bit of the original data.

Well, your original idea doesn't seem to be compatible with the rules, but it may be possible to use the "manual swapping" idea which I thought your phda used. In any case, to even start anything, we have to first get compression ratio and speed within the limits.
    39 replies | 1278 view(s)
  • Alexander Rhatushnyak's Avatar
    22nd February 2020, 21:07
The number in the yellow table on the front page is 115'518'496. With the size of the compressor included, according to the new rules it's length(comp9.exe/zip)+length(archive9.exe), and the size of the decompressor is included in length(archive9.exe). If this is true, then very likely the rules will be adjusted to reflect this.
    39 replies | 1278 view(s)
  • byronknoll's Avatar
    22nd February 2020, 20:12
    From my reading of the rules this would not be allowed. "Each program must run on 1 core in less than 100 hours on our test machines with 10GB RAM and 100GB free HD for temporary files. No GPU usage." "Each program" here I assume applies to both the compression and decompression program.
    39 replies | 1278 view(s)
  • Piotr Tarsa's Avatar
    22nd February 2020, 20:04
    What are the limits for decompressor then? Correctness has to be verified somehow.
    39 replies | 1278 view(s)
  • Alexander Rhatushnyak's Avatar
    22nd February 2020, 18:37
So as usual you seemingly believe that
(1) completely new approaches won't be competitive, and
(2) no big improvement is possible within the established framework.

I'm sure an accelerated version of cmix can win, because
(1) Many people on this forum are able to accelerate cmix (and neither myself nor phda9 derivatives will compete this year).
(2) Using the existing model-mixing framework it's possible to create a compressor that will use 10 GB RAM and less than 100 hours to compress, but whose decompressor will need either 100+ GB RAM or 1000+ hours to decompress. Because in the compressor, each model (or set of models) can provide its probability of a 1 for every bit of the input independently, and therefore may use all 10 GB of RAM; and the allowed HDD space encourages 4+ such models/sets: with a 32-bit floating-point number per bit of input, that's less than 24 GB per model/set, assuming the transformed input is smaller than 0.75 GB (the arithmetic is checked below). But in the decompressor you need the probabilities from all the models for decompressing every bit of the original data. I guess this asymmetry was discussed on this forum a few years ago.

As you see in the 1st post, "The task is now to compress (instead of decompress) enwik9 to a self extracting archive in 100 hours" with 10 GB RAM allowed.
    39 replies | 1278 view(s)
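A quick check of the per-model storage arithmetic in point (2) above, using the post's own assumptions:

```python
input_bytes = 0.75e9                   # transformed input, per the post
predictions = input_bytes * 8          # one probability per input bit
gb_per_model = predictions * 4 / 1e9   # 32-bit float = 4 bytes each
print(gb_per_model)                    # 24.0 GB per model/set, as stated
```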
  • Mauro Vezzosi's Avatar
    22nd February 2020, 17:31
Yes, we are rapidly moving up "the ranking". We know who patient 1 is (he was never in China!) but not patient 0. Today is a bad day: 10 towns in quarantine, 2 outbreaks in 2 non-bordering areas, and in some zones we are rapidly closing shops/companies/schools/football games/carnivals/...
    8 replies | 204 view(s)
  • bwt's Avatar
    22nd February 2020, 17:13
    bwt replied to a thread Hutter Prize update in Data Compression
If I am not wrong, there are Marcio Pais and Mauro Vezzosi, besides Byron Knoll, who improved the cmix compression ratio.
    39 replies | 1278 view(s)
  • schnaader's Avatar
    22nd February 2020, 16:33
Since it is open source now and the rules allow multiple authors (when they agree on how to divide the prize money), I'd suggest teamwork. This would prevent one author sending in an entry and another applying his prepared transformations on it, and it has a higher chance of getting some big improvement in ratio. Also, enwik9 is a target big enough for multiple people to test optimizations. And it matches the encode.(r/s)u/paq/cmix spirit :_superman2: Of course it's a bit risky, too, because someone might steal the ideas and make his own entry out of them, but it's unlikely that he'll win that way, as the original source of the ideas is known.
    39 replies | 1278 view(s)
  • bwt's Avatar
    22nd February 2020, 16:26
    bwt replied to a thread Hutter Prize update in Data Compression
It is nice :_superman2:
    39 replies | 1278 view(s)
  • Sportman's Avatar
    22nd February 2020, 15:56
    Sportman replied to a thread 2019-nCoV in The Off-Topic Lounge
Italy can be added, same growth rate (daily doubling) as South Korea: 43 cases, 2 deaths at this moment. Don't touch your face, wash your hands regularly with soap; because the virus is airborne in enclosed spaces, avoid spaces with groups of people. Get face masks, eye-protection glasses, gloves, protective clothing, antibacterial cleaning stuff, drinks, food and everything else you need to stay home for months; prepare for working from home.
    8 replies | 204 view(s)
  • Sportman's Avatar
    22nd February 2020, 13:54
In theory everybody can win; there are almost 2 months to create something (the contest starts April 14, 2020). Spread the Hutter Prize message/hostname so more people go and try something; the prize money is serious enough to spend some time on it.
    39 replies | 1278 view(s)
  • Self_Recursive_Data's Avatar
    22nd February 2020, 13:44
Matt says "The task is now to compress (instead of decompress) enwik9 to a self extracting archive". Why the change to compress it and not decompress? Shouldn't the measured time be the total time to compress+decompress? Strong AI would cycle/recurse through finding and digesting new information, then extracting new insights, and repeat. Kennon's algorithm, for example, compresses very slowly but extracts super fast.
    39 replies | 1278 view(s)
  • Darek's Avatar
    22nd February 2020, 12:55
    Darek replied to a thread Hutter Prize update in Data Compression
paq8pxd now reaches 125-126'000'000 bytes with the -s15 option (32GB), but there is no big difference for the -s11 or -s12 options, which consume about 9-10GB. Also, test time is about 18-32h depending on preprocessing. On the other hand, if we are talking about preprocessing: in my tests, splitting the enwik9 file into 4 parts and merging them in "1423" order gives about 100-200kB of gain. The resplitting batch file is about 8KB.
    39 replies | 1278 view(s)
  • Darek's Avatar
    22nd February 2020, 12:46
    Darek replied to a thread Paq8pxd dict in Data Compression
enwik9 score for the non-DRT version and -s15: 126'211'491 - enwik9_1423 -s15 by Paq8pxd_v74_AVX2 - a record for the paq8pxd series (OK, except bwt1, but it's close)! Time: 67'980.66s.
    707 replies | 282278 view(s)
  • dnd's Avatar
    22nd February 2020, 12:28
    "Accelerating Compression with FPGAs" In this article, we’ll discuss the Intel GZIP example design, implemented with oneAPI, and how it can help make FPGAs more accessible see other Data Compression Tweets
    81 replies | 12193 view(s)
  • Shelwien's Avatar
    22nd February 2020, 11:45
So, the target compressed size of enwik9 to get the prize is ~115,300,000 (115,506,944 with decoder). Size-wise it may be barely reachable for cmix (the v18 result is 115,714,367), but even then v18 needs 3x the allowed memory and 2x the time (we can improve compression using public preprocessing scripts, and save some memory and time by discarding non-text models, but a 3x difference in memory size is too big). So once again, either Alex wins the first prize and then other people can start doing something (since open source is required), or the contest keeps being stuck as before.
    39 replies | 1278 view(s)