Activity Stream

  • Sportman's Avatar
    Today, 04:03
    Sportman replied to a thread 2019-nCoV in The Off-Topic Lounge
    Italy passed Singapore and Japan (patient zero still not found): South Korea 602, Italy 157, Japan 146, Singapore 89, Hong Kong 74. Real CN, IR and HK counts are probably 10-60 times higher than reported. Serious or critical: 20-22% (each hospital can handle only a limited number of cases). Deaths: 2-15% (flu 0.1-0.2%). A (finger) pulse oximeter costs around 15-20 euro and is useful for early detection.
    7 replies | 178 view(s)
  • Shelwien's Avatar
    Today, 03:05
    You're talking about this: https://en.wikipedia.org/wiki/Chosen-plaintext_attack With billions of samples it should be possible to reverse-engineer normal compression algorithms, but it's a known case in cryptography, so adding encryption after compression would still defeat this type of attack.
    6 replies | 160 view(s)
  • LawCounsels's Avatar
    Today, 02:52
    How many input and compressed file sets would you need to be able to reverse-engineer it? You may also want to design your special input file set!
    6 replies | 160 view(s)
  • Shelwien's Avatar
    Today, 02:32
    > it has 0-bits of advantages over ordinary bruteforce
    Well, the main idea of quantum computing is to use elementary particles and laws of physics for computing. But even normal electronics have some non-zero probability of errors, and require more and more compensation in circuit design (parity checks, ECC etc). And quantum logic has an error probability of tens of percent per operation, so it was necessary to invent a whole new type of algorithms as a workaround. Still, there's potential for higher density and parallelism than what we can get from further evolution of semiconductor electronics.
    6 replies | 160 view(s)
  • well's Avatar
    Today, 01:16
    quantum computing is a term for selling more oil and gas, as usual ;) an ordinary computer has one pseudo-random number generator (prng), a quantum computer has n true random number generators, where n is the number of qubits... with a gaussian distribution it has 0 bits of advantage over ordinary bruteforce... but in a human associative style of actions it may be useful, maybe not :p it is nice to play cossacks with 8,000 units acting independently, that's all quantum computing was invented for :D for your task, in the general case the set of functions mapping one file to a smaller file through instruction sets is countable but very big; that's why this task, like the game of chess, has been unsolved for a long time, maybe till the end of humanity, for ia-32 and amd64! i'm sorry, it can not be solved in one defined way, but many algorithms and many programs you can get with a hex-rays c-style decompiler :rolleyes:
    6 replies | 160 view(s)
  • Shelwien's Avatar
    Today, 00:32
    1) It may be possible to guess a known compression method from compressed data. There are even "recompression" programs (precomp etc) which make use of this to undo existing compression and apply something better. But even simple detection can be pretty hard (for example, lzham streams are encoded for a specific window-size value - basically it's necessary to try multiple codec versions and all possible window-size-log values to attempt decoding; see the sketch below).
    2) For some data samples (where the structure is easy to understand and predict from small substrings of data) and static bitcode algorithms it may be possible to reverse-engineer an unknown compression method - though even this requires luck and a lot of work.
    3) A universal automatic solution to this task would be equivalent to Kolmogorov compression, and also to breaking all cryptography etc. According to my estimations in some previous related threads here (based on Planck length and time), even impossibly perfect quantum computers would just add ~150 bits to the key size that can be realistically bruteforced. So I'd say that a _new_ adaptive AC-based compression method can't be reverse-engineered from the data.
    6 replies | 160 view(s)
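A minimal C++ sketch of the window-size probing described in the post above: loop over plausible window-size-log values and keep the ones that decode cleanly. The try_decode callback is a hypothetical stand-in for a real codec call (e.g. an lzham decompress attempt initialized with that dictionary size); it is not an actual library API.

```cpp
#include <cstdint>
#include <cstdio>
#include <functional>
#include <vector>

// Try every plausible window-size-log value and report which ones decode
// without errors. "try_decode" is whatever codec probe is being tested.
void detect_window_size(
    const std::vector<uint8_t>& stream,
    const std::function<bool(const std::vector<uint8_t>&, int)>& try_decode) {
  for (int wlog = 15; wlog <= 29; ++wlog)
    if (try_decode(stream, wlog))
      std::printf("stream decodes with window_size_log = %d\n", wlog);
}

int main() {
  std::vector<uint8_t> stream = {0x12, 0x34, 0x56};  // placeholder data
  // Dummy probe; a real one would call into the codec under test.
  detect_window_size(stream, [](const std::vector<uint8_t>&, int wlog) {
    return wlog == 26;  // pretend only one window size decodes cleanly
  });
}
```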
  • Sportman's Avatar
    Yesterday, 22:19
    Sportman replied to a thread 2019-nCoV in The Off-Topic Lounge
    No comment. https://www.youtube.com/watch?v=F_TPjbu4FAE
    7 replies | 178 view(s)
  • Bulat Ziganshin's Avatar
    Yesterday, 22:19
    it's so easy that no one bothers to implement it. you may be the first
    6 replies | 160 view(s)
  • Matt Mahoney's Avatar
    Yesterday, 21:29
    The new contest was entirely Marcus Hutter's idea. His money, his rules, although we reviewed them before the announcement. To me, the compressor is irrelevant to Kolmogorov complexity, but I think it does give an interesting twist to the contest. It will be interesting to see how you could take advantage of the shared code between the compressor and decompressor.
    39 replies | 1188 view(s)
  • well's Avatar
    Yesterday, 19:23
    well replied to a thread Hutter Prize update in Data Compression
    if i'm just playing with bits... i do not like c++, i rather prefer pure c or assembler... btw, cmix v18 can compress to ~115,900,000 bytes with an ssd swap file and 10 GiB of RAM within 100 hours; the task is to squeeze another ~400k out of it and to code a proper memory manager for cmix. i'm too lazy to take part, my price begins with... it depends on many factors, but maybe someone wants to win the hill ;)
    39 replies | 1188 view(s)
  • LawCounsels's Avatar
    Yesterday, 19:18
    How easy would this be with quantum computation? Can it be prevented?
    6 replies | 160 view(s)
  • Jarek's Avatar
    Yesterday, 18:59
    Jarek replied to a thread Hutter Prize update in Data Compression
    enwik10, being built from 10 languages, does not only concern knowledge extraction, but also the ability to find correspondence between languages, a kind of automatic translation. It is an interesting question if/which compressors can do it. It can be tested as in this "Hilberg conjecture" approach for finding long-range dependencies. For example on http://www.byronknoll.com/cmix.html we can read that enwik9, 8, 6 are compressed into correspondingly: 115714367, 14838332, 176377 bytes. We see it is sublinear behavior - larger files can be better compressed thanks to exploiting long-range dependencies (see the small calculation below). The question is whether we have something similar in enwik10: what is the size difference between compressing its 10 language files together and separately? An improvement for compressing them together can be interpreted as a kind of automatic translation ability (probably only for the best compressors). Ok, not exactly translation - they probably contain different texts, so it is rather the ability to exploit similarities between different languages (their models).
    39 replies | 1188 view(s)
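A quick check of the sublinearity claimed above, using only the cmix numbers quoted in the post (a minimal C++ sketch; the input sizes 10^6, 10^8, 10^9 bytes are the nominal enwik sizes):

```cpp
#include <cstdio>

int main() {
  // cmix results quoted above: (input size, compressed size) in bytes.
  struct { const char* name; double in, out; } r[] = {
      {"enwik6", 1e6, 176377},
      {"enwik8", 1e8, 14838332},
      {"enwik9", 1e9, 115714367},
  };
  // Bits per input byte drop as the file grows (sublinear compression):
  // ~1.41 for enwik6, ~1.19 for enwik8, ~0.93 for enwik9.
  for (auto& e : r)
    std::printf("%s: %.3f bits/byte\n", e.name, 8.0 * e.out / e.in);
}
```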
  • bwt's Avatar
    Yesterday, 18:20
    bwt replied to a thread Hutter Prize update in Data Compression
    Maybe it is more interesting to compress enwik10 in <48 hours like Sportman did before. I think it is more practical than compressing enwik9. It takes so long to compress a 1 GB file, up to 4-5 days... :eek:
    39 replies | 1188 view(s)
  • bwt's Avatar
    Yesterday, 18:12
    bwt replied to a thread paq8lab 1.0 archiver in Data Compression
    @darek have you saved the paq8pxd48_bwt1 source code? My HDD crashed a long time ago so I don't have a copy of the source.
    35 replies | 2464 view(s)
  • Jarek's Avatar
    Yesterday, 17:58
    Jarek replied to a thread Hutter Prize update in Data Compression
    It got some interest on https://old.reddit.com/r/MachineLearning/comments/f7z5sa/news_500000_prize_for_distilling_wikipedia_to_its/ and https://news.slashdot.org/story/20/02/22/0434243/hutter-prize-for-lossless-compression-of-human-knowledge-increased-to-500000 There are compressors based on the huge BERT, GPT etc. models, but they are excluded by the requirements. Such advanced ML methods are also punishingly slow on a CPU - the "no GPU usage" rule can discourage some interested people.
    39 replies | 1188 view(s)
  • Shelwien's Avatar
    Yesterday, 16:39
    This one actually compiles, but has the same problem with TextModel::p() not returning a value.
    35 replies | 2464 view(s)
  • Shelwien's Avatar
    Yesterday, 16:20
    I want to try writing an "Anti-LZ" coder - one that would encode some bitstrings that don't appear in the data, then the enumeration index of the data given those known exclusions (a sketch for finding such absent bitstrings is below). As to Nelson's file though, it's produced by a known process: https://www.rand.org/pubs/monograph_reports/MR1418/index2.html
    21 replies | 1006 view(s)
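A minimal C++ sketch of the first half of the "Anti-LZ" idea above: scan the data and list the k-bit strings that never occur. Encoding the enumeration index given those exclusions is the hard part and is not shown; the function and parameter names are made up for illustration.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Return all k-bit strings (small k, e.g. 16) that never occur in the data.
// An "anti-LZ" coder could transmit a few of these absent strings and then
// encode the data by its index among all sequences avoiding them.
std::vector<uint32_t> absent_kbit_strings(const std::vector<uint8_t>& data, int k) {
  std::vector<bool> seen(1u << k, false);
  uint32_t window = 0, mask = (1u << k) - 1;
  int bits = 0;
  for (uint8_t byte : data)
    for (int i = 7; i >= 0; --i) {
      window = ((window << 1) | ((byte >> i) & 1)) & mask;
      if (++bits >= k) seen[window] = true;  // mark every k-bit substring
    }
  std::vector<uint32_t> absent;
  for (uint32_t v = 0; v <= mask; ++v)
    if (!seen[v]) absent.push_back(v);
  return absent;
}

int main() {
  std::vector<uint8_t> data(1 << 20, 0);  // placeholder: 1 MiB of zero bytes
  std::printf("absent 16-bit strings: %zu\n", absent_kbit_strings(data, 16).size());
}
```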
  • bwt's Avatar
    Yesterday, 16:19
    bwt replied to a thread paq8lab 1.0 archiver in Data Compression
    how about this source, can it be compiled? thank you
    35 replies | 2464 view(s)
  • Shelwien's Avatar
    Yesterday, 16:05
    Maybe it was a different source version. This source is broken, it won't compile.
    35 replies | 2464 view(s)
  • Shelwien's Avatar
    Yesterday, 16:03
    The problem is that it's basically the same contest, just with better prizes. New participants would still have to compete with all the time that Alex spent on tweaking and testing all kinds of things. So we can't expect sudden 10% breakthroughs here, at least not while using the same paq framework (cmix is also paq-based). Even the 1% required for a prize would not be that easy to reach.
    > students who earn part-time 4 euro a hour
    It's very unlikely in this case. The potential winner has to be a C++ programmer (which is out of fashion) with a good knowledge of state-of-the-art compression methods, and with access to hardware for cmix testing.
    39 replies | 1188 view(s)
  • bwt's Avatar
    Yesterday, 15:55
    bwt replied to a thread paq8lab 1.0 archiver in Data Compression
    could you compile it using gcc 7.0 please? Because before, your compile with gcc 7.0 was successful. Thank you.
    35 replies | 2464 view(s)
  • Kaw's Avatar
    Yesterday, 15:40
    50 bits of redundancy is not a very big deal if you talk about randomness. For compression you have basically 2 options:
    1. On-the-fly statistics, trying to compress the data with partial knowledge (time series prediction).
    2. Use knowledge of the entire file, but then you have to include a description of this knowledge in the output file.
    If you look at time series prediction and randomness, it's really hard to find working patterns that will help you to compress the file. In a 100% random file 50% of the patterns will have a bias to 0 and 50% a bias to 1. Half of those patterns will end up having a bias the other way around at the end of the file (see the sketch below). There might be a pattern finding 50 bits of redundancy, but which is it? And will it hold up for the second half of the file? If you are able to make an algorithm that finds those 50 bits of redundancy in this file, it will be very strong AI. If you look at prior knowledge: how do you describe this redundancy or bias within 50 bits? That too would be a major improvement: being able to describe fairly complex patterns in fewer bits than the advantage you get from describing them.
    21 replies | 1006 view(s)
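A minimal C++ sketch of the point about biases that do not persist: measure the fraction of 1-bits separately in the first and second half of a file. For genuinely random data, a small bias seen in one half typically disappears (or flips) in the other. The command-line usage is illustrative.

```cpp
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

int main(int argc, char** argv) {
  if (argc < 2) { std::fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
  std::ifstream f(argv[1], std::ios::binary);
  std::vector<uint8_t> data((std::istreambuf_iterator<char>(f)),
                            std::istreambuf_iterator<char>());
  if (data.empty()) return 0;
  size_t half = data.size() / 2;
  uint64_t ones[2] = {0, 0};
  for (size_t i = 0; i < data.size(); ++i)
    for (int b = 0; b < 8; ++b)
      ones[i >= half] += (data[i] >> b) & 1;  // count 1-bits per half
  for (int h = 0; h < 2; ++h) {
    double nbits = 8.0 * (h ? data.size() - half : half);
    std::printf("half %d: %.5f fraction of 1-bits\n", h, ones[h] / nbits);
  }
}
```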
  • Sportman's Avatar
    Yesterday, 14:05
    Agree, but because you need to publish source code it can be strategic to submit the best possible version as the first version, and who knows how many % improvement that is. I'm thinking about outsiders, students who earn 4 euro an hour part-time in a restaurant or supermarket, or have no income at all except parents' money or a government/bank loan.
    39 replies | 1188 view(s)
  • Shelwien's Avatar
    Yesterday, 13:08
    I tried, but TextModel::p() seems to be missing the return statement, and there's a bracket mismatch somewhere. So it doesn't compile.
    35 replies | 2464 view(s)
  • Shelwien's Avatar
    Yesterday, 12:07
    You have to understand that both the 50k and 500k values are just for advertising. Realistically, new people would have to invest months of work even to make $5k.
    39 replies | 1188 view(s)
  • bwt's Avatar
    Yesterday, 11:56
    bwt replied to a thread paq8lab 1.0 archiver in Data Compression
    paq8pxdv48_bwt2 @shelwien could you compile this source code please? Or maybe could you remove the zlib and gif functions so I can compile it myself? Thank you.
    35 replies | 2464 view(s)
  • Sportman's Avatar
    Yesterday, 11:51
    Is (Marcus) Hutter's prize money sourced from grants or from his Google Senior Researcher DeepMind salary (maybe with employee stocks/options)?
    Grants:
    2019 - 2023 A$ 7'500'000,- ANU Grand Challenge. 10 CIs. Human Machine Intelligence (HMI).
    2019 - 2021 US$ 276'000,- Future of Life Project grant. Sole CI. The Control Problem for Universal AI: A Formal Investigation (CPUAI).
    2015 - 2019 A$ 421'500,- Australian Research Council DP grant. Sole CI. Unifying Foundations for Intelligent Agents (UFIA).
    39 replies | 1188 view(s)
  • bwt's Avatar
    Yesterday, 11:37
    bwt replied to a thread Hutter Prize update in Data Compression
    :eek:
    39 replies | 1188 view(s)
  • Sportman's Avatar
    Yesterday, 11:30
    This CPU and motherboard only support 64GB max and the CPU has 8 cores (16 threads), so in theory you can run 16 instances with 4GB memory each (with 64GB installed).
    For 64GB:
    Memory: Crucial Ballistix Sport LT 64GB (2666MHz), 285 euro
    Total: 660 euro
    For 128GB:
    Motherboard: ASRock X570M Pro4 (micro-ATX), 195 euro
    CPU: AMD Ryzen 9 3900X (12 cores, 3.8-4.6GHz), 470 euro
    Memory: HyperX Fury black 128GB (3200MHz), 675 euro
    Total: 1515 euro
    39 replies | 1188 view(s)
  • bwt's Avatar
    Yesterday, 05:56
    bwt replied to a thread Hutter Prize update in Data Compression
    looking at the LTCB site, cmix v17 has beaten phda9, but it uses 25 GB RAM and more time. How about reducing the variable settings? Would it still be better than phda9?
    39 replies | 1188 view(s)
  • well's Avatar
    Yesterday, 05:39
    ok, i believe you, since i've seen who you are irl, and it is best not to go beyond the pale in my search for truth... thanks evgeniy and sportman for answering!
    7 replies | 1374 view(s)
  • Shelwien's Avatar
    Yesterday, 04:15
    @CompressMaster: I don't provide that kind of service. You can see mcm results here: http://mattmahoney.net/dc/text.html#1449 and download enwik9.pmd here: http://mattmahoney.net/dc/textdata.html
    39 replies | 1188 view(s)
  • Shelwien's Avatar
    Yesterday, 03:57
    > What about building your own PC
    Keep in mind that a single run takes 5-7 days... I suppose it would be better to install 256GB of RAM (so 490+345=835?), then run 8 instances at once. Actually, for some tasks, like submodel memory usage tweaking or submodel contribution estimation, it would be better to run individual submodels, write their predictions to files, then do the final mix/SSE pass separately - it would require recomputing only the results of the modified submodel (see the sketch below). But unfortunately many other tasks - like testing new preprocessing ideas, or article reordering, or WRT dictionary optimization - would still require complete runs. Btw I actually did experiment with article reordering for enwik8... but I tried to speed-optimize it by compressing the reordered files with ppmd instead of paq. Unfortunately, after actual testing it turned out that an article order that improves ppmd compression hurts compression for paq.
    39 replies | 1188 view(s)
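A minimal C++ sketch of the two-stage idea above: assume each submodel's per-bit probabilities have already been dumped (here they are just in-memory vectors standing in for those files), and run only the final logistic-mixing pass over them, adapting the mixer weights online. This is a generic paq-style mixer, not code taken from paq or cmix.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

double stretch(double p) { return std::log(p / (1.0 - p)); }
double squash(double x)  { return 1.0 / (1.0 + std::exp(-x)); }

// preds[m][t] = submodel m's stored P(bit_t = 1); bits[t] = actual bit.
// Returns the ideal arithmetic-coded size in bits for the mixed prediction.
double mixed_code_length(const std::vector<std::vector<double>>& preds,
                         const std::vector<int>& bits, double lr = 0.02) {
  std::vector<double> w(preds.size(), 0.0);  // mixer weights, adapted online
  double total = 0.0;
  for (size_t t = 0; t < bits.size(); ++t) {
    double dot = 0.0;
    for (size_t m = 0; m < preds.size(); ++m) dot += w[m] * stretch(preds[m][t]);
    double p = squash(dot);                     // mixed P(bit = 1)
    total -= std::log2(bits[t] ? p : 1.0 - p);  // code length of this bit
    double err = bits[t] - p;                   // gradient step on the weights
    for (size_t m = 0; m < preds.size(); ++m) w[m] += lr * err * stretch(preds[m][t]);
  }
  return total;
}

int main() {
  std::vector<int> bits = {1, 0, 1, 1, 0, 1};
  std::vector<std::vector<double>> preds = {
      {0.7, 0.3, 0.6, 0.8, 0.4, 0.9},   // submodel A's stored predictions
      {0.5, 0.5, 0.5, 0.5, 0.5, 0.5}};  // submodel B (uninformative)
  std::printf("mixed code length: %.2f bits\n", mixed_code_length(preds, bits));
}
```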
  • Shelwien's Avatar
    Yesterday, 03:35
    @Self_Recursive_Data:
    > Matt says "The task is now to compress (instead of decompress) enwik9 to a self extracting archive"
    > Why the change to compress it and not decompress?
    I think it's an attempt to buy the phda sources from Alex (and then to prevent anybody else monopolizing the contest), since they require the compressor's sources now. It was a decompressor before, but with only a decompressor source it might still be hard to reproduce the result if the algorithm is asymmetric (i.e. the compressor does some data optimization, then encodes the results), which might be the case for phda. Also it looks to me that some zpaq ideas affected the new rules: I think Matt expects the compressor to generate a custom decompressor based on file analysis, which is probably why both compressor and decompressor size are counted as part of the result. Not sure why he decided not to support the more common case where enc/dec are symmetric, maybe it's an attempt to promote asymmetry?
    > Shouldn't the time measured measure the total time to compress+decompress?
    The time doesn't affect the results. Based on "Each program must run on 1 core in less than 100 hours" we can say that "compress+decompress" is allowed 200 hours.
    > Strong AI would cycle/recurse through finding and digesting new information,
    > then extracting new insights, repeat.
    > Kennon's algorithm for example compresses very slowly but extracts super fast.
    Sure, but 100 hours is more than 4 days. It should be enough time to do multiple passes or whatever, if necessary. Matt doesn't have a dedicated server farm for testing contest entries, so they can't really run for too long.
    39 replies | 1188 view(s)
  • Sportman's Avatar
    Yesterday, 02:25
    Sportman replied to a thread 2019-nCoV in The Off-Topic Lounge
    Looks like it has still not been found; hope for Italy that it was a tourist and not somebody who still walks around in Italy. Italy passed Hong Kong: South Korea 433, Japan 134, Singapore 89, Italy 79, Hong Kong 70. Improve your health as long as there is time.
    7 replies | 178 view(s)
  • Sportman's Avatar
    Yesterday, 01:26
    That's the chicken-and-egg problem. Best is to start working on something, and if you win, ask them to send the prize (in multiple smaller transactions, once a month) to somebody you trust (via bank wire transfer). This way you stay anonymous and the bank asks no questions.
    7 replies | 1374 view(s)
  • Sportman's Avatar
    Yesterday, 00:57
    What about building your own PC:
    Case: Cooler Master MasterBox Q300L (micro-ATX), 45 euro
    Power supply: Seasonic Focus Gold 450W (80 Plus Gold), 60 euro
    Motherboard: Gigabyte B450M DS3H (micro-ATX), 70 euro
    CPU: AMD Ryzen 7 1700 + cooler (8 cores, 3.0-3.7GHz), 130 euro
    Memory: Crucial Ballistix Sport LT 32GB (2666MHz), 115 euro
    Storage: Adata XPG SX6000 Lite 512GB (NVMe M.2), 70 euro
    Total: 490 euro
    Costs excl. energy, 1 year: 41 euro p/m
    Costs excl. energy, 2 years: 20 euro p/m
    Costs excl. energy, 3 years: 14 euro p/m
    Grabbed the parts, did not check if they all match.
    39 replies | 1188 view(s)
  • Self_Recursive_Data's Avatar
    Yesterday, 00:50
    Can someone answer my post #9 above ?
    39 replies | 1188 view(s)
  • CompressMaster's Avatar
    Yesterday, 00:44
    @Shelwien, could I ask you to upload enwik9 compressed, alongside the decompressor (mcm 0.84 with options -x3 and -x10), as you did with enwik10 on the MEGA cloud? As always, I'm stuck with low HDD space... Thank you very much.
    39 replies | 1188 view(s)
  • well's Avatar
    Yesterday, 00:33
    the problem is "they" do not want to talk about money but rather only about compression :confused: random data is not compressible :) i'm only interested now in the hutter prize, it seems to me a rather profitable task :D and yes, i have very little knowledge of compression techniques, but i do not give this field too much of my time, just the basics... my goal is money; if this goal breaks the spirit of the compression competition then everything will stay in-place and in-time, just let me know! :_shuffle2:
    7 replies | 1374 view(s)
  • Sportman's Avatar
    Yesterday, 00:22
    Ask if they can pay you out in a top-10 cryptocurrency if you win; they can exchange dollars to crypto and send it to you. I had a random data contest with a prize, but it was not claimed before the end of last year; you can always send what you have to me to verify your claim. If you managed to create a working random data compressor, you are indeed better off staying very low profile.
    7 replies | 1374 view(s)
  • Shelwien's Avatar
    Yesterday, 00:05
    I actually can run 3-4 instances of cmix at home. I just wanted to point out that running one costs considerable money - it'd be $5 per run on my home PC (which is probably much faster than a hetzner VM) just in electricity bills. It's just that some optimizations, like what Alex suggests (optimizing article reordering etc.), would really benefit from having access to 100s of free instances. Anyway, Intel devcloud would likely be a better choice atm, since they provide a few months of free trial... but I think they don't allow a single task to run for more than 24 hours, so it would be necessary to implement some kind of save/load feature. Btw https://encode.su/threads/3242-google-colab-compression-testing
    39 replies | 1188 view(s)
  • well's Avatar
    22nd February 2020, 23:59
    sorry, i can't go to australia personally, but i can call by skype and drop an e-mail, and i already did this - no answer by phone, no answer by e-mail... i can also receive money on my credit card, but the ukrainian bank system is too restricted... just treat me right, i'm living in a political and social ghetto and just want to be sure that i'll get the prize money and that there will be no media hype and so on connected with my person... i wanna stay in as cool a shadow as i can...
    7 replies | 1374 view(s)
  • schnaader's Avatar
    22nd February 2020, 23:41
    Hetzner has better pricing, 0.006 €/h or 36 €/month (CX51, 32 GB RAM, 240 GB disk). There's a "dedicated vCPU" alternative for 83 €/month, but I couldn't see a big performance difference last time I tried. Apart from that, Byron's offer for Google credit might still be available.
    39 replies | 1188 view(s)
  • CompressMaster's Avatar
    22nd February 2020, 22:52
    so, enwik8 is now without any prize, like enwik9 was before? Oh, it's *good* that I haven't developed my custom compressor targeted at enwik8 so far... but I'm working on it!
    39 replies | 1188 view(s)
  • Shelwien's Avatar
    22nd February 2020, 22:21
    > The number in the yellow table on the front page is 115'518'496.
    Award = 500000*(L-S)/L, where S = new record (size of comp9.exe+archive9.exe+opt, or the alternative above) and L = 116'673'681. The minimum award is 1% of 500000.
    500000*(1-115506944/116673681) = 5000.00081
    500000*(1-115518496/116673681) = 4950.49522
    I guess Matt has a buggy calculator.
    > With the size of compressor, according to the new rules:
    > length(comp9.exe/zip)+length(archive9.exe)
    Yes, but it's dumb and I hope it will be changed to only include the sfx size like before. Now it counts the decoder twice for no reason. But if it's not changed, I guess the actual target is 115,100,000.
    > If this is true, then very likely the rules will be adjusted to reflect this.
    I don't think it's a good idea to encourage this trick. Also I suspect that Matt simply won't be able to test entries with 100GB memory usage and 100+1000 hours runtime (that's 45 days).
    39 replies | 1188 view(s)
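A tiny C++ check of the award formula quoted above, reproducing both numbers from the post (a sketch; the constants are the L and S values given there):

```cpp
#include <cstdio>

int main() {
  // Award = 500000 * (L - S) / L; the minimum payout corresponds to a 1%
  // improvement over the baseline L.
  const double L = 116673681.0;
  const double S[] = {115506944.0, 115518496.0};
  for (double s : S)
    std::printf("S = %.0f -> award = %.2f\n", s, 500000.0 * (L - s) / L);
}
```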
  • Shelwien's Avatar
    22nd February 2020, 22:07
    > neither myself nor phda9 derivatives will compete this year
    Well, thanks for this statement - it might motivate some people to actually participate.
    > So as usual you seemingly believe that
    > (1) completely new approaches won't be competitive, and
    I have one such "new approach" myself - I still believe that contrepl can be used for this, it's just a matter of writing and testing a script for it (which is very time-consuming in the enwik9 case). But it's still not really profitable - even with the 10x prize I'd estimate it at $20 per hour. Also note that compressing enwik9 with cmix once on AWS (c5.4xlarge/linux/spot) would cost something like 0.374*150=$56... basically, to even make a profit at all one needs access to lots of free computing resources. Otherwise, NNCP and durilca are still blocked by the speed and memory limits, so it's very hard for something totally unrelated to paq to win. And it's not like enwik9 is a completely new target which nobody tried to compress before.
    > (2) no big improvement is possible within the established framework.
    It's quite possible - for example, automated parameter tuning should have a significant effect. Also parsing optimization, speculative probability estimation (e.g. we can compute a byte probability distribution with bitwise models, we just need a way to undo their updates), context generation and right contexts, etc. There are lots of ideas really. But paq requires a lot of rather dumb refactoring work to become compatible with new features. And the developer would need a lot of computing resources for testing it. And I don't think that it would be fair for e.g. me to tweak cmix and get a prize - while cmix actually has an author who invested a lot of time in its development.
    > I'm sure an accelerated version of cmix can win, because
    I also think that it can win, at least the first time, if you don't participate. It's more about who has the computing resources to tweak its memory usage.
    > But in the decompressor you need probabilities from all the models
    > for decompressing every bit of the original data.
    Well, your original idea doesn't seem to be compatible with the rules, but it may be possible to use the "manual swapping" idea which I thought your phda used. In any case, to even start anything, we have to first get compression ratio and speed within the limits.
    39 replies | 1188 view(s)
  • Alexander Rhatushnyak's Avatar
    22nd February 2020, 21:07
    The number in the yellow table on the front page is 115'518'496. With the size of the compressor, according to the new rules, it is length(comp9.exe/zip)+length(archive9.exe), and the size of the decompressor is included with length(archive9.exe). If this is true, then very likely the rules will be adjusted to reflect this.
    39 replies | 1188 view(s)
  • byronknoll's Avatar
    22nd February 2020, 20:12
    From my reading of the rules this would not be allowed. "Each program must run on 1 core in less than 100 hours on our test machines with 10GB RAM and 100GB free HD for temporary files. No GPU usage." "Each program" here I assume applies to both the compression and decompression program.
    39 replies | 1188 view(s)
  • Piotr Tarsa's Avatar
    22nd February 2020, 20:04
    What are the limits for decompressor then? Correctness has to be verified somehow.
    39 replies | 1188 view(s)
  • Alexander Rhatushnyak's Avatar
    22nd February 2020, 18:37
    So as usual you seemingly believe that
    (1) completely new approaches won't be competitive, and
    (2) no big improvement is possible within the established framework.
    I'm sure an accelerated version of cmix can win, because
    (1) Many people on this forum are able to accelerate cmix (and neither myself nor phda9 derivatives will compete this year).
    (2) Using the existing model mixing framework it's possible to create a compressor that will use 10 GB RAM and less than 100 hours to compress, but whose decompressor will need either 100+ GB RAM or 1000+ hours to decompress. Because in the compressor, each model (or set of models) can provide the probability of 1 for every bit of input independently and therefore may use all 10 GB of RAM, and the allowed HDD space encourages 4+ such models/sets: with a 32-bit floating point number per bit of input, that is less than 24 GB per model/set, assuming the transformed input is smaller than 0.75 GB. But in the decompressor you need probabilities from all the models for decompressing every bit of the original data. I guess this asymmetry was discussed on this forum a few years ago.
    As you see in the 1st post, "The task is now to compress (instead of decompress) enwik9 to a self extracting archive in 100 hours" with 10 GB RAM allowed.
    39 replies | 1188 view(s)
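A quick check of the memory figure in the post above (a sketch; 0.75 GB is the assumed transformed-input size, with one 32-bit float stored per input bit):

```cpp
#include <cstdio>

int main() {
  const double input_bytes = 0.75e9;   // assumed size of the transformed input
  const double bytes_per_prob = 4.0;   // one 32-bit float per input bit
  const double total = input_bytes * 8.0 * bytes_per_prob;
  std::printf("%.1f GB of stored probabilities per model/set\n", total / 1e9);
}
```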
  • Mauro Vezzosi's Avatar
    22nd February 2020, 17:31
    Yes, we are rapidly moving up "the ranking". We know who patient 1 is (he was never in China!) but not patient 0. Today is a bad day: 10 towns in quarantine, 2 outbreaks in 2 non-bordering areas, and in some zones we are quickly closing shops/companies/schools/football games/carnivals/...
    7 replies | 178 view(s)
  • bwt's Avatar
    22nd February 2020, 17:13
    bwt replied to a thread Hutter Prize update in Data Compression
    if I am not wrong, there are Marcio Pais and Mauro Vezzosi, besides Byron Knoll, who improved the cmix compression ratio.
    39 replies | 1188 view(s)
  • schnaader's Avatar
    22nd February 2020, 16:33
    Since it is open source now and the rules allow multiple authors (when they agree on how to divide the prize money), I'd suggest teamwork. This would prevent one author sending in an entry and another applying his prepared transformations on it, and it has a higher chance of getting a big improvement in ratio. Also, enwik9 is a target big enough for multiple people to test optimizations. And it matches the encode.(r/s)u/paq/cmix spirit :_superman2: Of course it's a bit risky, too, because someone might steal the ideas and make his own entry out of them, but it's unlikely that he'll win that way as the original source of the ideas is known.
    39 replies | 1188 view(s)
  • bwt's Avatar
    22nd February 2020, 16:26
    bwt replied to a thread Hutter Prize update in Data Compression
    it is nice:_superman2:
    39 replies | 1188 view(s)
  • Sportman's Avatar
    22nd February 2020, 15:56
    Sportman replied to a thread 2019-nCoV in The Off-Topic Lounge
    Italy can be added, same growth rate (daily doubling) as South Korea: 43 cases, 2 deaths at this moment. Don't touch your face, wash hands regularly with soap; because the virus is airborne in enclosed spaces, avoid spaces with groups of people. Get face masks, eye protection glasses, hand gloves, protective clothing, antibacterial cleaning stuff, drinks, food and everything else you need to stay home for months, and prepare for working from home.
    7 replies | 178 view(s)
  • Sportman's Avatar
    22nd February 2020, 13:54
    In theory everybody can win; there are almost 2 months to create something (the contest starts April 14, 2020). Spread the Hutter prize message/hostname so more people will try something; the prize money is serious enough to spend some time on it.
    39 replies | 1188 view(s)
  • Self_Recursive_Data's Avatar
    22nd February 2020, 13:44
    Matt says "The task is now to compress (instead of decompress) enwik9 to a self extracting archive"
    Why the change to compress it and not decompress? Shouldn't the measured time be the total time to compress+decompress? Strong AI would cycle/recurse through finding and digesting new information, then extracting new insights, and repeat. Kennon's algorithm, for example, compresses very slowly but extracts super fast.
    39 replies | 1188 view(s)
  • Darek's Avatar
    22nd February 2020, 12:55
    Darek replied to a thread Hutter Prize update in Data Compression
    paq8pxd reaches 125-126'000'000 bytes now with the -s15 option (32GB), but there is no big difference for the -s11 or -s12 options, which consume about 9-10GB. Also, the test time is about 18-32h depending on preprocessing. On the other side, if we are talking about preprocessing -> in my tests, splitting the enwik9 file into 4 parts and merging them in "1423" order gives about 100-200kB of gain. The resplitting batch is about 8KB.
    39 replies | 1188 view(s)
  • Darek's Avatar
    22nd February 2020, 12:46
    Darek replied to a thread Paq8pxd dict in Data Compression
    enwik9 score for the non-DRT version and -s15: 126'211'491 - enwik9_1423 -s15 by Paq8pxd_v74_AVX2 - a record for the paq8pxd series (OK, except bwt1, but it's close)! Time: 67'980,66s.
    707 replies | 282252 view(s)
  • dnd's Avatar
    22nd February 2020, 12:28
    "Accelerating Compression with FPGAs" In this article, we’ll discuss the Intel GZIP example design, implemented with oneAPI, and how it can help make FPGAs more accessible see other Data Compression Tweets
    81 replies | 12181 view(s)
  • Shelwien's Avatar
    22nd February 2020, 11:45
    So, the target compressed size of enwik9 to get the prize is ~115,300,000 (115,506,944 with the decoder). Size-wise, it may be barely reachable for cmix (the v18 result is 115,714,367), but even that v18 needs 3x the allowed memory and 2x the time (we can improve compression using public preprocessing scripts and save some memory and time by discarding non-text models, but a 3x difference in memory size is too big). So once again either Alex wins the first prize and then other people can start doing something (since open source is required), or the contest keeps being stuck as before.
    39 replies | 1188 view(s)
  • suryakandau@yahoo.co.id's Avatar
    22nd February 2020, 10:45
    Maybe it is more interesting to use enwik10 rather than enwik9... and a time limit of <=48 hours.
    39 replies | 1188 view(s)
  • hexagone's Avatar
    22nd February 2020, 03:35
    Release 1.7
    Changes:
    - Bug fixes & code cleanup
    - Slightly better compression throughout
    - Modified level 6 (faster for text files)
    - Better handling of small files
    Silesia C++ results: https://github.com/flanglet/kanzi-cpp
    Silesia Java results: https://github.com/flanglet/kanzi
    Silesia Go results: https://github.com/flanglet/kanzi-go
    enwik8:
    zip 3.0 -9                       4.70    0.59   36445403
    lzfse 1.0                        4.66    0.82   36157828
    kanzi -b 25m -l 1 -j 4           0.54    0.39   34532276
    lrzip 0.631 -b -p 12             3.91    1.29   29122579
    bzip2 1.0.6 -9                   5.84    2.52   29008758
    brotli 1.0.5 -9                 64.68    0.84   28879185
    kanzi -b 25m -l 2 -j 4           0.72    0.48   27962342
    lrzip 0.631 -p 12               11.36    0.96   27228013
    orz 1.5.0                        4.71    0.95   27148974
    zstd 1.4.5 -19                  39.71    0.18   26960372
    kanzi -b 12500k -l 3 -j 8        1.07    0.64   26741570
    brotli 1.0.5 -Z                430.95    0.73   25742001
    lzham 0x1010 -m4                20.35    0.50   25066677
    kanzi -b 12500k -l 4 -j 8        1.29    0.76   24989286
    lzma 5.2.2 -9                   54.75    1.00   24861357
    brotli 1.0.7 --large_window=30 435.10    0.95   24810180
    lzturbo 1.2 -49 -b100           82.19    1.24   24356021
    kanzi -b 25m -l 4 -j 8           1.59    0.94   24108751
    kanzi -b 100m -l 4 -j 8          5.52    1.89   22478636
    lrzip 0.631 -z -p 12            18.08   15.18   22197072
    kanzi -b 100m -l 5 -j 8          7.93    3.31   21275446
    bsc -b100                        5.51    1.33   20920018
    kanzi -b 100m -l 6 -j 8          9.98    5.78   20869366
    kanzi -b 100m -l 7              18.98   18.81   19570938
    kanzi -b 100m -l 8              27.18   27.73   19141858
    xwrt 3.2 -b100 -l14             51.39   53.37   18721755
    calgary:
    1.6 Level 2 - Total encoding time: 91 ms, Total output size: 1077662 bytes
    1.7 Level 2 - Total encoding time: 66 ms, Total output size: 1012784 bytes
    1.6 Level 7 - Total encoding time: 1991 ms, Total output size: 744184 bytes
    1.7 Level 7 - Total encoding time: 808 ms, Total output size: 739624 bytes
    1.6 Level 8 - Total encoding time: 3849 ms, Total output size: 735236 bytes
    1.7 Level 8 - Total encoding time: 1382 ms, Total output size: 733188 bytes
    19 replies | 6260 view(s)
  • Trench's Avatar
    22nd February 2020, 03:08
    Imagine if the program took an hour to scan for which pattern to use, then compressed it. Would that be worth it? It would make the file just as "random", but at least it tries to be more "comprehensible", i.e. compression friendly.
    Nice effort, but maybe it is better to learn some other ideas by experimenting at small scale than to go big to see results. People here have the skill to make such programs, but nobody has infinite perspectives, and that takes exploration. Kind of how people used to go on trips to experience new things, learn, bring them back home and make their home better. The ancient Greeks went around gathering information and putting it in books, so that others could add to those ideas, until someone with a certain perspective could add things up and make something new, and that new thing inspired others as they gathered other things, and better things happened. But in modern times people just visit, see what they can, and bring nothing back but selfies to say they were there, which is only important to them and their circle and to no one else. It would be like going to the supermarket just for the sake of looking, which has no purpose. Anyone take a selfie in a supermarket? LOL. Obviously I am giving a silly example, but you get the idea. Some people like wasting hours leveling up in a game only to erase all progress and forget the game in a few months, while others have fun figuring out math problems that help themselves and/or everyone for a lifetime. And some ideas go nowhere, which is an example for others not to try there... if done right, since some dead ends are false dead ends.
    Replacing random with other random does not help unless you make it more meaningful. Have the code find something that can become a pattern with the help of a pattern. I don't mean scan the entire file, but randomly pick some spots in the file to see if they have similarities, and apply a pattern or two, to make it smaller, not bigger. Again, that can be a dead end, but to say you have a loss makes it sound like you are doing something way off. It is best if the program scans for which pattern gives a better result rather than just applying one without scanning.
    Just like the random number presented before: instead of finding a solution, find how many ways it can be broken down. 128705610:
    LLHHLHHLL - high/low; the center 0 is off, so it needs another L
    OEEOEOEOE - odds/evens; the 2 after it would have needed an O
    GGGLLGGLL - greater/lower than the previous digit; the 1st one is off
    Based on that, 228815621 is a pattern to add, and 100110011 is based on the average of the 3 previous patterns. But patterns can be infinite, which will take time. If it is done on small chunks of the file, you would have 2 random numbers per block, which would make a bigger file, versus a single pattern throughout. In a way it is a math problem which would need a lot of processing. Or maybe a dead end. Again, I am not saying it will work in practice, but in theory. Just like the iRobot vacuum: eventually, given time, it will clean the home.
    Another odd thing is that the number can be broken down into some of the #2 doublings, as the chart below shows (the left column is what is used first). Strangely enough, if you put a 1 next to an active number and a 0 next to a non-active row, you get 111101010111110010001001010, which is exactly the same number, 128705610, in binary. If it were divisible by 3 it would be 101000111010100001011011100, which works 2/3 of the time, and converting that binary number it would change to 85803740. For other numbers like 6 it would work and be 1 digit less: 10100011101010000101101110, which converted to decimal is 42901870, again 1 digit less. But by 9 it is 1101101000110101110011111, which needs a -2 to make it work, since the last binary digit remains the same if the number is 7 off. I figure that means nothing, but I'm throwing it out there.
    Breaking down 128705610 by powers of 2:
    67108864 1 1
    33554432 2 1
    16777216 3 1
    8388608 4 1
    4194304 0
    2097152 5 1
    1048576 0
    524288 6 1
    262144 0
    131072 7 1
    65536 8 1
    32768 9 1
    16384 10 1
    8192 11 1
    4096 0
    2048 0
    1024 12 1
    512 0
    256 0
    128 0
    64 13 1
    32 0
    16 0
    8 14 1
    4 0
    2 15 1
    1 0
    21 replies | 1006 view(s)
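A tiny C++ sketch that reproduces the power-of-two chart from the post above: the 0/1 flags are simply the binary digits of 128705610.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  const uint32_t n = 128705610;      // the example number from the post
  for (int b = 26; b >= 0; --b)      // 2^26 = 67108864 is the largest power used
    std::printf("%9u %d\n", 1u << b, (n >> b) & 1);
}
```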
  • Alexander Rhatushnyak's Avatar
    22nd February 2020, 02:54
    Leonardo da Vinci's birthday! Good choice!
    The front page of LTCB still says "50,000 euros of funding". From the FAQ page:
    > Unfortunately the author of phd9 has not released the source code.
    Except the enwik-specific transforms, which reduced the effective size of the input (the DRT-transformed enwik8) by 8.36%. Besides, the dictionary from phda9 is now being used in cmix (and maybe in some of the latest paq8 derivatives?). I also shared ideas, but there were no contributions from others, and almost no comments. Seemingly no one believes that simple things like a well-thought-out reordering of wiki articles can improve the result by a percent or two (-:
    > the first winner, a Russian who always had to cycle 8km to a friend to test his code because he did not even have a suitable computer
    I guess this legend is based on these words: "I still don't have access to a PC with 1 Gb or more, have found only 512 Mb computer, but it's about 7 or 8 km away from my home (~20 minutes on the bicycle I use), thus I usually test on a 5 Mb stub of enwik8". Always had to cycle? That's too much of an exaggeration! I still cycle 3..5 hours per week when staying in Kitchener-Waterloo (9..10 months of the previous 12 months), attend a swimming pool when there's enough free time, and do lots of pull-ups outdoors when weather permits, simply because I enjoy these activities. By the way, even though I was born in Siberia, my ethnicity is almost 100% Ukrainian, so I guess it would be better to call me "a Ukrainian". My first flight to Canada in May 2006 was from Kiev, because I lived there then, because all of my relatives, except my parents, reside in Ukraine.
    39 replies | 1188 view(s)
  • suryakandau@yahoo.co.id's Avatar
    22nd February 2020, 02:21
    From the LTCB site, I guess the winner is phda9 again.
    39 replies | 1188 view(s)
  • Jarek's Avatar
    22nd February 2020, 01:02
    Ok, so let's look at the arithmetic of the rANS decoding step:
    D(x) = (f[s] * (x >> n) + (x & mask) - CDF[s], s)
    We have n bits of accuracy, e.g. for a 16-bit state and 10-bit accuracy we need a "10 bits times 6 bits -> 16 bits" multiplication. In a hardware implementation it would be exactly such a multiplication; we can reduce redundancy by using 1-bit renormalization (a count-leading-zeros gives the number of bits). A C++ sketch of this decode step is below.
    194 replies | 70670 view(s)
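A minimal C++ sketch of the decode step written above, for a toy 2-symbol alphabet with n = 10; renormalization is omitted, and the linear symbol search stands in for whatever table/alias lookup a real decoder would use.

```cpp
#include <cstdint>
#include <cstdio>

// One rANS decode step: slot = x & mask, find s with cdf[s] <= slot < cdf[s+1],
// then x' = freq[s] * (x >> n) + slot - cdf[s].
struct Sym { uint32_t freq, cdf; };

uint32_t rans_decode_step(uint32_t x, const Sym* tab, int nsyms, int n, int* sym) {
  uint32_t mask = (1u << n) - 1;
  uint32_t slot = x & mask;
  int s = 0;
  while (s + 1 < nsyms && tab[s + 1].cdf <= slot) ++s;  // linear search for clarity
  *sym = s;
  return tab[s].freq * (x >> n) + slot - tab[s].cdf;
}

int main() {
  Sym tab[] = {{700, 0}, {324, 700}};  // two symbols, frequencies sum to 1 << 10
  int s;
  uint32_t x = rans_decode_step(54321u, tab, 2, 10, &s);
  std::printf("decoded symbol %d, new state %u\n", s, x);  // symbol 0, state 37149
}
```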
  • Darek's Avatar
    22nd February 2020, 00:46
    Darek replied to a thread Hutter Prize update in Data Compression
    Yes, It's 10GB. "Restrictions: Must run in ≲100 hours using a single CPU core and <10GB RAM and <100GB HDD on our test machine."
    39 replies | 1188 view(s)
  • Sportman's Avatar
    22nd February 2020, 00:30
    Is the test machine link right? It shows only 3816 MB. I assume "10MB RAM" must be 10GB?
    39 replies | 1188 view(s)