> From the GDCC notices thread: Ms1 said "do your own work". It means we cannot combine open source.
> It means we must code from scratch, and that is valued at only 3000 euros.
No, Maxim (Ms1) simply doesn't want to keep testing MCM with minor changes,
so now you need 3% better compression than the original to get it accepted.
It's really a troublesome point, because we can't really ask for fully independent implementations -
we'd just not get any participants then. But accepting minor changes of open-source software
also doesn't seem fair, so some compromise has to be made.
> The problem is that there are too many categories.
> If there were half as many, the prize would be 6000.
I'd rather remove the 2nd place prize... speed categories are reasonable,
and we had even more data types during discussion.
In any case, this can't be changed at this point already.
> A unique LZ77+Huffman can claim a prize only in Rapid Compression of English text...
There are also the blockwise test and the "mixed" one (roughly, executables).
The image test can also be made compatible by adding some filter, like http://www.radgametools.com/oodlelimage.htm
@suryakandau@yahoo.co.id
There is no value in a contest of tweaking MCM for specific data. Why don't you try to write something yourself?
@MS1
The API for Test 4 has no parameter to initialize the (de)compressor for a specific category: int32_t CDECL encodeInit(void **cmprContext);
Also, optionally, a function like int32_t CDECL dispose(void *cmprContext); would allow the (de)compressor to free memory allocated during init.
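For reference, a minimal sketch of how a submission might implement that pair under the quoted signatures. Everything inside the context struct is hypothetical, the CDECL macro is defined empty here only so the sketch compiles standalone, and dispose() is the proposed addition, not part of the current API.

```c
#include <stdint.h>
#include <stdlib.h>

/* CDECL is the calling-convention macro from the contest API header;
   defined empty here so the sketch compiles standalone. */
#ifndef CDECL
#define CDECL
#endif

/* Hypothetical per-instance state; the real layout is up to the submitter. */
typedef struct {
    uint8_t *window;     /* history buffer allocated once at init */
    size_t   windowSize;
} CmprContext;

/* Matches the Test 4 API: no category parameter, so any
   category-specific settings must be hard-coded at build time. */
int32_t CDECL encodeInit(void **cmprContext)
{
    CmprContext *ctx = malloc(sizeof *ctx);
    if (!ctx) return -1;
    ctx->windowSize = 1u << 20;          /* hard-coded for this build */
    ctx->window = malloc(ctx->windowSize);
    if (!ctx->window) { free(ctx); return -1; }
    *cmprContext = ctx;
    return 0;
}

/* The proposed (not currently in the API) teardown hook. */
int32_t CDECL dispose(void *cmprContext)
{
    CmprContext *ctx = cmprContext;
    if (ctx) { free(ctx->window); free(ctx); }
    return 0;
}
```

Without such a hook, any memory grabbed in encodeInit() can only be reclaimed when the process exits.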
I am only doing research on hash function implementations in data compression, and I think the bwt comment is right. The prize is too small compared to a Google programmer's salary. By the way, could you please make some tweaks to MCM and let me learn something from your tweaking? Thank you.
A filter followed by LZ is a poor choice for compressing photographic images (it may gain some speed, but the compression ratio suffers).
There are simply no exact repeated patterns in photographic images; you are trying to compress noisy data.
Also, QLIC2 is very good. Either Alexander uses very good filters before compressing with FSE, or some kind of context modeling is used. By the way, is it considered a submission?
This newsgroup is dedicated to image compression:
http://linkedin.com/groups/Image-Compression-3363256
I agree with you and bwt on that. It's great to have a prize, but if the people who win from it pay their programmers (or programmer) more than they paid you to create it in the first place... that's not going to encourage people, and it might be considered taking advantage of them instead.
I'm not sure if it's too late to change it, but perhaps in addition to the prize, the creator of the winning submission could get a guaranteed percentage of the revenue generated from the use of their algorithm, even if only up to a certain amount.
Even with the bar raised from 0.25% to a 3% improvement, there would still be stronger encouragement for competition and developers if there were residual incentives of any amount.
The large corporation(s) who stand to benefit from the results will no doubt profit from them handsomely, and it shouldn't be unreasonable for the contest winner (or, if there are several winners, residuals over x submissions per category) to enjoy some of those same benefits.
That might be the only way to really guarantee it will be worthwhile to all competitors, and it doesn't require the companies hosting this contest to put up any additional money out of pocket beyond the prize amount, since the additional percentages would come out of what they later save on storage and data transmission.
I'd like to underline that there is no such requirement. We do not require a completely new code made from scratch. It makes little sense and, actually, we can't check it since we don't ask for source codes. Of course, we can see similar patterns in results, thus detecting a minor modification of a known compressor is not hard.
Straightforward parameter tuning to overfit to particular test data is the thing we don't want to encourage. Usually it's not a problem to fine-tune a complex enough codec with a lot of internal parameters to particular data and get improvements within 1% or so. For the mixed data test I'd expect even more.
There is nothing interesting in such clones, no new knowledge in them. The submitter just behaves as a parasite. For more complex systems, like H.264 video codecs and beyond, finding "good" sets of parameters may be a really hard-to-solve task of great practical importance, but this is not our situation (yet).
On the other hand, if, say, somebody takes a 3rd party software (complying with all the applicable licenses and keeping copyrights), adds/changes preprocessors, adapts the core code to the output of such preprocessors, takes into account certain specifics of the test data, then this may give an essentially different compressor in its behavior and with much better results. This is a valid approach. In my own opinion, this is a quite good approach as well.
Indeed.
Yes. The parameters have to be hard-coded. So if you want to participate in all 3 speed categories, there should be 3 different libraries (submissions).
And in my defense, I'd like to mention that I suggested passing parameters explicitly in the first draft of the API, but, in the end, my opinion was in the minority. Not that it matters much.
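The hard-coding described above can be illustrated with a hedged sketch, assuming nothing about the real submissions: one source file built three times, with an invented GDCC_* macro per speed category, yields the three separate libraries mentioned.

```c
/* Hypothetical compile-time category selection: the GDCC_* macro
   names and depth values are illustrative, not from the contest. */
#if defined(GDCC_RAPID)
  #define SEARCH_DEPTH 8        /* shallow match search: fastest build */
#elif defined(GDCC_BALANCED)
  #define SEARCH_DEPTH 64
#else
  #define SEARCH_DEPTH 1024     /* default: high-ratio build */
#endif

/* The codec queries its build-time setting at run time;
   no per-category parameter ever crosses the API boundary. */
static int match_search_depth(void) { return SEARCH_DEPTH; }
```

Building with, e.g., -DGDCC_RAPID would then produce the rapid-category library from the same source.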
I think there is something wrong from the beginning if this ends in money discussions. We are not solving a practical task here. This is not work made for hire. This is not an R&D project. Surely, for a real project the budget has to be bigger by at least an order of magnitude. Nobody is hiring here. There is no immediate use and no revenue. The test data was selected and generated under my direct supervision; it is synthetic data with certain properties, not a training set for a real practical task. No crowdsourcing here.
The event is mainly for authors of existing compressors. However, it's possible to use a 3rd party source code insofar as its license(s) allows it and you produce something essentially different from the original compressor(s).
This is an opportunity to compete. Besides the regular text category, there are the very interesting block test and mixed data test. Quite practical. The image test data also gives a new look.
The prize for the winners is just a sweet addition.
If an author of an existing compressor does not want to invest several days in checking the data and adapting the program to suit the rules, then there is little we can really do about it.
Sorry, what exactly do you mean?
why "if" ? Yes, there are open source image codecs, for example JPEG-XL.
As for the 3%: first, all questions about the 3% threshold should go to Maxim, and second, you should aim for 30% rather than 3%!
In short, Machine Learning with Machine Inventing should help you get closer to 30% than to 3%.
I'm sure it's possible to approach and surpass the compression quality achieved by cmix and nncp
at much higher speeds, possibly 100+ times higher, at least in the GDCC scenario,
because the vast majority of ML+MI can be done outside of the compression/decompression processes,
and therefore have no impact on ctime and dtime.
Maybe this is an interesting and useful link:
https://bair.berkeley.edu/blog/2019/09/19/bit-swap/
1. Suppose one coder writes a compressor in C(++), whose compiler uses msvcrtXXX.dll and maybe other runtime libraries, while another coder writes a compressor in Delphi, whose compiler inserts the runtime code into the exe file. I think the size of all runtime DLLs must be added to both the compressor and decompressor sizes, i.e. to both c_size and c_decompressor_size in the formula in the Ranking section of globalcompetition.compression.ru.
2. I think that an original compressor should be rated higher than a compressor using other people's sources. To prove originality, the programmer must show his source code. It is forbidden to wear other people's awards.
3. "Random people" may try to hide a test file or a part of it in a deep directory and show a fantastic compression ratio. Is it being monitored by the arbitrators?
Ms1: What guarantees do you give that successful compressors will not be used to improve the compressors of some participants/friends of the referees?
There has been no leaderboard update so far.
It isn't practical to advance lossless data compression in ways it has never been done before?
It seems that both money and protection of the outcome would matter greatly if that were the true intention, and it would need to be worth the author's time and effort to create something like that rather than build on what exists publicly today.
If it is only an attempt to increase the results of pre-existing works and recent context-mixing compressors by a threshold percentage, with improvements restricted to machine learning only, then I would suggest saying so clearly in the announcement rather than letting people see it as a larger contest for universal data compression advancement that will be worth their while.
If it is only for sport, then that's what it is, and there's nothing wrong with that, as long as it's understood that that is all it is.
Machine learning can at times be advantageous to data compression... but it is NOT data compression, nor is it required for it, and it is not a prerequisite for it in any case.
Pretending that data compression can only move forward with machine learning is folly, and teaching people that it has to go that direction will lead things down a very dark, bloated, misguided, and unproductive road.
But if that is what the contest is really about, perhaps it would be best to rename it "Machine Learning Contest for use with Data Compression Sets and Context Model Improvements restricted to Public Authors" or something of the like?
It is a very restricted and isolated case for data compression. Lossless, maybe. Universal? No.
Someone mentioned that the vast majority of ML+MI can be done outside the compression programs themselves, with little impact on compression/decompression times. While true, that seems like it would (and should) fall under an entirely different category: a subset of data analysis, a pre-modification of data prior to compression, which could be an entirely different field of study apart from data compression itself.
Modification of data prior to processing and modeling can be an extension or enhancement of compression engines, both pre-existing and future... but there are compression strategies where it would not be advantageous, or would waste the compressor's or end-user's time, if the results were insignificant or the transformation cost more time than it saved in space or transmission.
It really should be its own avenue, since compression can (and does) occur in other ways without that modeling, even if the efforts for those transformations help or hurt (on incompatible data types) when used.
(I suggested Pre-Modification as an alternative supplemental term above so as not to confuse it with already existing Precomp approaches for various file formats and known data structures.)
(An example of Pre-Modification that is not Precomp per se would be something like this, with text analysis: if a text file contained the words "Thirty Seven Eight Forty Four", you could use a prefix symbol that converts those words to digits, or skip straight to converting them to 3- to 4-bit Rice codes. It would seem that the desire to use machine learning is an attempt to do things like that with contexts and predictions, which "can" be advanced, but it should not be the ONLY expected way of doing so, and it is not always the correct substitute or answer for actually making the data smaller if no foreseeable transformation of that kind is possible with the data being worked upon.)
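The word-to-digit idea above can be sketched in C. This is a minimal, hypothetical pre-modification pass: the token table, the '#' escape byte, and space-separated tokens are all assumptions, only the words Zero through Nine are handled, and it presumes '#' never occurs in the raw text (a real transform would need a reversible escape scheme).

```c
#include <string.h>

/* Illustrative number-word table; a real pass would cover far more. */
static const char *words[10] = {
    "Zero","One","Two","Three","Four",
    "Five","Six","Seven","Eight","Nine"
};

/* Returns the digit for a number word, or -1 if the token is not one. */
static int word_to_digit(const char *token)
{
    for (int d = 0; d < 10; d++)
        if (strcmp(token, words[d]) == 0)
            return d;
    return -1;
}

/* Rewrites the space-separated tokens of `in` into `out`:
   number words become a '#' prefix symbol plus one digit;
   everything else is copied verbatim. Returns the output length. */
static size_t premodify(const char *in, char *out)
{
    char buf[64];
    size_t o = 0;
    while (*in) {
        size_t n = strcspn(in, " ");        /* length of next token */
        if (n < sizeof buf) {
            memcpy(buf, in, n); buf[n] = '\0';
            int d = word_to_digit(buf);
            if (d >= 0) { out[o++] = '#'; out[o++] = '0' + (char)d; }
            else        { memcpy(out + o, in, n); o += n; }
        } else {
            memcpy(out + o, in, n); o += n;  /* oversized token: copy */
        }
        in += n;
        if (*in) out[o++] = *in++;           /* copy the separator space */
    }
    out[o] = '\0';
    return o;
}
```

For example, "Thirty Seven Eight" would come out as "Thirty #7 #8" ("Thirty" is outside the single-digit table, so it passes through untouched); the actual compressor then sees the shorter stream.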
> What guarantees do you give that successful compressors will not be used
> to improve the compressors of some participants/friends of the referees?
If you're paranoid enough to think that the director of Yuvsoft would risk his reputation
to steal your money... well, just don't participate.
> It isn't practical to advance lossless data compression in ways it has never been done before?
Single-threaded solid compression with a 1 GB window is currently not practical,
in the sense that there's no market for that kind of compression.
Compression is still very important, but its applications moved from end-users (archivers etc)
to data centers (storage and communications), where resources available to a compressor
are quite limited.
But there's still ongoing interesting research in "non-practical" areas,
so we decided that corresponding competition categories are still necessary,
if only to reward innovation - even if it doesn't have any immediate applications.
> it would need to be worth the time and the effort of the author to create
> something like that rather than build on what exists today publicly.
Again, the competition tasks were based on popular topics in "compression scene",
rather than on actual industrial demands.
> If it is only an attempt to increase the results of pre-existing works and
> recent context-mixing compressors and authors thereof by a threshold
> percentage where the results are isolated to improvements with machine
> learning only,
No, ML (or parameter optimization) is just one of the options.
You can as well do speed optimizations, or improve entropy models,
or add new types of LZ tokens, or write a preprocessor...
The point is, there are multiple ways of making a winning entry in 1-3 days;
Alex simply suggested one of them (ML).
Also keep in mind that unlike Hutter Prize etc, GDC has 12 different tasks with separate prizes.
> ERR_CONNECTION_RESET
Well, LinkedIn is blocked in Russia - use a VPN, or see https://www.yuvsoft.com/ and https://compression.ru/video/
> Is Maxim Smirnov the only one who has access to the compressors?
He's the person responsible for GDC site, email and codec testing.
Hello, when is the deadline to submit for the next leaderboard update? The main page says every month, while the rules mention every two weeks.
Is there any point in submitting compressors that won't win prizes?