The test PC needs to be behind a firewall that blocks Internet connections, or have no Internet connection at all, while running "alien" (de)compression software.
For example, Krc could store and retrieve a dictionary in/from a remote cloud to save storage space and gain compression speed: https://encode.su/threads/1935-krc-k...ll=1#post39550
My preprocessor code crashed a little over halfway through the Test 1 file, Qualitative data (text only). I discovered some strange data there:
..."
MARIE
CECILIA
MARIE
CECILIA
MARIE
CECILIA (with a smile)
MARIE
ALBERT
CECILIA
AMADEUS
CECILIA
MARIE
CECILIA
ALBERT
MARIE
CECILIA
AMADEUS (who has been standing a little way off)
CECILIA
MARIE
ALBERT
CECILIA
AMADEUS
CECILIA
AMADEUS
CECILIA
AMADEUS
CECILIA
AMADEUS (with assumed brusqueness)
CECILIA
AMADEUS
CECILIA
AMADEUS
CECILIA
"....
It looks like the sentences from the original are missing while the names are still listed:
..."
MARIE
Oh, that would be fine!
CECILIA
Did you hear that, Amadeus?
AMADEUS (who has been standing a little way off)
Certainly. It would be very nice.... You can wait for us in the Tirol.
CECILIA
Could you come and see me to-morrow afternoon, Marie? Then we might settle the matter.
MARIE
Yes, indeed. I am always glad when you can spare me a little of your time.—Until to-morrow, then!
ALBERT
Good-by. (He and Marie go out)
AMADEUS (is walking to and fro)
CECILIA (who is sitting on the couch, follows him with her eyes)
AMADEUS (after a turn to the window and back, speaking in a peculiarly dry tone) Well, how did it go? Have you got the finale into shape at last?
CECILIA
Oh, in a manner.
AMADEUS
The day before yesterday it had not yet been brought up to the proper level. I find, for one thing, that they don't let you assert yourself sufficiently. Your voice should be floating above the rest, instead of being submerged in the crowd.
CECILIA
Won't you come to the rehearsal to-morrow—just once more—if you can spare the time?
AMADEUS
Would it please you...?
CECILIA
I always feel more certain of myself when you are within reach. You know that, don't you?
AMADEUS
Yes—I'll come. I'll call off my appointments with Neumann and the Countess.
CECILIA
If it isn't too great a sacrifice....
AMADEUS (with assumed brusqueness)
Oh, I can make her come in the afternoon.
CECILIA
But then there will be no time left for your own work. No, better let it be.
AMADEUS
What had we better let be?
CECILIA
Don't come to the rehearsal to-morrow.
"...
https://www.gutenberg.org/files/2974...-h/29745-h.htm
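A quick way to spot this kind of damage before a preprocessor trips over it is to look for long runs of speaker-only lines. The following is just a rough sketch: the `SPEAKER` pattern and the `longest_speaker_run` helper are made up for illustration and are not part of any contest tool. In the intact excerpt above the longest such run is 2 lines, while the damaged one is a single run of 33.

```python
import re

# Hypothetical heuristic: a "speaker line" is an upper-case name,
# optionally followed by a stage direction in parentheses and nothing else.
SPEAKER = re.compile(r"^[A-Z][A-Z .'-]*(\s*\(.*\))?\s*$")

def longest_speaker_run(lines):
    """Return the length of the longest run of consecutive speaker-only lines."""
    best = run = 0
    for line in lines:
        if SPEAKER.match(line):
            run += 1
            best = max(best, run)
        else:
            run = 0  # any dialogue (or empty) line ends the run
    return best

damaged = ["MARIE", "CECILIA", "MARIE", "CECILIA (with a smile)"]
intact = ["MARIE", "Oh, that would be fine!", "CECILIA", "Did you hear that, Amadeus?"]
print(longest_speaker_run(damaged), longest_speaker_run(intact))
```

A threshold of a handful of lines would be enough to flag the block quoted above, but the right value depends on the text.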
schnaader (28th June 2020)
Might be a bug in my filter script. But I don't think it's a good reason to change anything.
In fact, isn't it a good dataset if it lets you detect preprocessor bugs? :)
I might as well join this competition. We have up to Nov. 20, 2020, hmm?
First, since my dual-core computer crashed, I have to buy a new computer. And I have to learn how to install g++ again, oh my! (But I still have bcc32.) After almost a decade, I might be coding again. Brave.
Tried entering lzuf2 as a test submission, but Gmail failed to send it. Now the email is "queued".
Will the sponsor, Huawei, own the submitted compressors? If not, will it buy the winning compressor?
After a first quick glance, testset 3 looks fine. Looking for compressed leftovers, only found this one so far (very small ZIP part, 210 bytes decompressed).
This is the most interesting testset for me, because this kind of data dominates things like Android APK files and is a mix of structured pointer lists, program code and string sections (e.g. method names), so some preprocessing (parsing and reordering stuff, detecting structured data) should be the way to go and would help compressing data like this.
Left side is from the original file, right side is the output of "Precomp -cn"
http://schnaader.info
Damn kids. They're all alike.
TS40.txt is not an ASCII file, nor even a Latin-1 file. It contains bytes left over from UTF-8 (with decimal codes 128, 145, 147, 225, 226). For example, see the line right after the line
"Thank Heaven! . . . . Good-night."
Or see the next word after
She learned of the great medicine,
The description says "texts from Project Gutenberg in UTF-8 characters, so it’s essentially ASCII", not that it's ASCII.
> It contains characters left over from UTF-8
That's intentional in this case. No need to make it too simple.
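For anyone who wants to check a test file for such bytes themselves, here is a minimal sketch. The helper names `non_ascii_histogram` and `is_valid_utf8` are invented for this post, and the sample bytes are a hand-made example (UTF-8 curly quotes), not data taken from TS40.txt.

```python
from collections import Counter

def non_ascii_histogram(data: bytes) -> Counter:
    """Count every byte value >= 128 found in data."""
    return Counter(b for b in data if b >= 0x80)

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Hand-made sample: "Hi" wrapped in UTF-8 curly quotes (E2 80 9C ... E2 80 9D).
sample = b"\xe2\x80\x9cHi\xe2\x80\x9d"
print(sorted(non_ascii_histogram(sample).items()))
print(is_valid_utf8(sample), is_valid_utf8(b"\xe2\x80Hi"))
```

On a real file you would pass `open(path, "rb").read()` instead of the sample. A file with leftover fragments of multi-byte sequences would show high byte values in the histogram and fail the UTF-8 check.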
Any comments on the site design? https://globalcompetition.compression.ru/
How do you think we can improve it to increase participation?
Given the duration of the competition, there is little incentive to participate early. I think that participation will increase over time.
Perhaps providing the leaderboards and specifying the interface for test 4 will speed things up?
Guess there is also some latency involved that delays submissions. For example, I'm quite busy with other things at the moment, but aim at doing a submission (perhaps next month?). The first steps are collecting potential algorithms/combinations that fit the time limit and finding the resulting ratios, though, so it doesn't make sense for me to submit anything before that. Also, using Precomp as a base or submitting its base score isn't useful, as it basically would be similar to a base score of pure lzma2 or bzip2 (and none of the Precomp overhead really helps for the contest).
> Guess there also is some latency involved that delays submissions.
We also don't get many people reading the Rules page, so I suspect that the current GDC site design causes TLDR syndrome.
> Using Precomp as a base or submitting its base score isn't useful
Yeah, it's not supposed to involve recompression, since it's a lot less popular than "universal compression",
so we'd not get much participation.
For example, there's a practical task (in storage) of applying recompression to small independent blocks -
deflate and jpeg recompression for these circumstances would be quite helpful, but would you find
time to write a custom deflate recompressor for middle blocks of deflate stream?..
Or an algorithm for reconstruction of huffman code from jpeg data without header?
But I'm quite sure that you can patch up a working combination of WRT+zstd or a 2D delta for image data.
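To give an idea of how little code a 2D delta needs, here is a rough sketch. It assumes 8-bit single-channel pixels stored row by row, and it assumes one particular predictor (left + up - upleft, modulo 256); that is only one of several filters that could be called a "2D delta". The function names and the `width` parameter are made up for this example. The residuals would then be fed to whatever general-purpose coder you pick.

```python
def delta2d_encode(pixels: bytes, width: int) -> bytes:
    """Replace each byte by its residual against left + up - upleft (mod 256)."""
    out = bytearray(len(pixels))
    for i, value in enumerate(pixels):
        x = i % width
        left = pixels[i - 1] if x > 0 else 0
        up = pixels[i - width] if i >= width else 0
        upleft = pixels[i - width - 1] if x > 0 and i >= width else 0
        out[i] = (value - (left + up - upleft)) & 0xFF
    return bytes(out)

def delta2d_decode(residuals: bytes, width: int) -> bytes:
    """Invert delta2d_encode by rebuilding pixels in raster order."""
    out = bytearray(len(residuals))
    for i, r in enumerate(residuals):
        x = i % width
        left = out[i - 1] if x > 0 else 0
        up = out[i - width] if i >= width else 0
        upleft = out[i - width - 1] if x > 0 and i >= width else 0
        out[i] = (r + left + up - upleft) & 0xFF
    return bytes(out)

# A 4x3 linear gradient: everything except the first row and column becomes zero.
image = bytes(10 + x + 2 * y for y in range(3) for x in range(4))
print(list(delta2d_encode(image, 4)))
```

On smooth image regions most residuals end up at or near zero, which is what makes the following entropy coder or LZ stage more effective.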
Why can't I send a submission for GDCC? I have the email address globalcompetition@compression.ru, and it blocked my submission.
If possible, avoid attaching files. Mail servers nowadays know better than users. A link is safe (probably).
There is no license grant or whatever. A submitted compressor belongs to the author(s). Actually, we don't send the executables to Huawei, believe it or not.
We can't say if Huawei will be interested in buying something. But, apparently, certain divisions of the company are interested in the topic now.
My own opinion: reverse engineering, if you care, does not make sense. I hardly see how any big company would be doing this in such situations. If there is something useful to reverse engineer, it will be cheaper to buy the author. Lock, stock, and barrel.
Hakan Abbas (1st August 2020)
From https://globalcompetition.compression.ru/test4api/ : "The size of the input buffer is the block size to be used in the test (inSize = 32,768 bytes)."
I think it would be great to add 32 bytes to this inSize, because an LZ77 compressor may try to read some (two or more) bytes past the end of the input buffer. Checking for out-of-bounds access to the buffer costs time...
Last edited by lz77; 3rd August 2020 at 15:06.
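The padding idea can be illustrated roughly as follows. This is only a sketch of the buffer layout, not the test 4 API: `make_padded_block`, `read_u32`, `hash_positions` and the hash constants are invented for this example, and `PADDING = 32` is simply the value proposed in the post. Python does its own bounds checking, so the point here is just to show that with the extra bytes a 4-byte read is valid at every position of the block, including the last one.

```python
import struct

IN_SIZE = 32768  # block size quoted from the test 4 API page
PADDING = 32     # extra readable bytes proposed in the post (an assumption)

def make_padded_block(block: bytes) -> bytearray:
    """Copy a block into a buffer with PADDING zero bytes after it."""
    buf = bytearray(len(block) + PADDING)
    buf[:len(block)] = block
    return buf

def read_u32(buf, pos: int) -> int:
    """Read 4 little-endian bytes at pos; raises struct.error if they do not fit in buf."""
    return struct.unpack_from("<I", buf, pos)[0]

def hash_positions(buf, in_size: int):
    """Hash the 4 bytes at every block position, with no end-of-block special case."""
    return [((read_u32(buf, pos) * 2654435761) & 0xFFFFFFFF) >> 20
            for pos in range(in_size)]

block = bytes(range(256)) * 128           # exactly IN_SIZE bytes of sample data
padded = make_padded_block(block)
print(len(hash_positions(padded, IN_SIZE)))
```

Calling `read_u32(block, IN_SIZE - 1)` on the unpadded block raises `struct.error`, which is the Python counterpart of the out-of-bounds read a C compressor would otherwise have to guard against in its inner loop.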
Combining open-source preprocessors and coders hardly requires months of work.
And on the other hand, a unique new work could claim prizes in multiple categories.
The problem is that there are too many categories. If there were half as many, the prize would be 6000.
lz77 (11th August 2020)