View Poll Results: What should I release next?

Voters: 16
  • Improved LZ4X: 3 (18.75%)
  • Improved ULZ: 3 (18.75%)
  • Improved ULZ with a large window: 10 (62.50%)

Thread: LZ4X - An Optimized LZ4 Compressor

  1. #31
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by m^2 View Post
    Why does 128 look better than 256 to you?
    I'd try to lengthen it still.
    In some scenarios, even a 128K window might be too large (small chunk compression).
    However, as a general rule, a larger window (dictionary) means better compression, especially when we are talking about byte-aligned LZ coders like LZ4 or some versions of LZO. At some point, though, we must start using variable-length codes for offset (match distance) coding, and IMO 64K-256K is the limit if we code the distance in fixed bits. In addition, the bigger the dictionary, the slower the compressor.
    I have tested many LZ4 modifications and came up with Enhanced LZ4: LZ4 with a 256 KB window (vs. 64 KB), called LZ4X or LZ4v2.
    It's like Deflate vs. Enhanced Deflate (Deflate64).
    Actually, everything is pretty much the same as with the original LZ4 legacy frame (same file extension, .lz4), except:
    New magic number - possibly "LZ4X"
    Block size is 16 MB
    Match distance of 0 is either unused or represents the EOF marker
    The same file structure: compressed size, compressed data, etc.
    The main difference - window size = 256 KB
    To make this possible we modify the "Token" byte as follows:

    rrr oo lll

    The 3 highest bits represent the literal run length (7 means read one extra byte; if that byte is 255, read another one, etc.)

    The 2 middle bits represent the highest bits of the match distance

    The 3 lowest bits represent the match length (7 means read one extra byte, etc.)
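A minimal C sketch of the length extension described above, assuming it works like LZ4's scheme: a 3-bit field value of 7 means "add the next byte", and each added byte equal to 255 means "add another one". The `Reader` cursor and `get_byte` are hypothetical stand-ins for the post's `GET_BYTE()`, and no minimum-match-length bias is added because the post does not specify one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte cursor standing in for the post's GET_BYTE(). */
typedef struct {
    const uint8_t *p;
    const uint8_t *end;
} Reader;

static int get_byte(Reader *r)
{
    assert(r->p < r->end); /* a real decoder would reject truncated input */
    return *r->p++;
}

/* Decode a length stored in a 3-bit token field (values 0..7).
   7 means "add the next byte"; each added byte equal to 255 means
   "add one more byte", so arbitrarily long runs can be expressed. */
static size_t read_length(Reader *r, unsigned field)
{
    size_t len = field;
    if (field == 7) {
        int b;
        do {
            b = get_byte(r);
            len += (size_t)b;
        } while (b == 255);
    }
    return len;
}
```

For example, a field value of 7 followed by the bytes 255 and 10 decodes to 7 + 255 + 10 = 272, while a field value of 3 consumes no extra bytes.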

    With such a Token structure, we can extract the fields efficiently:

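    int Token=GET_BYTE();
    if (Token>=32) // top 3 bits nonzero: a literal run is present
    {
      int LiteralRun=Token>>5; // 7 = read extra length byte(s)
      // ... copy LiteralRun literals ...
    }

    // bits 3-4 of Token (mask 0x18) become bits 16-17 of the distance
    int MatchPos=Pos-((Token&0x18)<<13);
    MatchPos-=GET_BYTE();    // distance bits 0-7
    MatchPos-=GET_BYTE()<<8; // distance bits 8-15
    if (MatchPos==Pos) // distance 0 marks EOF
    {
      // ... stop decoding ...
    }

    int MatchLen=Token&7; // 7 = read extra length byte(s)
    // ...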

    To be continued...
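To check the bit arithmetic of the `rrr oo lll` token, here is a round-trip sketch in C. `TokenFields`, `pack_token` and `unpack_token` are hypothetical names; the decoder mirrors the post's `(Token&0x18)<<13` shift, and the 18-bit distance (two explicit bytes plus two token bits) tops out at 262,143, i.e. a 256 KB window. In the real stream the literals sit between the token and the distance bytes, as in the fragment above; the three bytes are packed adjacently here only to show the layout.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_BITS 18                      /* 256 KB window */
#define MAX_DIST ((1u << WINDOW_BITS) - 1)  /* 262143 */

typedef struct {
    unsigned lit;   /* 3-bit literal-run field (0..7) */
    unsigned dist;  /* 18-bit match distance */
    unsigned mlen;  /* 3-bit match-length field (0..7) */
} TokenFields;

/* Encode: token "rrr oo lll", then two distance bytes, low byte first. */
static void pack_token(TokenFields f, uint8_t out[3])
{
    assert(f.lit <= 7 && f.mlen <= 7 && f.dist <= MAX_DIST);
    out[0] = (uint8_t)((f.lit << 5) | ((f.dist >> 16) << 3) | f.mlen);
    out[1] = (uint8_t)(f.dist & 0xFF);        /* distance bits 0-7 */
    out[2] = (uint8_t)((f.dist >> 8) & 0xFF); /* distance bits 8-15 */
}

/* Decode: (token & 0x18) << 13 moves token bits 3-4 to distance bits 16-17. */
static TokenFields unpack_token(const uint8_t in[3])
{
    TokenFields f;
    f.lit  = in[0] >> 5;
    f.dist = ((unsigned)(in[0] & 0x18) << 13) | in[1] | ((unsigned)in[2] << 8);
    f.mlen = in[0] & 7;
    return f;
}
```

Packing the largest distance with empty literal and match fields yields the bytes 0x18 0xFF 0xFF, which shows that the two "oo" bits are exactly what extends the 64 KB LZ4 window to 256 KB.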

  2. #32
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Smaller block. Right.

    Other than that, the encoder shouldn't be slower at the same strength. With a greedy parser and a hash table (HT) match finder, speed should be the same. If you use stronger techniques and get lower efficiency than with a small dictionary, you're doing it wrong.
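m^2's point can be illustrated with a greedy parser over a single-probe hash table: each position costs one lookup and one insert whatever the window size, and the window only decides whether the single candidate is accepted. A minimal C sketch, assuming a 4-byte minimum match and a 12-bit multiplicative hash (both arbitrary illustration choices, not taken from LZ4X); for brevity, positions inside a match are not inserted into the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_BITS 12
#define MIN_MATCH 4

/* Multiplicative hash of 4 bytes (illustrative choice). */
static uint32_t hash4(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return (v * 2654435761u) >> (32 - HASH_BITS);
}

/* Greedy parse with a single-probe hash table. Returns the number of
   matches and stores the number of literal bytes in *literals. */
static size_t greedy_ht_parse(const uint8_t *in, size_t n, size_t window,
                              size_t *literals)
{
    static size_t table[1u << HASH_BITS];
    size_t matches = 0, lits = 0, i = 0;

    assert(literals != NULL);
    for (size_t k = 0; k < (1u << HASH_BITS); k++)
        table[k] = SIZE_MAX;                  /* empty slot */

    while (i < n) {
        size_t len = 0;
        if (i + MIN_MATCH <= n) {
            uint32_t h = hash4(in + i);
            size_t cand = table[h];           /* the only candidate probed */
            table[h] = i;
            if (cand != SIZE_MAX && i - cand <= window)
                while (i + len < n && in[cand + len] == in[i + len])
                    len++;                    /* verify and extend */
        }
        if (len >= MIN_MATCH) {
            matches++;
            i += len;   /* positions inside the match are not inserted */
        } else {
            lits++;
            i++;
        }
    }
    *literals = lits;
    return matches;
}
```

The per-position work is identical for any `window` value; only the acceptance test `i - cand <= window` changes, which is why a bigger window need not slow this kind of encoder down.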

  3. #33
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    A larger dictionary = more matches, more positions to check... Longer hash chains or a larger binary tree...
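The hash-chain case can be sketched the same way: `prev[]` links each position to the previous position with the same hash, and the walk ends when a candidate falls outside the window or a `max_chain` cap is reached, so a larger window lets the search visit more candidates. `insert_upto`, `chain_search` and the 12-bit hash are hypothetical illustration choices, not LZ4X's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_BITS 12
#define MIN_MATCH 4
#define NO_POS SIZE_MAX

typedef struct {
    size_t len;     /* longest match found (0 if none) */
    size_t dist;    /* distance of that match */
    size_t probes;  /* candidates examined */
} ChainResult;

static uint32_t hash4(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return (v * 2654435761u) >> (32 - HASH_BITS);
}

/* Link positions 0..cur-1 into hash chains (newest first) and return the
   chain head for position cur. head[] has 1 << HASH_BITS slots, prev[] n. */
static size_t insert_upto(const uint8_t *in, size_t n, size_t cur,
                          size_t *head, size_t *prev)
{
    for (size_t k = 0; k < ((size_t)1 << HASH_BITS); k++)
        head[k] = NO_POS;
    for (size_t p = 0; p < cur && p + MIN_MATCH <= n; p++) {
        uint32_t h = hash4(in + p);
        prev[p] = head[h];
        head[h] = p;
    }
    return cur + MIN_MATCH <= n ? head[hash4(in + cur)] : NO_POS;
}

/* Walk the chain from `head`; stop when a candidate is farther than
   `window` (the chain is newest-first, so all later ones are too) or
   after `max_chain` probes. */
static ChainResult chain_search(const uint8_t *in, size_t n, size_t cur,
                                size_t head, const size_t *prev,
                                size_t window, size_t max_chain)
{
    ChainResult r = {0, 0, 0};
    assert(cur < n);
    for (size_t cand = head; cand != NO_POS && r.probes < max_chain;
         cand = prev[cand]) {
        if (cur - cand > window)
            break;
        r.probes++;
        size_t len = 0;
        while (cur + len < n && in[cand + len] == in[cur + len])
            len++;
        if (len >= MIN_MATCH && len > r.len) {
            r.len = len;
            r.dist = cur - cand;
        }
    }
    return r;
}
```

On 64 bytes of `'a'` searched at position 40, an 8-byte window examines 8 candidates and a 32-byte window examines 32, while the best match found is the same (24 bytes at distance 1): more work for the same result here, though on real data the extra candidates are where the larger window's gains come from.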


  4. #34
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Sure, the same algorithm will be slower. But it will be stronger. And if overall efficiency drops (still talking about files that can benefit from a larger dictionary), the implementation is wrong, not the dictionary size.

  5. #35
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Cyan, what is your opinion on this? Any suggestions?

  6. #36
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    865
    Thanks
    463
    Thanked 260 Times in 107 Posts
    As far as I'm concerned, the most important thing is to keep user expectations consistent.
    If a file is branded `*.lz4`, it should be compatible with other lz4 tools, and hence respect the official interoperable format.

    There is no problem creating a fork introducing a different format.
    That's what inikep did with lz5. I welcome such initiatives.
    But please make sure users understand it's a different format (for example, by emphasizing the X of lz4x, or using a completely different name).


    Then, selecting between 128 or 256 KB window size becomes a specific format discussion.
    I suspect 256 KB will usually give a better compression ratio, although this statement should be backed by tests.
    256 KB is also the size of a typical L2 cache, so it should remain relatively fast to decompress.

    Such a choice also assumes a "large" input to deal with, such as a big file.
    From an archiver perspective, it makes sense.

    From a library perspective, less so.
    Why? Because most real-world LZ4 applications work on small data blocks (<= 64 KB).


    Bottom line: no silver bullet.

  7. Thanks (2):

    encode (1st March 2016), Turtle (2nd March 2016)

  8. #37
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Some test results for LZ4 with a 256 KB window, for future reference:
    enwik8: 100000000->38324398
    enwik9: 1000000000->337785382
    dickens: 10192446->3845494
    samba: 21606400->5760769
    webster: 41458703->12381074
    xml: 5345280->634061
    world95.txt: 2988578->815206
    book1: 768771->324652
    calgary.tar: 3152896->1118982
    3200.txt: 16013962->6291766
    mptrack.exe: 1159172->632586
    reaktor.exe: 14446592->3022800
    photoshop.exe: 19533824->9186191
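To compare these figures with other benchmark tables, it can help to convert them to bits per input byte (8 × compressed / original). A small C sketch over the numbers listed above; `Result`, `bits_per_byte` and `print_results` are illustrative names.

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
    const char *name;
    double original;    /* input size, bytes */
    double compressed;  /* output size, bytes */
} Result;

/* Figures from the post (LZ4 format with a 256 KB window). */
static const Result results[] = {
    {"enwik8",         100000000.0,  38324398.0},
    {"enwik9",        1000000000.0, 337785382.0},
    {"dickens",         10192446.0,   3845494.0},
    {"samba",           21606400.0,   5760769.0},
    {"webster",         41458703.0,  12381074.0},
    {"xml",              5345280.0,    634061.0},
    {"world95.txt",      2988578.0,    815206.0},
    {"book1",             768771.0,    324652.0},
    {"calgary.tar",      3152896.0,   1118982.0},
    {"3200.txt",        16013962.0,   6291766.0},
    {"mptrack.exe",      1159172.0,    632586.0},
    {"reaktor.exe",     14446592.0,   3022800.0},
    {"photoshop.exe",   19533824.0,   9186191.0},
};

/* Output bits per input byte. */
static double bits_per_byte(const Result *r)
{
    assert(r->original > 0);
    return 8.0 * r->compressed / r->original;
}

/* Print each file with its bits per byte and compressed-size percentage. */
static void print_results(void)
{
    for (size_t i = 0; i < sizeof results / sizeof results[0]; i++)
        printf("%-14s %6.3f bpb  %5.1f%%\n", results[i].name,
               bits_per_byte(&results[i]),
               100.0 * results[i].compressed / results[i].original);
}
```

Calling `print_results()` lists all thirteen files; for example, enwik8 comes out at about 3.066 bits per byte.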

    Well, I think I will keep LZ4X as an LZ4 legacy frame compatible compressor, then. I'd better release an improved CRUSH compressor, I guess. And if you release something, I will follow you...

  9. Thanks:

    Cyan (1st March 2016)

  10. #38
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    How about having 2 bytestreams with different window sizes and either switching after some size threshold or just indicating the choice with a flag? Lots of complexity, but it allows both good small-block and large-block strength with good compression and decompression speed.

  11. #39
    Member lz77's Avatar
    Join Date
    Jan 2016
    Location
    Russia
    Posts
    46
    Thanks
    14
    Thanked 11 Times in 7 Posts
    Hi Ilya,

    what do you mean by "advanced string parsing" for finding the longest match? Do you use suffix arrays with search acceleration, or have you invented something new and faster?

    Serge

  12. #40
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    As I already wrote:
    Normal ("c") and High ("ch") modes = Hash Chain match finder with Greedy parsing.
    Extreme ("cx") mode = Binary Tree match finder with Optimal parsing (some people call this variant Dynamic Programming; in other words, it is something far more advanced than a generic Storer&Szymanski Optimal parse).
    Honestly, when I planned this program I expected a notably higher compression ratio gain compared to LZ4HC...
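For reference, the "generic Storer&Szymanski Optimal parse" mentioned above can be written as a backward dynamic program: `cost[i]` is the cheapest encoding of the input from position `i` onward, choosing between a literal and any match length from the minimum up to the longest match at `i`. The sketch assumes a deliberately crude cost model (1 byte per literal, 3 bytes per match, length-extension bytes and literal-run sharing ignored) and a brute-force match finder in place of a binary tree; it is not the LZ4X "cx" parser, which the post describes as more advanced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MIN_MATCH 4
#define LIT_COST 1    /* assumed: one byte per literal */
#define MATCH_COST 3  /* assumed: token + 2-byte offset, length bytes ignored */

/* Longest match for position i against the previous `window` bytes
   (brute force, for clarity only). */
static size_t longest_match(const uint8_t *in, size_t n, size_t i, size_t window)
{
    size_t best = 0;
    size_t start = i > window ? i - window : 0;
    for (size_t c = start; c < i; c++) {
        size_t len = 0;
        while (i + len < n && in[c + len] == in[i + len])
            len++;
        if (len > best)
            best = len;
    }
    return best;
}

/* Backward DP: any prefix (>= MIN_MATCH) of the longest match is also a match. */
static size_t optimal_cost(const uint8_t *in, size_t n, size_t window)
{
    size_t *cost = malloc((n + 1) * sizeof *cost);
    assert(cost != NULL);
    cost[n] = 0;
    for (size_t i = n; i-- > 0;) {
        cost[i] = LIT_COST + cost[i + 1];
        size_t maxlen = longest_match(in, n, i, window);
        for (size_t len = MIN_MATCH; len <= maxlen; len++) {
            size_t c = MATCH_COST + cost[i + len];
            if (c < cost[i])
                cost[i] = c;
        }
    }
    size_t total = cost[0];
    free(cost);
    return total;
}

/* Greedy parse under the same cost model, for comparison. */
static size_t greedy_cost(const uint8_t *in, size_t n, size_t window)
{
    size_t total = 0;
    for (size_t i = 0; i < n;) {
        size_t len = longest_match(in, n, i, window);
        if (len >= MIN_MATCH) { total += MATCH_COST; i += len; }
        else                  { total += LIT_COST;   i++; }
    }
    return total;
}
```

Because the greedy parse is one of the parses the DP considers, `optimal_cost` can never exceed `greedy_cost` under the same cost model; how much it gains in practice depends on the data and on how faithful the cost model is.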

  13. #41
    Member
    Join Date
    Sep 2010
    Location
    US
    Posts
    126
    Thanks
    4
    Thanked 69 Times in 29 Posts
    Quote Originally Posted by encode View Post
    Honestly, when I planned this program I expected a notably higher compression ratio gain compared to LZ4HC...
    The heuristic parse in LZ4HC is very impressive. Yann has identified very well the cases that need to be tested to find an ideal parse. Going from there to a full-fat optimal parse is generally not much of a win.

    In more complex formats it's almost impossible to find those rules (and of course with entropy coding it's impossible).
    Last edited by cbloom; 3rd August 2016 at 21:00.

  14. #42
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by encode View Post
    Honestly, when I planned this program I expected a notably higher compression ratio gain compared to LZ4HC...
    Well, you did better than I expected was possible.

    At some point I started writing an optimal LZ4 compressor (strength-wise, without any speed optimisations). I didn't finish it: I realised the output was more different from what I had than I expected, and I quit. I was not willing to spend those few extra evenings.

  15. Thanks:

    encode (4th March 2016)

  16. #43
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Working really hard on a new release (v1.02) right now. What's new:
    • Improved compression ratio, once again. Now I handle long literal runs more precisely. The new ENWIK9 result is 372,068,631 bytes
    • Notably faster decompression
    • Slightly higher compression ratio with "ch" mode
    • Faster default compression with some compression loss

    And for parsing comparison:

    book1 result is 359,284 bytes (LZ4 compatible)

    With no magic number and LAST_LITERALS=0 (should we call it PAD or PADDING_LITERALS?)
    -> 359,279 bytes
    and if we don't store the compressed size:
    -> 359,275 bytes

    Same benchmark with world95.txt (from http://maximumcompression.com)

    LZ4 compatible result is 942,409 bytes.

    No magic number+no padding literals:
    -> 942,401 bytes
    If we also drop the compressed size:
    -> 942,397 bytes
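These savings line up with the LZ4 legacy frame layout: a 4-byte magic number (0x184C2102, stored little-endian), then, for each block, a 4-byte little-endian compressed size followed by the compressed data. Dropping the size field saves exactly 4 bytes for both files above, consistent with each fitting in a single block. A minimal C sketch of a writer, assuming the blocks are already compressed; `put_le32` and `write_legacy_frame` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LZ4_LEGACY_MAGIC 0x184C2102u

/* Store v as 4 little-endian bytes; returns the number of bytes written. */
static size_t put_le32(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
    return 4;
}

/* Write a legacy frame around already-compressed blocks.
   Container overhead = 4 bytes (magic) + 4 bytes per block (size field). */
static size_t write_legacy_frame(uint8_t *out, const uint8_t *const blocks[],
                                 const uint32_t sizes[], size_t nblocks)
{
    assert(out != NULL);
    size_t pos = put_le32(out, LZ4_LEGACY_MAGIC);
    for (size_t b = 0; b < nblocks; b++) {
        pos += put_le32(out + pos, sizes[b]);
        memcpy(out + pos, blocks[b], sizes[b]);
        pos += sizes[b];
    }
    return pos;
}
```

With one block, the frame adds 8 bytes around the compressed data, which matches the 4 + 4 bytes removed in the steps above when the padding-literal change is set aside.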


  17. Thanks:

    Cyan (15th March 2016)

  18. #44
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Code:
    Optimized LZ4 Compressor, v1.02
    Copyright (C) 2016 Ilya Muravyov
    
    Usage: LZ4X command infile [outfile]
    
    Commands:
      c[1..4] Compress (1-fast..4-extreme)
      d       Decompress
    Added new compression modes:

    1. fast - single match probe, greedy parsing (pretty fast, compression is slightly higher than lz4 -1)
    2. normal - full hash chain search, greedy parsing - default compression mode
    3. high - full hash chain search, lazy matching (kept for reference, compression is slightly worse than lz4 -9)
    4. extreme - full binary tree search, optimal parsing - best compression possible within given format
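Mode 3's lazy matching can be sketched as a one-step deferral: before taking a match at position `i`, look at `i+1`, and if a longer match starts there, emit `in[i]` as a literal instead. The C sketch below uses a brute-force `longest_match` as a stand-in for the full hash-chain search; the function names and the 4-byte minimum match are assumptions for illustration, not LZ4X's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_MATCH 4

/* Longest match for position i within the previous `window` bytes
   (brute force, standing in for a hash-chain search). */
static size_t longest_match(const uint8_t *in, size_t n, size_t i, size_t window)
{
    size_t best = 0;
    size_t start = i > window ? i - window : 0;
    for (size_t c = start; c < i; c++) {
        size_t len = 0;
        while (i + len < n && in[c + len] == in[i + len])
            len++;
        if (len > best)
            best = len;
    }
    return best;
}

/* One-step lazy parse: defer a match if the next position has a longer one. */
static void lazy_parse(const uint8_t *in, size_t n, size_t window,
                       size_t *literals, size_t *matches, size_t *match_bytes)
{
    assert(literals && matches && match_bytes);
    *literals = *matches = *match_bytes = 0;
    size_t i = 0;
    while (i < n) {
        size_t len = longest_match(in, n, i, window);
        if (len >= MIN_MATCH && i + 1 < n &&
            longest_match(in, n, i + 1, window) > len)
            len = 0;  /* defer: a longer match starts one byte later */
        if (len >= MIN_MATCH) {
            (*matches)++;
            *match_bytes += len;
            i += len;
        } else {
            (*literals)++;
            i++;
        }
    }
}
```

For "abcdZbcdefghabcdefgh", a greedy parser would take "abcd" at position 12 and then "efgh"; the lazy parser instead emits "a" as a literal and takes the 7-byte match "bcdefgh", ending with 13 literals and a single match.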

    Sort of a bundle

    enwik9 results (lz4 compatible):
    1. -> 472,784,650 bytes
    2. -> 392,104,176 bytes
    3. -> 379,633,926 bytes
    4. -> 372,068,437 bytes


  19. Thanks (3):

    JamesB (7th April 2016), Mike (1st April 2016), Turtle (2nd April 2016)

  20. #45
    Member
    Join Date
    May 2012
    Location
    United States
    Posts
    324
    Thanks
    182
    Thanked 53 Times in 38 Posts

    Can't wait for this to be released!

  21. Thanks:

    encode (1st April 2016)

  22. #46
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    A new version has been released! I personally really like the new results! Anyway, it looks like this is the final version in the series!

  23. Thanks (4):

    comp1 (7th April 2016), Cyan (7th April 2016), Matt Mahoney (8th April 2016), Mike (7th April 2016)

  24. #47
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Updated LTCB. http://mattmahoney.net/dc/text.html#3721
    If you want to time it on your faster hardware, maybe you can regain your spot on the Pareto frontier.

  25. Thanks:

    encode (8th April 2016)

  26. #48
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    As a note, the results on my machine are here:
    http://encode.su/threads/2447-LZ4X-A...ll=1#post46531

  27. Thanks:

    Matt Mahoney (9th April 2016)

  28. #49
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Any chance for a source release?

  29. #50
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    I will wait a while for feedback, then decide whether to release the source, and on a license and hosting platform (SourceForge/GitHub)

  30. #51
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by encode View Post
    I will wait a while for feedback, then decide whether to release the source, and on a license and hosting platform (SourceForge/GitHub)
    FWIW, I don't really see much reason to host on SourceForge, even now that it is under new ownership and no longer bundles adware with downloads.

    I'd love to suggest GitLab; it has a good UI, an excellent feature set, and it's open-source software. However, if you want to try to encourage community involvement GitHub is the way to go… you'll almost certainly get more publicity, bug reports, patches/pull requests, etc. hosting there than anywhere else.

  31. Thanks:

    jibz (10th April 2016)

  32. #52
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    One thing I don't like about GitHub is monoculture. On two fronts.
    First, it's nearly a monopoly among source hosting sites.
    Second, unlike its competitors, it's not VCS-agnostic. Its near-monopoly on the hosting front turns the one VCS it supports into a near-monopoly as well.
    And monoculture is not good.

    Another thing I don't like about it is git, but that's more of a personal matter.


    License... you surely know the traditional arguments about strong copyleft, weak copyleft, etc. Yet you still ask. I wonder what sort of answer you need.
    For starters, I'll give you a summary that is important but less stressed in the press: which groups prefer which license.
    * (L)GPL3 - Many Linux users like it. Corporations strongly don't. BSD users strongly don't. GPL2 users weakly don't.
    * (L)GPL2 - Many Linux users like it. Corporations strongly don't. BSD users weakly don't. GPL3 users weakly don't.
    * AGPL2 - Nobody likes it
    * AGPL3 - Few Stallman supporters like it.
    * Apache2 - The most preferred corporate license. Some Linux users don't like it and some BSD ones don't either, but overall there are few gripes.
    * Public Domain - non-lawyers tend to like it; aside from that, see MIT below.
    * MIT/3-clause BSD - Some Linux users don't like it and some corporations don't either, but overall there are very few gripes.
    * Custom - depends on the text, but hardly anyone likes them

    Please note that I don't mention Windows and Mac users specifically. That's because the general trend is "I don't care". Nevertheless, these ecosystems have so much mass that outliers who do care are numerous enough to matter. What do they tend to prefer? I'll tell you: I don't know.

    You used to pick public domain. I think it's a great choice. If you have more specific questions, ask.

  33. Thanks:

    encode (10th April 2016)

  34. #53
    Member
    Join Date
    Jun 2013
    Location
    USA
    Posts
    98
    Thanks
    4
    Thanked 14 Times in 12 Posts
    hmm? why don't corporations like MIT?

  35. #54
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Mangix View Post
    hmm? why don't corporations like MIT?
    Most like it, more or less. But they have one problem with it - it has no patent clause. They prefer Apache2 because of that.

  36. #55
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by m^2 View Post
    License... you surely know the traditional arguments about strong copyleft, weak copyleft, etc. Yet you still ask. I wonder what sort of answer you need.
    For starters, I'll give you a summary that is important but less stressed in the press: which groups prefer which license.
    * (L)GPL3 - Many Linux users like it. Corporations strongly don't. BSD users strongly don't. GPL2 users weakly don't.
    * (L)GPL2 - Many Linux users like it. Corporations strongly don't. BSD users weakly don't. GPL3 users weakly don't.
    * AGPL2 - Nobody likes it
    * AGPL3 - Few Stallman supporters like it.
    * Apache2 - The most preferred corporate license. Some Linux users don't like it and some BSD ones don't either, but overall there are few gripes.
    * Public Domain - non-lawyers tend to like it; aside from that, see MIT below.
    * MIT/3-clause BSD - Some Linux users don't like it and some corporations don't either, but overall there are very few gripes.
    * Custom - depends on the text, but hardly anyone likes them
    Wow, those are some pretty big generalizations (many of which I disagree with). Also, I think people should choose a license based on what the license does, not some group's opinion of it. Instead of trying to say which groups like which licenses, it would be better to just have a quick summary of each one…
    • GPLv2: if you use the code (and distribute the result), you must release your code under similar terms.
    • LGPLv2: people can use the code even in proprietary software, as long as any changes to the code itself are released under similar terms.
    • AGPLv2: the GPL is triggered by distribution. If you distribute GPL-licensed software to someone, you have to offer them the source code, but if you never distribute it and instead use it to run a service that people interact with over the network, you don't have to offer the source code. With the AGPL you do.
    • (A|L)GPLv3: the big thing these add is a prohibition on tivoization (releasing the source code under the GPL but preventing modified versions from running). There is also some patent stuff.
    • Apache 2: a permissive license which allows integrating the code into proprietary software. It includes a patent license grant, which prevents people from releasing open-source software and then suing users for patent infringement.
    • 3-clause BSD: it's short, and simple enough for non-lawyers to grok; just read it. Basically, people can do whatever they want as long as they follow the rules in those clauses (keeping the copyright notice in the code, including the full text of the license in the documentation, and not using the name(s) of the copyright holders to promote the software without prior written permission).
    • MIT: also short and readable. Basically, "do whatever you want, I'm not liable".

    ChooseALicense.com has some good (easy to understand) information on several licenses.

    Quote Originally Posted by m^2 View Post
    You used to pick public domain. I think it's a great choice. If you have more specific questions, ask.
    I completely disagree with this. All the licenses you listed have very good arguments in favor of them, but there is basically no benefit to public domain over MIT. For those who don't know what public domain is: when you create something it is automatically copyrighted, and a public domain dedication is an attempt to opt-out of copyright protection. That sounds great on the surface, but there are some very big practical issues:
    • Legally ambiguous: most countries don't really have the concept of opting out of copyright as part of their laws, so lawyers have a good reason to be nervous about this one. See https://creativecommons.org/about/cc0/.
    • No liability disclaimer: most public domain dedications don't include one (note: CC0 does), so if a bug in your code causes some company's data to be corrupted, they could conceivably sue you for damages.

    If you really want to use a public domain dedication, use CC0; simply saying that you wish for the software to be in the public domain is insufficient.

  37. Thanks (2):

    jibz (11th April 2016), schnaader (11th April 2016)

  38. #56
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by nemequ View Post
    ChooseALicense.com has some good (easy to understand) information on several licenses.
    I can't argue with that exact phrase, but overall I consider this site a terrible resource because of its inaccuracy.
    In particular, implying that MIT and Apache2 users don't care about sharing improvements is so grossly wrong that I want to scream whenever I see someone advocating this site.

    Quote Originally Posted by nemequ View Post
    I completely disagree with this. All the licenses you listed have very good arguments in favor of them, but there is basically no benefit to public domain over MIT.
    There are 2 benefits:
    * it's the simplest and the most understandable license
    * it puts less legal burden on user
    Neither of these is a big difference, but for me it matters.

  39. #57
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by m^2 View Post
    I can't argue with that exact phrase, but overall I consider this site a terrible resource because of its inaccuracy.
    In particular, implying that MIT and Apache2 users don't care about sharing improvements is so grossly wrong that I want to scream whenever I see someone advocating this site.
    Looking at just the main page (which I think is what you're talking about, if not let me know), I don't think it does that at all. Certainly no more than it implies that MIT and GPL fans don't care about patents, or Apache 2/GPL don't care about simplicity or permissiveness. Maybe the wording should be changed to make it clearer that it is asking what you care about most, but TBH until you objected just now it didn't even occur to me that someone might read it as "you get to choose one of these aspects, and the others are completely ignored".

    Quote Originally Posted by m^2 View Post
    There are 2 benefits:
    * it's the simplest and the most understandable license
    No, it's not. It may look the simplest at first glance (at least to a non-lawyer) if you just use a very simple statement that you wish for the code to be in the public domain, but the truth is vastly more complicated. A simple statement is very complicated legally, and the effects could be completely different in different jurisdictions. Quoting the "About CC0" page:

    Dedicating works to the public domain is difficult if not impossible for those wanting to contribute their works for public use before applicable copyright or database protection terms expire. Few if any jurisdictions have a process for doing so easily and reliably. Laws vary from jurisdiction to jurisdiction as to what rights are automatically granted and how and when they expire or may be voluntarily relinquished. More challenging yet, many legal systems effectively prohibit any attempt by these owners to surrender rights automatically conferred by law, particularly moral rights, even when the author wishing to do so is well informed and resolute about doing so and contributing their work to the public domain.
    The only PD dedication that I'm aware of which actually looks pretty solid legally (note: IANAL) is CC0, and it is much more complicated and difficult to understand than the MIT license. The "Public License Fallback" section is particularly telling; the first two sections try to place the work in the public domain, but the license is so skeptical about how successful that would be that they basically include something very MIT-like as a fallback. Even lawyers can't figure out the effects of a public domain dedication… definitely not the "simplest and most understandable license".

    It's also not a license, but I'm going to assume that you're just using that word for convenience.

    * it puts less legal burden on user
    No, it doesn't. It leaves the user in a legally ambiguous situation, which is a huge burden.

    It's also worth pointing out again that the vast majority of PD dedications don't include a disclaimer of liability. I don't know about you, but the idea of being sued for damages because something I generously gave away for free isn't absolutely perfect is a pretty big problem. Again, CC0 is an exception; if you really want to try to go PD that's definitely the way to do it.

  40. #58
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by nemequ View Post
    Looking at just the main page (which I think is what you're talking about, if not let me know), I don't think it does that at all. Certainly no more than it implies that MIT and GPL fans don't care about patents, or Apache 2/GPL don't care about simplicity or permissiveness. Maybe the wording should be changed to make it clearer that it is asking what you care about most, but TBH until you objected just now it didn't even occur to me that someone might read it as "you get to choose one of these aspects, and the others are completely ignored".
    Seriously? They are a website that's meant to help in a license choice. The catch phrase is "I care about sharing improvements". This clearly means that if you care about sharing, you should pick GPL. Which clearly means that if you care about sharing you should not pick the alternatives. At which point am I wrong?

    The catch phrase should be the most important property that distinguishes a license from the others. Here it's something that doesn't help distinguish them. It does help in making a choice, though a misinformed one.

    Quote Originally Posted by nemequ View Post
    No, it's not. It may look the simplest at first glance (at least to a non-lawyer) if you just use a very simple statement that you wish for the code to be in the public domain, but the truth is vastly more complicated. A simple statement is very complicated legally, and the effects could be completely different in different jurisdictions. Quoting the "About CC0" page:



    The only PD dedication that I'm aware of which actually looks pretty solid legally (note: IANAL) is CC0, and it is much more complicated and difficult to understand than the MIT license. The "Public License Fallback" section is particularly telling; the first two sections try to place the work in the public domain, but the license is so skeptical about how successful that would be that they basically include something very MIT-like as a fallback. Even lawyers can't figure out the effects of a public domain dedication… definitely not the "simplest and most understandable license".
    You know what? MIT has all the same legal issues as simple public domain dedications. In my country (Poland) it's not a valid license at all(*), so all BSD / MIT software that I use I use illegally.
    But most ignore that and feel comfortable with the license, yet moan about public domain.
    Being legally bullet-proof is strictly impossible. You can get very near at the cost of a huge complexity or at many points in between.
    It applies to public domain as well as to any other license. But of two equally bullet-proof licenses, the one with simpler terms will always be simpler. Strictly simpler.

    Quote Originally Posted by nemequ View Post
    It's also not a license, but I'm going to assume that you're just using that word for convenience.
    I call "public domain" any work that is either:
    * not copyrighted
    * licensed w/out any restrictions
    regardless of how strong the legalese of the license is.

    Quote Originally Posted by nemequ View Post
    No, it doesn't. It leaves the user in a legally ambiguous situation, which is a huge burden.
    Yes, it does. Ambiguity is orthogonal to actual terms and PD terms are strictly less stringent. After all, BSD requires attribution and PD doesn't.

    Quote Originally Posted by nemequ View Post
    It's also worth pointing out again that the vast majority of PD dedications don't include a disclaimer of liability. I don't know about you, but the idea of being sued for damages because something I generously gave away for free isn't absolutely perfect is a pretty big problem. Again, CC0 is an exception; if you really want to try to go PD that's definitely the way to do it.
    I wasn't talking about how to do the PD dedication. Frankly, I don't advocate any choice, as I see all of them as broken in some way. But this point made me curious. Are you aware of any open source developer who got sued by a user? The risk definitely is there; it would be best to quantify it, but it would be good to at least have *some* information on it.

    Ad (*):
    My country requires all copyright licenses to explicitly state all fields of use and does not consider "any use" to be a valid field. You are required to list "use, copying, storing" and a number of other fields recognised in the doctrine.
    Actually this means BSD (or public domain or GPL) terms are impossible to express because the list of fields of use may be amended with the world development. It happened in our lifetime when courts decided that publishing on the internet is different from regular publishing; therefore all licenses that were unlimited before the internet suddenly got a limitation. If you think that's freaking broken, welcome to the law.

  41. #59
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    (Sorry for the delay, I forgot about this discussion, and I don't check encode.su all that often, especially when I'm busy…)

    Quote Originally Posted by m^2 View Post
    Seriously? They are a website that's meant to help in a license choice. The catch phrase is "I care about sharing improvements". This clearly means that if you care about sharing, you should pick GPL. Which clearly means that if you care about sharing you should not pick the alternatives. At which point am I wrong?
    You're oversimplifying. You can want multiple things, the question is where the priority lies. If your highest priority is that people share their improvements then you should probably choose the GPL.

    The catch phrase should be the most important property that distinguishes a license from the others. Here it's something that doesn't help distinguish them. It does help in making a choice, though a misinformed one.
    If you have a better idea, tell them.

    Maybe a better idea would be filling out a form with a handful of questions (kind of like what Creative Commons does)… copyleft or permissive, patent grant or no, etc.

    You know what? MIT has all the same legal issues as simple public domain dedications. In my country (Poland) it's not a valid license at all(*), so all the BSD/MIT software I use, I use illegally.
    [citation needed]. I've been doing this for a long time, and I've never heard anything like that. If that is true, I really hope someone sues the Polish government for copyright infringement just to push them to change the law.

    But most ignore that and feel comfortable with the license, yet moan about public domain.
    Most people don't ignore that, most people have never heard that. Most people also don't live in a country with such an absurd law^H^H^Hthat particular absurd law. The public domain has some very real problems in pretty much every country. Just because MIT may (honestly, I'm having a very hard time believing this) not work correctly in one jurisdiction doesn't put them on equal footing.

    Being legally bullet-proof is strictly impossible. You can get very near at the cost of a huge complexity or at many points in between.
    It applies to public domain as well as to any other license. But an equally bulletproof license with simpler terms will always be simpler. Strictly simpler.
    And a simple public domain dedication is known to be extremely weak. MIT, OTOH, is a short and simple license which achieves basically the same thing that a PD dedication tries to, but is generally considered to be fairly "bullet proof".

    I call "public domain" any work that is either:
    * not copyrighted
    * licensed w/out any restrictions
    regardless of how strong the legalese of the license is.
    Then you're using the term incorrectly. Licensing something without any restrictions doesn't put it in the public domain, though in practice the effect is similar. Public domain means the work is not copyrighted, and everything is copyrighted automatically these days (since the Berne Convention, IIRC) so public domain basically means the copyright has expired. In most (possibly all) jurisdictions there is no legal framework for placing something in the public domain other than waiting for it to expire, hence the problem with short public domain licenses. Saying "I place this in the public domain" doesn't mean it actually is in the public domain. Depending on jurisdiction, it may well be closer to saying "All rights reserved." (if a PD dedication has no effect, and copyright is automatic…)

    Yes, it does. Ambiguity is orthogonal to actual terms and PD terms are strictly less stringent.
    Let's assume for a second that a PD dedication has no effect; in that case the work is still copyrighted and you have not provided a license. If you don't have a license, what you can do is determined by copyright law, which is extremely stringent. Basically, the only things you're allowed to do are things which fall into the "fair use" category.

    The ambiguity means there is a possibility that you're opening yourself up to significant liability, which is a pretty big burden.

    I didn't speak about how to do the PD dedication. Frankly, I don't advocate any choice as I see all of them as broken in some way. But this point made me interested. Are you aware of any open source developer that got sued by a user? The risk definitely is there, it would be best to quantify it, but it would be good to at least have *some* information on it.
    No, luckily I'm not. And I'm not about to volunteer to test it.

  42. #60
    Member
    Join Date
    Nov 2015
    Location
    ?l?nsk, PL
    Posts
    81
    Thanks
    9
    Thanked 13 Times in 11 Posts
    Quote Originally Posted by nemequ View Post
    You're oversimplifying. You can want multiple things, the question is where the priority lies. If your highest priority is that people share their improvements then you should probably choose the GPL.
    That's highly questionable. Ask Yann how contributions have gone since he switched to BSD.

    Quote Originally Posted by nemequ View Post
    If you have a better idea, tell them.
    Some time ago I saw this in their bug tracker; the issue has been open basically since they launched. Many possible changes have been proposed, but the staff has ignored the problem entirely.

    Quote Originally Posted by nemequ View Post
    And a simple public domain dedication is known to be extremely weak. MIT, OTOH, is a short and simple license which achieves basically the same thing that a PD dedication tries to, but is generally considered to be fairly "bullet proof".
    (...)
    Then you're using the term incorrectly. Licensing something without any restrictions doesn't put it in the public domain, though in practice the effect is similar. Public domain means the work is not copyrighted, and everything is copyrighted automatically these days (since the Berne Convention, IIRC) so public domain basically means the copyright has expired. In most (possibly all) jurisdictions there is no legal framework for placing something in the public domain other than waiting for it to expire, hence the problem with short public domain licenses. Saying "I place this in the public domain" doesn't mean it actually is in the public domain. Depending on jurisdiction, it may well be closer to saying "All rights reserved." (if a PD dedication has no effect, and copyright is automatic…)

    Let's assume for a second that a PD dedication has no effect; in that case the work is still copyrighted and you have not provided a license. If you don't have a license, what you can do is determined by copyright law, which is extremely stringent. Basically, the only things you're allowed to do are things which fall into the "fair use" category.

    The ambiguity means there is a possibility that you're opening yourself up to significant liability, which is a pretty big burden.
    Take MIT, remove the restrictions, and you have what is basically a simple PD dedication, every bit as strong legally as the MIT license.
    Yes, I use the term in a way that's not fully correct, yet it is:
    * understandable, to the extent needed by non-lawyers
    * in use by many people, yourself included (you called CC0 a PD dedication a few posts back, though in most countries it's a license)

    I'd like to see a court case where someone who placed their work in the public domain, in a way that's not recognised in some country, sues their users in that country for unlicensed use. It's legally possible, but an extremely contrived case, and the contradiction between stated intent and actions may lead the court to recognise an implied license. Or it may not.


Similar Threads

  1. LZF - Optimized LZF compressor
    By encode in forum Data Compression
    Replies: 39
    Last Post: 28th March 2019, 20:49
  2. Optimized LZSS compressor
    By encode in forum Data Compression
    Replies: 11
    Last Post: 13th February 2014, 23:51
  3. M1 - Optimized demo coder
    By toffer in forum Data Compression
    Replies: 189
    Last Post: 22nd July 2010, 00:49
  4. lzop optimized compile
    By M4ST3R in forum Download Area
    Replies: 1
    Last Post: 30th June 2009, 22:31
  5. 7zip >> Sfx optimized - 23,7 kb
    By Yuri Grille. in forum Data Compression
    Replies: 22
    Last Post: 12th April 2009, 22:33
