
Thread: Learning compression

  #1 Bulat Ziganshin (Programmer, Uzbekistan)

    Learning compression

    How can one learn compression almost from scratch, i.e. as a student with good research abilities? What books, online courses, videos or other resources can you recommend? Chinese ones are especially welcome.

    Another question is which sources can be used to learn compression algorithms, i.e. well-commented code optimized for simplicity and readability rather than efficiency.

    Books:

    Online books:
    • Matt Mahoney "Data Compression Explained" (2013) looks terse and focused on practical compression methods
    • Gary Linscott "Modern LZ Compression" is a great practical book that teaches how to develop a gzip-class compressor and provides dozens of links to more advanced topics

    Blogs:
    • Yann Collet "RealTime Data Compression" - lots of info on the development of xxHash, LZ4, FSE/Huff0 and ZSTD
    • Charles Bloom "cbloom rants" - hundreds of posts about complex compression topics, mainly LZ variants and entropy coding
    • Fabian Giesen "The ryg blog" - a great technical blog, but not many compression-related posts
    Last edited by Bulat Ziganshin; 24th April 2020 at 12:22.

  #2 Member (United Kingdom)
    I found this to be a great explanation for someone who has a bit of programming knowledge and wants to learn a typical LZ coding stack: https://glinscott.github.io/lz/index.html

  #3 schnaader (Programmer, Hessen, Germany)
    The wikibook on data compression gives a broad overview of the basics, the motivation and some algorithms, together with many useful links. However, like most wikibooks, it's not finished yet and its quality fluctuates, though it's more complete than its newer sibling, the wikibook on data coding theory.

  #4 Shelwien (Administrator, Kharkov, Ukraine)
    Maybe https://www.amazon.cn/dp/B01IE1C9B4 and translations of other usual books?
    Also http://mattmahoney.net/dc/dce.html

  #5 Bulat Ziganshin (Programmer, Uzbekistan)
    Those wikibooks look more like collections of random notes.

    The LZ book is great, thanks!

    DCE is too compressed; it reads more like a list of practical tricks for someone who has already learned compression theory.

    I will add a list of books/blogs I found to the starting message, but learning from thick books is time-consuming, so I'm looking for other approaches.

  #6 Member (United Kingdom)
    I wish I knew a good answer to this question. There are a couple of textbooks that you probably know about already: Salomon's "Data compression: The complete reference" is big and unbalanced - sometimes too detailed, sometimes insufficiently detailed. Sayood's "Introduction to data compression" is more balanced in my opinion, but is also very thick. Many original papers are very difficult to approach.

    With my students, I usually try two approaches. For those who are more mathematically inclined, I start with Huffman's paper and then move towards Shannon's work (probably retold by someone else), the Kraft-McMillan inequality, and then arithmetic coding. This is quite neat mathematically, but a bit remote from practice. For those who are more practically inclined, I tend to pick an LZ77-style compressor like LZ4 or LZF, have them learn how it works and re-implement it (always the decoder before the encoder), and then look into deeper issues like cleverer match finding and optimal parsing. Usually I find that neither approach fits into a bachelor's year project.
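    For the mathematically inclined route, a small Python sketch (not taken from any of the books mentioned here) can tie Huffman's construction to Shannon's entropy and the Kraft-McMillan inequality. It builds Huffman code lengths for a sample string, checks that they satisfy sum(2^-l) <= 1, and compares the average code length with the entropy, which a Huffman code always stays within one bit of. The sample string and the function names are illustrative choices.

```python
import heapq
import math
from collections import Counter


def huffman_code_lengths(data):
    """Return {symbol: code length in bits} for a Huffman code built from symbol counts."""
    freq = Counter(data)
    if len(freq) == 1:
        # A single distinct symbol still needs a 1-bit code to be decodable.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging the two lightest subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in a.items()}
        merged.update({s: d + 1 for s, d in b.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def kraft_sum(lengths):
    """Kraft-McMillan: any uniquely decodable code has sum(2**-l) <= 1."""
    return sum(2.0 ** -l for l in lengths.values())


def entropy_bits_per_symbol(data):
    """Shannon entropy of the empirical symbol distribution, in bits per symbol."""
    freq = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in freq.values())


text = "abracadabra"
lengths = huffman_code_lengths(text)
avg = sum(lengths[s] for s in text) / len(text)
print(lengths)
print(kraft_sum(lengths))  # 1.0: a Huffman code is a complete prefix code
print(entropy_bits_per_symbol(text), avg)
```

    For `abracadabra`, 'a' gets a 1-bit code and the other four symbols get 3-bit codes, so the Kraft sum is exactly 1 and the 23-bit total sits just above the entropy bound of roughly 22.4 bits.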

  #7 Member (Kraków, Poland)
    About two years ago I prepared a semester-long lecture course, mainly about data compression, and it was indeed tough to find materials. For LZ and variable-length Markov models I used Matt's http://mattmahoney.net/dc/dce.html, but beyond that I gathered information from many articles, and also from Wikipedia.

  #8 Bulat Ziganshin (Programmer, Uzbekistan)
    introspec and Jaroslaw, what about https://glinscott.github.io/lz/index.html ? To me, it looks like great teaching material, especially considering the full sources that accompany it.

  #9 Shelwien (Administrator, Kharkov, Ukraine)
    It looks good for the LZ side (and even mentions your work), sure, but I think there's too little text, and entropy coding is not there at all (Huffman coding doesn't count).
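    Since arithmetic coding is the usual next step beyond Huffman, here is a deliberately naive Python sketch with a static model and exact `fractions.Fraction` arithmetic, so that each symbol visibly narrows an interval by its probability. Practical coders use fixed-precision integer ranges with renormalization instead; that machinery is omitted here, and the function names are illustrative.

```python
from collections import Counter
from fractions import Fraction


def build_model(data):
    """Static model: each symbol owns a sub-interval of [0, 1) proportional to its count."""
    counts = Counter(data)
    total = len(data)
    ranges, low = {}, Fraction(0)
    for sym in sorted(counts):
        p = Fraction(counts[sym], total)
        ranges[sym] = (low, low + p)
        low += p
    return ranges


def encode(data, ranges):
    """Narrow [low, high) once per symbol; any number inside the result identifies the message."""
    low, high = Fraction(0), Fraction(1)
    for sym in data:
        width = high - low
        s_low, s_high = ranges[sym]
        low, high = low + width * s_low, low + width * s_high
    return low, high


def decode(x, length, ranges):
    """Replay the same narrowing, picking the sub-interval that contains x at each step."""
    out = []
    low, high = Fraction(0), Fraction(1)
    for _ in range(length):
        width = high - low
        for sym, (s_low, s_high) in ranges.items():
            if low + width * s_low <= x < low + width * s_high:
                out.append(sym)
                low, high = low + width * s_low, low + width * s_high
                break
    return "".join(out)


message = "abracadabra"
model = build_model(message)
low, high = encode(message, model)
print(decode(low, len(message), model))
```

    The final interval width equals the product of the symbol probabilities, so roughly -log2(width) bits (plus a small constant) suffice to name a number inside it; that is the message's information content under the model, without Huffman's whole-bit-per-symbol rounding.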

    Also I think it's better not to ignore printed books.
    If you think they're too large, then just pick relevant parts from a few books.

    Alternatively, it's possible to collect a set of useful papers (starting from "Modeling for Text Compression" maybe),
    but this selection process would require much more effort compared to books.

  #10 Member (Kraków, Poland)
    Looks good, but it is very basic: it doesn't even cover the difference between LZ77 and LZ78, and there's no practical information on how things are done in gzip, zstd, lzma...
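    The LZ77/LZ78 distinction mentioned above is easiest to see on the decoder side, which also fits the "decoder before encoder" advice earlier in the thread. The Python sketch below uses the simplified textbook token layouts, (offset, length, next_char) triples for LZ77 and (phrase_index, next_char) pairs for LZ78; these are illustrative assumptions, not the format of any real compressor.

```python
def lz77_decode(tokens):
    """Decode textbook-style LZ77 triples (offset, length, next_char).

    offset/length point back into the text already produced (the sliding
    window); (0, 0, c) is a plain literal. next_char may be '' when the
    input ends right after a match.
    """
    out = []
    for offset, length, next_char in tokens:
        start = len(out) - offset
        # Copy one symbol at a time so that overlapping matches work:
        # offset=1, length=5 repeats the previous symbol five times.
        for i in range(length):
            out.append(out[start + i])
        if next_char:
            out.append(next_char)
    return "".join(out)


def lz78_decode(tokens):
    """Decode textbook-style LZ78 pairs (phrase_index, next_char).

    Instead of pointing into recent text, each token names an entry in an
    explicit phrase dictionary (index 0 is the empty phrase) and extends it
    by one symbol; the extended phrase becomes the next dictionary entry.
    """
    phrases = [""]
    out = []
    for index, next_char in tokens:
        phrase = phrases[index] + next_char
        out.append(phrase)
        phrases.append(phrase)
    return "".join(out)


# Both token streams below spell out "abababab".
print(lz77_decode([(0, 0, "a"), (0, 0, "b"), (2, 6, "")]))
print(lz78_decode([(0, "a"), (0, "b"), (1, "b"), (3, "a"), (2, "")]))
```

    The LZ77 stream gets its gain from one long, self-overlapping back-reference into the window, while the LZ78 stream only pays off as its dictionary of phrases ("a", "b", "ab", "aba", ...) grows.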

  #11 jibz (Member, Denmark)
    I saw this posted on lobste.rs a while back and thought it was a quite well-written introduction to the zip compression format. There are parts on LZ and Huffman coding, but perhaps a bit too much about deflate and zip specifics for your use.

    https://www.hanshq.net/zip.html
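    Someone working through a deflate/zip write-up like this usually wants a reference implementation to test their own code against. A minimal sketch, assuming Python's standard `zlib` module is an acceptable reference: a negative `wbits` value makes zlib produce and accept raw deflate streams (RFC 1951) without the zlib header and Adler-32 trailer, which is the form zip entries with compression method 8 store. The helper names are illustrative.

```python
import zlib


def raw_deflate(data: bytes, level: int = 9) -> bytes:
    """Compress to a raw deflate stream, with no zlib header or trailer."""
    # wbits=-15: 32 KiB window; the negative sign selects a raw stream.
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def raw_inflate(stream: bytes) -> bytes:
    """Decompress a raw deflate stream, e.g. to cross-check a home-made decoder."""
    return zlib.decompress(stream, wbits=-15)


sample = b"abracadabra " * 100
packed = raw_deflate(sample)
print(len(sample), len(packed))
```

    Feeding `packed` to a hand-written inflater and comparing against `sample` gives a quick correctness check before tackling the zip container itself.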

  #12 Bulat Ziganshin (Programmer, Uzbekistan)
    Shelwien, of course the LZ book can't replace a more comprehensive course, but it may teach how to write a deflate-class compressor, all in one place. It looks reasonably sized for a student.

    The disadvantage of thick books is that a student needs to study a lot without seeing practical results. Crash courses like this one limit themselves to the topics required to develop a practical engine, which improves the learner's motivation.
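    To give a flavour of the "practical engine" such a crash course builds, here is a hedged Python sketch of the first stage of a deflate-class compressor: a greedy LZ77 parse with a hash-chain match finder. The limits reuse deflate's 3-byte minimum match, 258-byte maximum match and 32 KiB window; the `MAX_CHAIN` cap, the token layout and the function names are illustrative assumptions. Encoders such as zlib's additionally use lazy matching and then Huffman-code the resulting literals, lengths and distances.

```python
MIN_MATCH = 3          # deflate's shortest match
MAX_MATCH = 258        # deflate's longest match
WINDOW = 32 * 1024     # deflate's maximum match distance
MAX_CHAIN = 64         # hypothetical cap on candidates examined per position


def lz77_greedy_encode(data: bytes):
    """Greedy LZ77 parse with hash chains keyed on the next MIN_MATCH bytes.

    Returns a list of tokens: an int is a literal byte, a (distance, length)
    tuple is a back-reference.
    """
    chains = {}  # 3-byte string -> earlier positions where it starts (oldest first)
    tokens = []
    n = len(data)

    def insert(p):
        if p + MIN_MATCH <= n:
            chains.setdefault(data[p:p + MIN_MATCH], []).append(p)

    pos = 0
    while pos < n:
        best_len, best_dist = 0, 0
        if pos + MIN_MATCH <= n:
            limit = min(MAX_MATCH, n - pos)
            candidates = chains.get(data[pos:pos + MIN_MATCH], [])
            # Try the most recent (closest) candidates first.
            for cand in reversed(candidates[-MAX_CHAIN:]):
                if pos - cand > WINDOW:
                    break  # older candidates are even farther away
                length = 0
                while length < limit and data[cand + length] == data[pos + length]:
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, pos - cand
                    if length == limit:
                        break  # cannot do better than the maximum
        if best_len >= MIN_MATCH:
            tokens.append((best_dist, best_len))
            for p in range(pos, pos + best_len):
                insert(p)
            pos += best_len
        else:
            tokens.append(data[pos])  # indexing bytes yields an int
            insert(pos)
            pos += 1
    return tokens


def decode_tokens(tokens) -> bytes:
    """Rebuild the input from literals and (distance, length) back-references."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)
        else:
            dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # byte-by-byte copy handles overlapping matches
    return bytes(out)


text = b"abracadabra abracadabra abracadabra"
tokens = lz77_greedy_encode(text)
print(tokens)
```

    On highly repetitive input the parse collapses quickly: `b"a" * 10` becomes one literal followed by a single overlapping match `(1, 9)`. Swapping the greedy choice for lazy matching or optimal parsing only changes the loop in `lz77_greedy_encode`, which is why that loop is where the deeper topics start.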

    jibz, this book is more like a detailed description of how to implement a zip+deflate decompressor than a book teaching compression
    Last edited by Bulat Ziganshin; 24th April 2020 at 12:05.

  #13 Member (United Kingdom)
    Quote (originally posted by Bulat Ziganshin):
        introspec and Jaroslaw, what about https://glinscott.github.io/lz/index.html ? To me, it looks like great teaching material, especially considering the full sources that accompany it.
    It might work for a Computer Science project, I do not know, but I work in applied maths, and this book spends all the time saying WHAT to do, without ever explaining WHY things are done in the specified way. It simply won't work. Completely different level of explaining is needed if understanding is required at the end.

  #14 Member (Germany)
    Quote (originally posted by Bulat Ziganshin):
        Quote (originally posted by jibz):
            I saw this posted on lobste.rs a while back and thought it was a quite well-written introduction to the zip compression format. There are parts on LZ and Huffman coding, but perhaps a bit too much about deflate and zip specifics for your use.

            https://www.hanshq.net/zip.html
        jibz, this book is more like a detailed description of how to implement a zip+deflate decompressor than a book teaching compression
    I'm happy to see that someone found my article. It does cover both compression and decompression, but of course it's very deflate and zip specific.

  #15 Member (Cambridge, UK)
    I've got a copy of Mark Nelson's book on Data Compression somewhere. It was a good read at the time, but it doesn't cover a lot of more modern ideas.
