Thread: DZip

  1. #1
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    486
    Thanks
    169
    Thanked 166 Times in 114 Posts

    DZip

    I stumbled across another neural network based general purpose compressor today. They compare it against LSTM where it mostly does better (sometimes very much so) but is sometimes poorer. I haven't tried it yet.

    https://arxiv.org/abs/1911.03572

    https://github.com/mohit1997/DZip

  2. #2
    Member
    Join Date
    Apr 2018
    Location
    Indonesia
    Posts
    74
    Thanks
    15
    Thanked 5 Times in 5 Posts
    Originally posted by JamesB:
    I stumbled across another neural network based general purpose compressor today. They compare it against LSTM where it mostly does better (sometimes very much so) but is sometimes poorer. I haven't tried it yet.

    https://arxiv.org/abs/1911.03572

    https://github.com/mohit1997/DZip
    It seems similar to cmix.

  3. #3
    Administrator (Shelwien)
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,716
    Thanks
    271
    Thanked 1,185 Times in 656 Posts
    Not related to cmix. It's written in Python and uses all the popular NN frameworks.
    Compression seems to be pretty bad though - they report worse results than bsc for enwik8.

  4. #4
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
    Interestingly, the improvements from using the dynamic model are almost negligible on some files, although it is not clear to me whether they keep updating the static model.

  5. #5
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    486
    Thanks
    169
    Thanked 166 Times in 114 Posts
    The authors are heavily involved in various genome sequence data formats, so that's probably their primary focus here and why they have so much genomic data in their test corpus. So maybe they don't care so much about text compression. At the moment the tools (Harc, Spring) from some of the authors make heavy use of libbsc, so perhaps they're looking at replacing it. Albeit slowly...

    They'd probably be better off considering something like MCM for rapid CM encoding as a more tractable alternative, but it's always great to see new research of course and this team have a track record of interesting results.

  6. Thanks:

    Shelwien (6th December 2019)
