
Thread: How mailinator compresses email by 90%

  1. #1
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts

    How mailinator compresses email by 90%

    Of general interest, and a nice write-up:

    http://mailinator.blogspot.com/2012/...ail-by-90.html

  2. #2
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,422
    Thanks
    222
    Thanked 1,051 Times in 564 Posts
    Thanks.
    Btw, this
    Code:
    lzma.exe e enwik8 enwik8.lzma -d23 -a1 -fb5 -mc3 -lc8 -lp0 -pb0 -mfbt4 -mt1
    compresses enwik8 to 35,687,133 bytes in 13.359s, i.e. 7.485MB/s.
    So it's not necessarily always as slow as with the default settings.
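
    The throughput figure can be rechecked with a few lines of Python. This is only a sanity-check sketch: it assumes enwik8 is exactly 10^8 bytes and that "MB" here means 10^6 bytes. The variable names are made up for illustration.

```python
# Recompute the speed and ratio quoted above.
# Assumptions: enwik8 is 100,000,000 bytes; 1 MB = 10**6 bytes.
ENWIK8_BYTES = 100_000_000
compressed_bytes = 35_687_133
seconds = 13.359

throughput_mb_s = ENWIK8_BYTES / seconds / 1e6
ratio = compressed_bytes / ENWIK8_BYTES

print(f"throughput: {throughput_mb_s:.4f} MB/s")
print(f"compressed size: {ratio:.2%} of original")
```

    Under those assumptions the quoted 7.485MB/s matches the computed value truncated to three decimals.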

    Also, do you understand this thing? http://en.wikipedia.org/wiki/Locality-sensitive_hashing
    Somehow all the easy-to-find descriptions are really abstract, so I still don't get whether it's really useful.

  3. #3
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts
    I imagine you're asking the cleverer members and not me directly?

    I too saw LSH mentioned in the comments and googled it.

    In the context of mailinator's hashmap/linked-list LRU, I'd have thought collisions in the hashmap should be avoided, not encouraged.

    LSH seems to have more application in picking a model from a hierarchy for a string, perhaps? I'm thinking more in terms of general compression and modelling now: imagine a model that used an LSH of a prefix to pick a local model to use for the suffix. Useful?

  4. #4
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,422
    Thanks
    222
    Thanked 1,051 Times in 564 Posts
    > I imagine you're asking the cleverer members and not me directly?

    Actually, you're probably the most qualified person on this forum
    to answer such questions anyway.
    Compression questions aside, it doesn't look like there are many
    people with a good CS background here.

    > In the context of mailinator hashmap/linked-list-LRU I'd have
    > thought collisions in the hashmap should be avoided, not encouraged

    There was the part where he tried diffing messages with LCS.
    It seems to be the perfect kind of task for LSH - just finding
    the near-match candidates by a hashtable lookup, instead of
    applying LCS to all pairs of messages.
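
    One common way to get that kind of hashtable lookup is MinHash with banding. The sketch below is an assumption about how it could be done, not what mailinator actually does; the function names, the 5-character shingles and the 16x2 band layout are all picked arbitrarily for illustration. Only the pairs it returns would then be handed to the expensive LCS diff.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def shingles(text, k=5):
    """Set of overlapping k-character substrings of text."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}

def seeded_hash(seed, s):
    """64-bit hash of s; a different seed gives a different hash function."""
    digest = hashlib.blake2b(s.encode("utf-8"), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")

def minhash_signature(text, num_hashes):
    """For each seeded hash function, keep the minimum over all shingles."""
    sh = shingles(text)
    return [min(seeded_hash(seed, s) for s in sh) for seed in range(num_hashes)]

def candidate_pairs(messages, bands=16, rows=2):
    """Index pairs of messages that share at least one signature band."""
    buckets = defaultdict(list)
    for idx, msg in enumerate(messages):
        sig = minhash_signature(msg, bands * rows)
        for b in range(bands):
            # Each band is its own hashtable key; similar texts tend to collide.
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].append(idx)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(ids, 2))
    return pairs

messages = [
    "Your verification code is 123456. Please enter it within 10 minutes to confirm your account.",
    "Your verification code is 123457. Please enter it within 10 minutes to confirm your account.",
    "Meeting moved to Thursday afternoon, agenda attached.",
]
print(sorted(candidate_pairs(messages)))
```

    More rows per band makes a collision demand closer similarity; more bands gives near-matches more chances to collide. Unrelated messages can still land in the same bucket occasionally, so the result is a candidate list, not a final answer.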

    > LSH seems to have more application in picking a model from a
    > hierarchy for a string, perhaps?

    Quoting the wiki:
    "The basic idea is to hash the input items so that similar items are
    mapped to the same buckets with high probability"
    In other words, it's supposed to be exactly what we need for fuzzy
    matching, and should be very useful for compression if it's actually
    practical.

    > Useful?

    From the abstract descriptions which I browsed,
    it seems like it's possible to automatically tune a hash function
    for given data.
    Also, there were some interesting reduction functions.
    So I'd like to understand how it's actually used.

    As to "useful", I already stumbled like this on Bloom filters before.
    It seemed like a good idea, but how would it work if applied to
    context lookups in a compressor? Small bitmasks would just fill up
    really fast, and it won't matter how many there are.
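
    The fill-up problem is easy to show with a toy Bloom filter. This is only a rough sketch under made-up assumptions: a 64-bit mask, 2 hash functions and 200 distinct "contexts", none of which come from a real compressor. The class and function names are also hypothetical.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter backed by a single Python int used as a bitmask."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # One bit position per seeded hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(item, digest_size=8,
                                     salt=seed.to_bytes(8, "little")).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))

    def fill_ratio(self):
        return bin(self.bits).count("1") / self.num_bits

def expected_fill(num_bits, num_hashes, n_items):
    """Expected fraction of set bits after n_items insertions."""
    return 1 - (1 - 1 / num_bits) ** (num_hashes * n_items)

bf = BloomFilter(num_bits=64, num_hashes=2)
contexts = [f"ctx{i}".encode() for i in range(200)]
for c in contexts:
    bf.add(c)

print("fill ratio:", bf.fill_ratio())
print("expected fill:", expected_fill(64, 2, 200))
# With nearly every bit set, lookups for unseen contexts mostly return True.
print("unseen context reported present:", b"never-added" in bf)
```

    The false-positive rate is roughly the fill ratio raised to the number of hash functions, so once a mask is close to full it answers "present" for almost anything.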
