
Thread: Calgary Corpus

    Calgary Corpus

    The Calgary Corpus is the most referenced corpus in the data compression field, especially for text compression, and is the de facto standard for evaluating lossless compression. It was created in 1987 by Ian Witten, Timothy Bell and John Cleary at the University of Calgary, Canada, for their research paper "Modeling for Text Compression", which was published in ACM Computing Surveys in 1989. In 1990 the corpus was also used in their book "Text Compression".

    The corpus consists of 18 files (Large Calgary Corpus: bib, book1, book2, geo, news, obj1, obj2, paper1, paper2, paper3, paper4, paper5, paper6, pic, progc, progl, progp and trans), but only 14 files were used in the paper and book (Standard Calgary Corpus: all files except paper3, paper4, paper5 and paper6).
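
For anyone scripting benchmarks over the two variants, here is a minimal sketch of the file lists described above. The constant names and the `corpus_files` helper are hypothetical, made up for illustration, and are not part of the corpus or of any tool. Only the file names come from the post.

```python
# File names of the Large Calgary Corpus (18 files), as listed in the post.
LARGE_CALGARY = [
    "bib", "book1", "book2", "geo", "news", "obj1", "obj2",
    "paper1", "paper2", "paper3", "paper4", "paper5", "paper6",
    "pic", "progc", "progl", "progp", "trans",
]

# Files that were not used in the paper and the book.
EXCLUDED_FROM_STANDARD = {"paper3", "paper4", "paper5", "paper6"}

# Standard Calgary Corpus (14 files): the large set minus the excluded papers.
STANDARD_CALGARY = [
    name for name in LARGE_CALGARY if name not in EXCLUDED_FROM_STANDARD
]


def corpus_files(large=False):
    """Return a copy of the file list for the chosen variant (hypothetical helper)."""
    return list(LARGE_CALGARY if large else STANDARD_CALGARY)
```

Calling `corpus_files()` gives the 14-file standard set and `corpus_files(large=True)` gives all 18 files. It is worth stating which variant was used when reporting totals, since the two are not comparable.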

    The corpus is available as an attachment to this post.
