This DNA corpus is referenced in several grammar compression papers, but it is hard to find. Even once you have found it, each file needs to be decompressed (ever so slightly) with gzip and then run through dnau, for which I could only find source code. It consists of the following 11 files: chmpxx, chntxx, hehcmv, humdyst, humghcs, humhbb, humhdab, humprtb, mpomtcg, mtpacga, and vaccg. Here it is in a single .zip file.
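
For anyone starting from the original gzipped files instead, the gzip step can be sketched in a few lines of Python. This is only a minimal sketch under assumptions: the function name `gunzip_corpus`, the directory arguments, and the `<name>.gz` naming are all hypothetical, so adjust them to however your copies are actually named. The dnau step is left out, since it has to be built from source and its command line is not described here.

```python
import gzip
import shutil
from pathlib import Path

# The 11 corpus file names listed above.
CORPUS_FILES = [
    "chmpxx", "chntxx", "hehcmv", "humdyst", "humghcs", "humhbb",
    "humhdab", "humprtb", "mpomtcg", "mtpacga", "vaccg",
]


def gunzip_corpus(src_dir, dst_dir, suffix=".gz"):
    """Decompress every corpus file found in src_dir into dst_dir.

    Assumption: each file is stored as <name><suffix> (hypothetical naming).
    Returns the list of corpus names that were not found in src_dir.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in CORPUS_FILES:
        packed = src / (name + suffix)
        if not packed.exists():
            missing.append(name)
            continue
        # Stream the decompressed bytes to a file named after the corpus entry.
        with gzip.open(packed, "rb") as fin, open(dst / name, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    return missing
```

Calling `gunzip_corpus("downloads", "corpus")` would write the decompressed files into `corpus/` and return the names it could not find, which doubles as a quick check that all 11 files are present before running them through dnau.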