Among the good candidates to be applied to biological sequences, Longest-

First, RePair and Greedy behave very similarly: they are all off-line, and

they iterate over the grammar that is being built, selecting in each iteration a

repeat that maximizes a score function. In order to compare in a uniform framework

the behavior of the different score functions, we implemented them in a

general schema that we called IRR (for Iteratively Repeat Replacement). First,

the grammar being built is initialized with a unique initial rule S -> s where

s is the input sequence, and then IRR proceeds iteratively. At each iteration,

a word w occurring more than once in the grammar is chosen according to a

score function f. All the (non-overlapping) occurrences of w in the grammar are

replaced by a new non-terminal N and a new rewriting rule N -> w is added to

the grammar.

The choice of the score function instantiates different algorithms. As we

mentioned, the most popular ones are maximal length (ML), most frequent

(MF), and most compressive (MC) which chose the repeat that is longest, most

frequent or compresses best the grammar respectively. The three corresponding

algorithms are called IRR-ML, IRR-MF, IRR-MC. See [10] for details.