This is what I mean by "Once a common phrase gets paired one way with something, all the other symbols on the same end tend to get promoted because the Markov probability of those strings tends to go down while the instances stays the same".

Suppose the phrase " of the" was only preceded by " many", " some" or " one" and only followed by " best", " worst", " first" or " last". Further, suppose the best combination found that includes " of the" was " of the first". When " of the first" is deduplicated, the probability of the symbol representing " of the" decreases (as does the probability of " first"), because many of its instances were replaced by the symbol representing " of the first". The counts for " many of the", " some of the" and " one of the" all tend to decrease roughly in proportion to the decrease in the number of symbols representing " of the", so the Markov chain probability tends to not change much (with exceptions) and the ratio of matches to the Markov chain probability tends to not change a lot either. However, the counts for " of the best", " of the worst" and " of the last" are not changed by deduplicating " of the first", so for those strings the ratio of matches to the Markov chain probability goes up, making their deduplication more profitable.
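To put rough numbers on it, here is a small sketch. Everything in it is made up for illustration: the counts, the helper name match_ratio, and the simplification that a pair's probability is just the product of the two symbols' order-0 probabilities (the real calculation may well differ). It also assumes what precedes " of the" is independent of what follows it.

```python
def match_ratio(pair_count, count_a, count_b, total):
    """Observed matches divided by a simplified order-0 pair probability."""
    p_pair = (count_a / total) * (count_b / total)
    return pair_count / p_pair

# Hypothetical counts before deduplicating " of the first".
total_before = 10_000     # symbols in the whole input
of_the_before = 400       # instances of " of the"
many, best = 150, 120     # instances of " many" and " best"
many_of_the_before = 100  # instances of " many of the"
of_the_best = 100         # instances of " of the best"
of_the_first = 200        # instances of " of the first"

# Each deduplicated " of the first" replaces two symbols with one new symbol.
total_after = total_before - of_the_first
of_the_after = of_the_before - of_the_first

# Under the independence assumption, half of the " many of the" instances
# were followed by " first", so they no longer contain the " of the" symbol.
many_of_the_after = many_of_the_before * of_the_after // of_the_before

preceding_change = (
    match_ratio(many_of_the_after, many, of_the_after, total_after)
    / match_ratio(many_of_the_before, many, of_the_before, total_before)
)
following_change = (
    match_ratio(of_the_best, of_the_after, best, total_after)
    / match_ratio(of_the_best, of_the_before, best, total_before)
)

print(f'" many of the" ratio changes by a factor of {preceding_change:.2f}')
print(f'" of the best" ratio changes by a factor of {following_change:.2f}')
```

With these numbers the ratio for " many of the" barely moves (a factor of about 0.96), while the ratio for " of the best" nearly doubles (about 1.92), which is the promotion effect described above.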

It is not perfect, but I think it helps. It doesn't work as well if there are correlations between the symbols preceding and following " of the", but if the correlation is strong then the whole combination will typically get deduplicated first instead.