1. ## Combinatorics problem

This isn't really off-topic, but it doesn't need much discussion, either.

The combinatorics problem is:

Let W be an arbitrary word of length N over an alphabet S.

Define a factor of W as a pair (i, j) s.t. 1<= i < j <= N.

The number of factors of W is N choose 2 = (N! / ((N-2)! * 2!))

Define a substring M of W as a contiguous sequence of symbols from W s.t. |M| <= |W|, or, equivalently, a subsequence of W whose start point and end point are given by a factor of W.

Note that the number of substrings can be smaller than the number of factors, due to repeats.

What is the maximum number of substrings for a word of length N and alphabet S?

=================

I think I solved this for binary alphabets, and that pretty much satisfies me so far.

The binary words with the maximum number of substrings are probably De Bruijn sequences, which contain a different binary sequence of length lg N starting at every position. N naturally is a power of 2.

Ignoring words of lengths other than powers of 2, if a word is a De Bruijn sequence, then every start position i is the start of (N - i - lg N) unique substrings. So the number of unique substrings of length at least lg N is (N - lg N) choose 2.

For substrings of length less than lg N, every bit sequence will appear at least once, so the total number is 2^0 + 2^1 + ... + 2^(lg N - 1) = N

So the number of substrings is ((N - lg N) choose 2) + N, which is O(N^2) .... (= close enough for CS)

===================

I didn't spend much time checking the math, but it sounds good to me, and O(N^2) is the part I care about.

Feel free to post solutions to other cases or correct my solution for the binary case.  Reply With Quote

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•