Let’s say we have 3 cyclic sequences, each of them of length n and containing every number from the interval 1..n once.
The goal is to align the sequences in order to maximize the number of matches between the sequences. By a match I mean a column that contains 3 identical values. Since the sequences are cyclical, we can move each sequence cyclically by any number of positions (so 1->2->3 may become 2->3->1 and 3->1->2 but not 1->3->2).
Example:
1 5 4 3 2
1 3 2 4 5
2 1 5 4 3
The answer should be 3 and the corresponding alignment is shown below (fives, threes and twos are matched).
1 5 4 3 2
4 5 1 3 2
1 5 4 3 2
A simple quadratic algorithm I came up with is:
for i in 1..n:
align the sequences so that column (i, i, i) is created
count the number of matches
if larger than current maximum, then set new maximum
Unfortunately the sequences may have length 1000000 and thus my approach is not fast enough. I would appreciate I someone could suggest a subquadratic (linear, loglinear?) algorithm for this problem.
First, to simplify the problem, renumber such that the first sequence is 1..n. The problem is then reduced to finding the shift offsets for two sequences such that for as many numbers i as possible both have an i in position i.
Then, create an associative map that maps from pairs of shift offsets to the score. For each number, increase the score for the pair of offsets that would yield a match. Finally, find the highest score in the map.
This runs in expected linear time if you use a hash map.
0