algorithm to find longest non-overlapping sequences

https://stackoverflow.com/questions/4593583

15-10-2019
|

Question

I am trying to find the best way to solve the following problem. By best way I mean less complex.

As an input a list of tuples (start,length) such:

[(0,5),(0,1),(1,9),(5,5),(5,7),(10,1)]

Each element represets a sequence by its start and length, for example (5,7) is equivalent to the sequence (5,6,7,8,9,10,11) - a list of 7 elements starting with 5. One can assume that the tuples are sorted by the start element.

The output should return a non-overlapping combination of tuples that represent the longest continuous sequences(s). This means that, a solution is a subset of ranges with no overlaps and no gaps and is the longest possible - there could be more than one though.

For example for the given input the solution is:

[(0,5),(5,7)] equivalent to (0,1,2,3,4,5,6,7,8,9,10,11)

is it backtracking the best approach to solve this problem ?

I'm interested in any different approaches that people could suggest.

Also if anyone knows a formal reference of this problem or another one that is similar I'd like to get references.

BTW - this is not homework.

Edit

Just to avoid some mistakes this is another example of expected behaviour

for an input like [(0,1),(1,7),(3,20),(8,5)] the right answer is [(3,20)] equivalent to (3,4,5,..,22) with length 20. Some of the answers received would give [(0,1),(1,7),(8,5)] equivalent to (0,1,2,...,11,12) as right answer. But this last answer is not correct because is shorter than [(3,20)].

Solution

Iterate over the list of tuples using the given ordering (by start element), while using a hashmap to keep track of the length of the longest continuous sequence ending on a certain index.

pseudo-code, skipping details like items not found in a hashmap (assume 0 returned if not found):

int bestEnd = 0;
hashmap<int,int> seq // seq[key] = length of the longest sequence ending on key-1, or 0 if not found
foreach (tuple in orderedTuples) {
    int seqLength = seq[tuple.start] + tuple.length
    int tupleEnd = tuple.start+tuple.length;
    seq[tupleEnd] = max(seq[tupleEnd], seqLength)
    if (seqLength > seq[bestEnd]) bestEnd = tupleEnd
}
return new tuple(bestEnd-seq[bestEnd], seq[bestEnd])

This is an O(N) algorithm.

If you need the actual tuples making up this sequence, you'd need to keep a linked list of tuples hashed by end index as well, updating this whenever the max length is updated for this end-point.

UPDATE: My knowledge of python is rather limited, but based on the python code you pasted, I created this code that returns the actual sequence instead of just the length:

def get_longest(arr):
    bestEnd = 0;
    seqLengths = dict() #seqLengths[key] = length of the longest sequence ending on key-1, or 0 if not found
    seqTuples = dict() #seqTuples[key] = the last tuple used in this longest sequence
    for t in arr:
        seqLength = seqLengths.get(t[0],0) + t[1]
        tupleEnd = t[0] + t[1]
        if (seqLength > seqLengths.get(tupleEnd,0)):
            seqLengths[tupleEnd] = seqLength
            seqTuples[tupleEnd] = t
            if seqLength > seqLengths.get(bestEnd,0):
                bestEnd = tupleEnd
    longestSeq = []
    while (bestEnd in seqTuples):
        longestSeq.append(seqTuples[bestEnd])
        bestEnd -= seqTuples[bestEnd][1]
    longestSeq.reverse()
    return longestSeq


if __name__ == "__main__":
    a = [(0,3),(1,4),(1,1),(1,8),(5,2),(5,5),(5,6),(10,2)]
    print(get_longest(a))

OTHER TIPS

Revised algorithm:

create a hashtable of start->list of tuples that start there
put all tuples in a queue of tupleSets
set the longestTupleSet to the first tuple
while the queue is not empty
    take tupleSet from the queue
    if any tuples start where the tupleSet ends
        foreach tuple that starts where the tupleSet ends
            enqueue new tupleSet of tupleSet + tuple
        continue

    if tupleSet is longer than longestTupleSet
        replace longestTupleSet with tupleSet

return longestTupleSet

c# implementation

public static IList<Pair<int, int>> FindLongestNonOverlappingRangeSet(IList<Pair<int, int>> input)
{
    var rangeStarts = input.ToLookup(x => x.First, x => x);
    var adjacentTuples = new Queue<List<Pair<int, int>>>(
        input.Select(x => new List<Pair<int, int>>
            {
                x
            }));

    var longest = new List<Pair<int, int>>
        {
            input[0]
        };
    int longestLength = input[0].Second - input[0].First;

    while (adjacentTuples.Count > 0)
    {
        var tupleSet = adjacentTuples.Dequeue();
        var last = tupleSet.Last();
        int end = last.First + last.Second;
        var sameStart = rangeStarts[end];
        if (sameStart.Any())
        {
            foreach (var nextTuple in sameStart)
            {
                adjacentTuples.Enqueue(tupleSet.Concat(new[] { nextTuple }).ToList());
            }
            continue;
        }
        int length = end - tupleSet.First().First;
        if (length > longestLength)
        {
            longestLength = length;
            longest = tupleSet;
        }
    }

    return longest;
}

tests:

[Test]
public void Given_the_first_problem_sample()
{
    var input = new[]
        {
            new Pair<int, int>(0, 5),
            new Pair<int, int>(0, 1),
            new Pair<int, int>(1, 9),
            new Pair<int, int>(5, 5),
            new Pair<int, int>(5, 7),
            new Pair<int, int>(10, 1)
        };
    var result = FindLongestNonOverlappingRangeSet(input);
    result.Count.ShouldBeEqualTo(2);
    result.First().ShouldBeSameInstanceAs(input[0]);
    result.Last().ShouldBeSameInstanceAs(input[4]);
}

[Test]
public void Given_the_second_problem_sample()
{
    var input = new[]
        {
            new Pair<int, int>(0, 1),
            new Pair<int, int>(1, 7),
            new Pair<int, int>(3, 20),
            new Pair<int, int>(8, 5)
        };
    var result = FindLongestNonOverlappingRangeSet(input);
    result.Count.ShouldBeEqualTo(1);
    result.First().ShouldBeSameInstanceAs(input[2]);
}

This is a special case of the longest path problem for weighted directed acyclic graphs.

The nodes in the graph are the start points and the points after the last element in a sequence, where the next sequence could start.

The problem is special because the distance between two nodes must be the same independently of the path.

Just thinking about the algorithm in basic terms, would this work?

(apologies for horrible syntax but I'm trying to stay language-independent here)

First the simplest form: Find the longest contiguous pair.

Cycle through every member and compare it to every other member with a higher startpos. If the startpos of the second member is equal to the sum of the startpos and length of the first member, they are contiguous. If so, form a new member in a new set with the lower startpos and combined length to represent this.

Then, take each of these pairs and compare them to all of the single members with a higher startpos and repeat, forming a new set of contiguous triples (if any exist).

Continue this pattern until you have no new sets.

The tricky part then is you have to compare the length of every member of each of your sets to find the real longest chain.

I'm pretty sure this is not as efficient as other methods, but I believe this is a viable approach to brute forcing this solution.

I'd appreciate feedback on this and any errors I may have overlooked.

Edited to replace pseudocode with actual Python code

Edited AGAIN to change the code; The original algorithm was on the solution, but I missunderstood what the second value in the pairs was! Fortunatelly the basic algorithm is the same, and I was able to change it.

Here's an idea that solves the problem in O(N log N) and doesn't use a hash map (so no hidden times). For memory we're going to use N * 2 "things".

We're going to add two more values to each tuple: (BackCount, BackLink). In the successful combination BackLink will link from right to left from the right-most tuple to the left-most tuple. BackCount will be the value accumulated count for the given BackLink.

Here's some python code:

def FindTuplesStartingWith(tuples, frm):
    # The Log(N) algorithm is left as an excersise for the user
    ret=[]
    for i in range(len(tuples)):
        if (tuples[i][0]==frm): ret.append(i)
    return ret

def FindLongestSequence(tuples):

    # Prepare (BackCount, BackLink) array
    bb=[] # (BackCount, BackLink)
    for OneTuple in tuples: bb.append((-1,-1))

    # Prepare
    LongestSequenceLen=-1
    LongestSequenceTail=-1

    # Algorithm
    for i in range(len(tuples)):
        if (bb[i][0] == -1): bb[i] = (0, bb[i][1])
        # Is this single pair the longest possible pair all by itself?
        if (tuples[i][1] + bb[i][0]) > LongestSequenceLen:
            LongestSequenceLen = tuples[i][1] + bb[i][0]
            LongestSequenceTail = i
        # Find next segment
        for j in FindTuplesStartingWith(tuples, tuples[i][0] + tuples[i][1]):
            if ((bb[j][0] == -1) or (bb[j][0] < (bb[i][0] + tuples[i][1]))):
                # can be linked
                bb[j] = (bb[i][0] + tuples[i][1], i)
                if ((bb[j][0] + tuples[j][1]) > LongestSequenceLen):
                    LongestSequenceLen = bb[j][0] + tuples[j][1]
                    LongestSequenceTail=j

    # Done! I'll now build up the solution
    ret=[]
    while (LongestSequenceTail > -1):
        ret.insert(0, tuples[LongestSequenceTail])
        LongestSequenceTail = bb[LongestSequenceTail][1]
    return ret

# Call the algoritm
print FindLongestSequence([(0,5), (0,1), (1,9), (5,5), (5,7), (10,1)])
>>>>>> [(0, 5), (5, 7)]
print FindLongestSequence([(0,1), (1,7), (3,20), (8,5)])    
>>>>>> [(3, 20)]

The key for the whole algorithm is where the "THIS IS THE KEY" comment is in the code. We know our current StartTuple can be linked to EndTuple. If a longer sequence that ends at EndTuple.To exists, it was found by the time we got to this point, because it had to start at an smaller StartTuple.From, and the array is sorted on "From"!

I removed the previous solution because it was not tested.

The problem is finding the longest path in a "weighted directed acyclic graph", it can be solved in linear time:

http://en.wikipedia.org/wiki/Longest_path_problem#Weighted_directed_acyclic_graphs

Put a set of {start positions} union {(start position + end position)} as vertices. For your example it would be {0, 1, 5, 10, 11, 12}

for vertices v0, v1 if there is an end value w that makes v0 + w = v1, then add a directed edge connecting v0 to v1 and put w as its weight.

Now follow the pseudocode in the wikipedia page. since the number of vertices is the maximum value of 2xn (n is number of tuples), the problem can still be solved in linear time.

This is a simple reduce operation. Given a pair of consecutive tuples, they either can or can't be combined. So define the pairwise combination function:

def combo(first,second):
    if first[0]+first[1] == second[0]:
        return [(first[0],first[1]+second[1])]
    else:
        return [first,second]

This just returns a list of either one element combining the two arguments, or the original two elements.

Then define a function to iterate over the first list and combine pairs:

def collapse(tupleList):
    first = tupleList.pop(0)
    newList = []
    for item in tupleList:
        collapsed = combo(first,item)
        if len(collapsed)==2:
            newList.append(collapsed[0])
        first = collapsed.pop()
    newList.append(first)
    return newList

This keeps a first element to compare with the current item in the list (starting at the second item), and when it can't combine them it drops the first into a new list and replaces first with the second of the two.

Then just call collapse with the list of tuples:

>>> collapse( [(5, 7), (12, 3), (0, 5), (0, 7), (7, 2), (9, 3)] )
[(5, 10), (0, 5), (0, 12)]

[Edit] Finally, iterate over the result to get the longest sequence.

def longest(seqs):
    collapsed = collapse(seqs)
    return max(collapsed, key=lambda x: x[1])

[/Edit]

Complexity O(N). For bonus marks, do it in reverse so that the initial pop(0) becomes a pop() and you don't have to reindex the array, or move the iterator instead. For top marks make it run as a pairwise reduce operation for multithreaded goodness.

This sounds like a perfect "dynamic programming" problem...

The simplest program would be to do it brute force (e.g. recursive), but this has exponential complexity.

With dynamic programming you can set up an array a of length n, where n is the maximum of all (start+length) values of your problem, where a[i] denotes the longest non-overlapping sequence up to a[i]. You can then step trought all tuples, updating a. The complexity of this algorithm would be O(n*k), where k is the number of input values.

Create an ordered array of all start and end points and initialise all of them to one
For each item in your tuple, compare the end point (start and end) to the ordered items in your array, if any point is between them (e.g. point in the array is 5 and you have start 2 with length 4) change value to zero.
After finishing the loop, start moving across the ordered array and create a strip when you see 1 and while you see 1, add to the existing strip, with any zero, close the strip and etc.
At the end check the length of strips

I think complexity is around O(4-5*N)

(SEE UPDATE)

with N being number of items in the tuple.

UPDATE

As you figured out, the complexity is not accurate but definitely very small since it is a function of number of line stretches (tuple items).

So if N is number of line stretches, sorting is O(2N * log2N). Comparison is O(2N). Finding line stretches is also O(2N). So all in all O(2N(log2N + 2)).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow