Come dividere un set di intervalli sovrapposti in intervalli non sovrapposti?

https://stackoverflow.com/questions/628837

07-07-2019
|

Domanda

Supponiamo che tu abbia una serie di intervalli:

0 - 100: 'a'
0 - 75: 'b'
95-150: 'c'
120-130: 'd'

Ovviamente, questi intervalli si sovrappongono in determinati punti. Come sezionereste questi intervalli per produrre un elenco di intervalli non sovrapposti, mantenendo le informazioni associate al loro intervallo originale (in questo caso, la lettera dopo l'intervallo)?

Ad esempio, i risultati di cui sopra dopo aver eseguito l'algoritmo sarebbero:

0 - 75: 'a', 'b'
76 - 94: 'a'
95 - 100: 'a', 'c'
101-119: 'c'
120-130: 'c', 'd'
131 - 150: 'c'

Soluzione

Ho avuto la stessa domanda quando scrivevo un programma per mixare campioni audio (parzialmente sovrapposti).

Quello che ho fatto è stato aggiungere un "inizio evento" e "stop event" (per ciascun elemento) in un elenco, ordina l'elenco per punto temporale, quindi elaboralo in ordine. Potresti fare lo stesso, tranne usare un punto intero anziché un tempo, e invece di mescolare i suoni aggiungeresti simboli all'insieme corrispondente a un intervallo. Se generassi intervalli vuoti o li ometti, sarebbe facoltativo.

Modifica Forse un po 'di codice ...

# input = list of (start, stop, symbol) tuples
points = [] # list of (offset, plus/minus, symbol) tuples
for start,stop,symbol in input:
    points.append((start,'+',symbol))
    points.append((stop,'-',symbol))
points.sort()

ranges = [] # output list of (start, stop, symbol_set) tuples
current_set = set()
last_start = None
for offset,pm,symbol in points:
    if pm == '+':
         if last_start is not None:
             #TODO avoid outputting empty or trivial ranges
             ranges.append((last_start,offset-1,current_set))
         current_set.add(symbol)
         last_start = offset
    elif pm == '-':
         # Getting a minus without a last_start is unpossible here, so not handled
         ranges.append((last_start,offset-1,current_set))
         current_set.remove(symbol)
         last_start = offset

# Finish off
if last_start is not None:
    ranges.append((last_start,offset-1,current_set))

Totalmente non testato, ovviamente.

Altri suggerimenti

Direi di creare un elenco degli endpoint e di ordinarlo, anche di indicizzare l'elenco degli intervalli iniziando e terminando i punti. Quindi scorrere l'elenco degli endpoint ordinati e, per ognuno, controllare gli intervalli per vedere quali stanno iniziando / fermandosi in quel punto.

Questo è probabilmente meglio rappresentato nel codice ... se i tuoi intervalli sono rappresentati da tuple:

ranges = [(0,100,'a'),(0,75,'b'),(95,150,'c'),(120,130,'d')]
endpoints = sorted(list(set([r[0] for r in ranges] + [r[1] for r in ranges])))
start = {}
end = {}
for e in endpoints:
    start[e] = set()
    end[e] = set()
for r in ranges:
    start[r[0]].add(r[2])
    end[r[1]].add(r[2])
current_ranges = set()
for e1, e2 in zip(endpoints[:-1], endpoints[1:]):
    current_ranges.difference_update(end[e1])
    current_ranges.update(start[e1])
    print '%d - %d: %s' % (e1, e2, ','.join(current_ranges))

Anche se guardo questo a posteriori, sarei sorpreso se non ci fosse un modo più efficiente (o almeno di aspetto più pulito) per farlo.

Quello che descrivi è un esempio di teoria degli insiemi. Per un algoritmo generale per il calcolo di unioni, intersezioni e differenze di insiemi, consultare:

www.gvu.gatech.edu/~jarek/graphics /papers/04PolygonBooleansMargalit.pdf

Sebbene il documento sia mirato alla grafica, è applicabile anche alla teoria degli insiemi generali. Materiale di lettura non esattamente leggero.

Pseudocodice:

unusedRanges = [ (each of your ranges) ]
rangesInUse = []
usedRanges = []
beginningBoundary = nil

boundaries = [ list of all your ranges' start and end values, sorted ]
resultRanges = []

for (boundary in boundaries) {
    rangesStarting = []
    rangesEnding = []

    // determine which ranges begin at this boundary
    for (range in unusedRanges) {
        if (range.begin == boundary) {
            rangesStarting.add(range)
        }
    }

    // if there are any new ones, start a new range
    if (rangesStarting isn't empty) {
        if (beginningBoundary isn't nil) {
            // add the range we just passed
            resultRanges.add(beginningBoundary, boundary - 1, [collected values from rangesInUse])
        }

        // note that we are starting a new range
        beginningBoundary = boundary

        for (range in rangesStarting) {
            rangesInUse.add(range)
            unusedRanges.remove(range)
        }
    }

    // determine which ranges end at this boundary
    for (range in rangesInUse) {
        if (range.end == boundary) {
            rangesEnding.add(range)
        }
    }

    // if any boundaries are ending, stop the range
    if (rangesEnding isn't empty) {
        // add the range up to this boundary
        resultRanges.add(beginningBoundary, boundary, [collected values from rangesInUse]

        for (range in rangesEnding) {
            usedRanges.add(range)
            rangesInUse.remove(range)
        }

        if (rangesInUse isn't empty) {
            // some ranges didn't end; note that we are starting a new range
            beginningBoundary = boundary + 1
        }
        else {
            beginningBoundary = nil
        }
    }
}

Test unitario:

Alla fine, resultRanges dovrebbe avere i risultati che stai cercando, unusedRanges e rangeInUse dovrebbe essere vuoto, a partire daBoundary dovrebbe essere nullo e usedRanges dovrebbe contenere ciò che unusedRanges usato per contenere (ma ordinato per range.end).

Una risposta simile a Edmunds, testata, incluso il supporto per intervalli come (1,1):

class MultiSet(object):
    def __init__(self, intervals):
        self.intervals = intervals
        self.events = None

    def split_ranges(self):
        self.events = []
        for start, stop, symbol in self.intervals:
            self.events.append((start, True, stop, symbol))
            self.events.append((stop, False, start, symbol))

        def event_key(event):
            key_endpoint, key_is_start, key_other, _ = event
            key_order = 0 if key_is_start else 1
            return key_endpoint, key_order, key_other

        self.events.sort(key=event_key)

        current_set = set()
        ranges = []
        current_start = -1

        for endpoint, is_start, other, symbol in self.events:
            if is_start:
                if current_start != -1 and endpoint != current_start and \
                       endpoint - 1 >= current_start and current_set:
                    ranges.append((current_start, endpoint - 1, current_set.copy()))
                current_start = endpoint
                current_set.add(symbol)
            else:
                if current_start != -1 and endpoint >= current_start and current_set:
                    ranges.append((current_start, endpoint, current_set.copy()))
                current_set.remove(symbol)
                current_start = endpoint + 1

        return ranges


if __name__ == '__main__':
    intervals = [
        (0, 100, 'a'), (0, 75, 'b'), (75, 80, 'd'), (95, 150, 'c'), 
        (120, 130, 'd'), (160, 175, 'e'), (165, 180, 'a')
    ]
    multiset = MultiSet(intervals)
    pprint.pprint(multiset.split_ranges())


[(0, 74, {'b', 'a'}),
 (75, 75, {'d', 'b', 'a'}),
 (76, 80, {'d', 'a'}),
 (81, 94, {'a'}),
 (95, 100, {'c', 'a'}),
 (101, 119, {'c'}),
 (120, 130, {'d', 'c'}),
 (131, 150, {'c'}),
 (160, 164, {'e'}),
 (165, 175, {'e', 'a'}),
 (176, 180, {'a'})]

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow