¿Cómo dividir un conjunto de rangos superpuestos en rangos no superpuestos?

https://stackoverflow.com/questions/628837

07-07-2019
|

Pregunta

Digamos que tiene un conjunto de rangos:

0-100: 'a'
0 - 75: 'b'
95-150: 'c'
120 - 130: 'd'

Obviamente, estos rangos se superponen en ciertos puntos. ¿Cómo diseccionaría estos rangos para producir una lista de rangos no superpuestos, mientras conserva la información asociada con su rango original (en este caso, la letra después del rango)?

Por ejemplo, los resultados de lo anterior después de ejecutar el algoritmo serían:

0-75: 'a', 'b'
76 - 94: 'a'
95-100: 'a', 'c'
101-119: 'c'
120 - 130: 'c', 'd'
131 - 150: 'c'

Solución

Tenía la misma pregunta cuando escribía un programa para mezclar (parcialmente superpuesto) muestras de audio.

Lo que hice fue agregar un " evento de inicio " y "detener evento" (para cada elemento) a una lista, ordene la lista por punto de tiempo y luego procese en orden. Podría hacer lo mismo, excepto usar un punto entero en lugar de un tiempo, y en lugar de mezclar sonidos, estaría agregando símbolos al conjunto correspondiente a un rango. Ya sea que genere rangos vacíos o simplemente los omita, sería opcional.

Editar Quizás algún código ...

# input = list of (start, stop, symbol) tuples
points = [] # list of (offset, plus/minus, symbol) tuples
for start,stop,symbol in input:
    points.append((start,'+',symbol))
    points.append((stop,'-',symbol))
points.sort()

ranges = [] # output list of (start, stop, symbol_set) tuples
current_set = set()
last_start = None
for offset,pm,symbol in points:
    if pm == '+':
         if last_start is not None:
             #TODO avoid outputting empty or trivial ranges
             ranges.append((last_start,offset-1,current_set))
         current_set.add(symbol)
         last_start = offset
    elif pm == '-':
         # Getting a minus without a last_start is unpossible here, so not handled
         ranges.append((last_start,offset-1,current_set))
         current_set.remove(symbol)
         last_start = offset

# Finish off
if last_start is not None:
    ranges.append((last_start,offset-1,current_set))

Totalmente no probado, obviamente.

Otros consejos

Yo diría que cree una lista de los puntos finales y los ordene, también indexe la lista de rangos por puntos iniciales y finales. Luego, recorra la lista de puntos finales ordenados y, para cada uno, verifique los rangos para ver cuáles están comenzando / deteniéndose en ese punto.

Esto probablemente esté mejor representado en el código ... si sus rangos están representados por tuplas:

ranges = [(0,100,'a'),(0,75,'b'),(95,150,'c'),(120,130,'d')]
endpoints = sorted(list(set([r[0] for r in ranges] + [r[1] for r in ranges])))
start = {}
end = {}
for e in endpoints:
    start[e] = set()
    end[e] = set()
for r in ranges:
    start[r[0]].add(r[2])
    end[r[1]].add(r[2])
current_ranges = set()
for e1, e2 in zip(endpoints[:-1], endpoints[1:]):
    current_ranges.difference_update(end[e1])
    current_ranges.update(start[e1])
    print '%d - %d: %s' % (e1, e2, ','.join(current_ranges))

Aunque mirando esto en retrospectiva, me sorprendería si no hubiera una forma más eficiente (o al menos más limpia) de hacerlo.

Lo que describe es un ejemplo de teoría de conjuntos. Para ver un algoritmo general para calcular uniones, intersecciones y diferencias de conjuntos, consulte:

www.gvu.gatech.edu/~jarek/graphics /papers/04PolygonBooleansMargalit.pdf

Si bien el documento está dirigido a gráficos, también es aplicable a la teoría general de conjuntos. No es exactamente material de lectura ligero.

Pseudocódigo:

unusedRanges = [ (each of your ranges) ]
rangesInUse = []
usedRanges = []
beginningBoundary = nil

boundaries = [ list of all your ranges' start and end values, sorted ]
resultRanges = []

for (boundary in boundaries) {
    rangesStarting = []
    rangesEnding = []

    // determine which ranges begin at this boundary
    for (range in unusedRanges) {
        if (range.begin == boundary) {
            rangesStarting.add(range)
        }
    }

    // if there are any new ones, start a new range
    if (rangesStarting isn't empty) {
        if (beginningBoundary isn't nil) {
            // add the range we just passed
            resultRanges.add(beginningBoundary, boundary - 1, [collected values from rangesInUse])
        }

        // note that we are starting a new range
        beginningBoundary = boundary

        for (range in rangesStarting) {
            rangesInUse.add(range)
            unusedRanges.remove(range)
        }
    }

    // determine which ranges end at this boundary
    for (range in rangesInUse) {
        if (range.end == boundary) {
            rangesEnding.add(range)
        }
    }

    // if any boundaries are ending, stop the range
    if (rangesEnding isn't empty) {
        // add the range up to this boundary
        resultRanges.add(beginningBoundary, boundary, [collected values from rangesInUse]

        for (range in rangesEnding) {
            usedRanges.add(range)
            rangesInUse.remove(range)
        }

        if (rangesInUse isn't empty) {
            // some ranges didn't end; note that we are starting a new range
            beginningBoundary = boundary + 1
        }
        else {
            beginningBoundary = nil
        }
    }
}

Prueba unitaria:

Al final, resultRanges debería tener los resultados que está buscando, los rangos no utilizados y rangeInUse deben estar vacíos, beginBoundary debe ser nulo, y usedRanges debe contener lo que los rangos no utilizados solían contener (pero ordenados por range.end).

Una respuesta similar a Edmunds, probada, que incluye soporte para intervalos como (1,1):

class MultiSet(object):
    def __init__(self, intervals):
        self.intervals = intervals
        self.events = None

    def split_ranges(self):
        self.events = []
        for start, stop, symbol in self.intervals:
            self.events.append((start, True, stop, symbol))
            self.events.append((stop, False, start, symbol))

        def event_key(event):
            key_endpoint, key_is_start, key_other, _ = event
            key_order = 0 if key_is_start else 1
            return key_endpoint, key_order, key_other

        self.events.sort(key=event_key)

        current_set = set()
        ranges = []
        current_start = -1

        for endpoint, is_start, other, symbol in self.events:
            if is_start:
                if current_start != -1 and endpoint != current_start and \
                       endpoint - 1 >= current_start and current_set:
                    ranges.append((current_start, endpoint - 1, current_set.copy()))
                current_start = endpoint
                current_set.add(symbol)
            else:
                if current_start != -1 and endpoint >= current_start and current_set:
                    ranges.append((current_start, endpoint, current_set.copy()))
                current_set.remove(symbol)
                current_start = endpoint + 1

        return ranges


if __name__ == '__main__':
    intervals = [
        (0, 100, 'a'), (0, 75, 'b'), (75, 80, 'd'), (95, 150, 'c'), 
        (120, 130, 'd'), (160, 175, 'e'), (165, 180, 'a')
    ]
    multiset = MultiSet(intervals)
    pprint.pprint(multiset.split_ranges())


[(0, 74, {'b', 'a'}),
 (75, 75, {'d', 'b', 'a'}),
 (76, 80, {'d', 'a'}),
 (81, 94, {'a'}),
 (95, 100, {'c', 'a'}),
 (101, 119, {'c'}),
 (120, 130, {'d', 'c'}),
 (131, 150, {'c'}),
 (160, 164, {'e'}),
 (165, 175, {'e', 'a'}),
 (176, 180, {'a'})]

Licenciado bajo: CC-BY-SA con atribución

No afiliado a StackOverflow