Question


I have a large list of numbers, and I want to see if any of them are approximately equal. If 2 numbers are "approximately equal" (for my purposes), both of them fall within 10% of each other (see the following 2 examples.) Then I want to sort them into separate lists of approximately equal numbers.

Example #1 Compare 5.0 and 5.5: 5.5 +/- 10% = 4.95 to 6.05 (and 5.0 is in this range) 5.0 +/- 10% = 4.50 to 5.50 (and 5.5 is in this range) Therefore, 5.0 and 5.5 are approximately equal.

Example #2 Compare 5.0 and 5.6: 5.6 +/- 10% = 5.04 to 6.16 (and 5.0 is in this range) 5.0 +/- 10% = 4.50 to 5.50 (and 5.6 is in NOT this range) Therefore, 5.0 and 5.6 are NOT approximately equal.

Summary of what I need to do: Input = {4.0, 4.1, 4.2, 4.0, 9.0, 9.4, 8.9, 4.3} Desired output = {4.0, 4.1, 4.2, 4.0, 4.3} and {9.0, 9.4, 8.9}

Was it helpful?

Solution

input_list = [4.0, 4.1, 4.2, 4.0, 9.0, 9.4, 8.9, 4.3]

results = {input_list[0]: [input_list[0]]}    # Start with first value
for value in input_list[1:]:         # loop through our entire list after first value
    hi = value * 1.1
    low = value * 0.9
    print("Value: {0}\tHi: {1}\tLow:{2}".format(value, hi, low))
    for existing in results:     # search through our result set
        found_similar = False
        if low < existing < hi:  # if we find a match
            results[existing].append(value)    # we add our value to the list for that set
            found_similar = True
            break
    if not found_similar:        # if we looped through our entire results without a match
        results[value] = [value] # Create a new entry in our results dictionary

for entry in results:
    print(results[entry])

Will give:

results = { 9.0: [9.0, 9.4, 8.9],
            4.0: [4.0, 4.1, 4.2, 4.0, 4.3] }

This code starts with the first value in your list, and finds all subsequent values that are within 10% of that one. So in your example, it starts with 4, and finds all similar values. Any value that isn't within 10 % get added to a new "set".

So once it reaches 9.0, it sees that it's not a match, so it adds a new result set to the results dictionary, with a key of 9.0. Now when it considers 9.4, it doesn't find a match in the 4.0 list, but it does find a match in the 9.0 list. So it adds this value to the second result set.

OTHER TIPS

Here is a generator / set based method.

def set_gen(nums):
    for seed in sorted(nums):
        yield tuple([n for n in nums if seed <= n and n/seed <= 1.1])

def remove_subsets(sets):
    for s in sets.copy():
        [sets.remove(s2) for s2 in sets.difference([s]) if set(s2).issubset(s)]

>>> nums = [4.0, 4.1, 4.2, 4.0, 9.0, 9.4, 8.9, 4.3]
>>> x = set(num for num in set_gen(nums))
>>> remove_subsets(x)
>>> list(x)
[(9.0, 9.4, 8.9), (4.0, 4.1, 4.2, 4.0, 4.3)]

>>> nums =  [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]
>>> x = set(num for num in set_gen(nums))
>>> remove_subsets(x)
>>> list(x)
[(1.9, 1.8), (1.5, 1.4), (1.4, 1.3), (1.2, 1.1), (1.7, 1.6), (1.5, 1.6), (1.3, 1.2), (1.9, 2.0), (1.0, 1.1), (1.8, 1.7)]

You could do this:

Input = {4.0, 4.1, 4.2, 4.0, 9.0, 9.4, 8.9, 4.3} 

wl=sorted(Input,reverse=True)
apr=.1
out={}
while wl:
    wn=wl.pop()
    out[wn]=[wn]
    while wl and wl[-1]<=wn*(1+apr):
        out[wn].append(wl.pop())

print [(k,out[k]) for k in sorted(out.keys())]

Prints:

[(4.0, [4.0, 4.1, 4.2, 4.3]), (8.9, [8.9, 9.0, 9.4])]

Trying the example in the comments:

>>> Input = {1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0}

Prints:

[(1.0, [1.0, 1.1]), (1.2, [1.2, 1.3]), (1.4, [1.4, 1.5]), (1.6, [1.6, 1.7]), (1.8, [1.8, 1.9]), (2.0, [2.0])]
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top