Question

I have two files that look like this with some differences between them:

First file:

{16:[3, [-7, 87, 20, 32]]}
{17:[2, [-3, 88, 16, 28], 3, [-6, 84, 20, 32]]}
{18:[2, [-1, 88, 16, 28], 3, [-3, 84, 20, 32]]}
{19:[2, [1, 89, 16, 28], 3, [-2, 85, 20, 32]]}
{20:[2, [9, 94, 16, 28], 3, [1, 85, 20, 32]]}
{21:[2, [12, 96, 16, 28], 3, [2, 76, 19, 31]]}
{22:[2, [15, 97, 16, 28], 3, [4, 73, 19, 29]]}
{23:[2, [18, 96, 16, 28], 3, [6, 71, 19, 29], 10, [-10, 60, 51, 82]]}
{24:[2, [22, 97, 16, 28], 3, [9, 71, 19, 27], 10, [-5, 63, 49, 78]]}
{25:[2, [25, 99, 16, 28], 3, [13, 71, 17, 26], 10, [-1, 64, 46, 77]]}
{26:[2, [29, 101, 16, 28], 3, [17, 70, 16, 25], 10, [-1, 65, 45, 77]]}

Second file:

{16:[3, [-7, 86, 20, 32]]}
{17:[2, [-3, 82, 16, 28], 3, [-6, 84, 20, 32]]}
{18:[2, [-1, 88, 16, 27], 3, [-3, 84, 20, 32]]}
{19:[2, [1, 89, 16, 28], 3, [-2, 84, 20, 32]]}
{20:[2, [9, 94, 15, 28], 3, [1, 85, 20, 32]]}
{21:[2, [12, 96, 16, 28], 3, [1, 76, 19, 31]]}
{22:[2, [15, 97, 17, 28], 3, [4, 73, 19, 29]]}
{23:[2, [18, 96, 18, 28], 3, [6, 71, 19, 29], 10, [-10, 60, 51, 82]]}
{24:[2, [22, 97, 16, 28], 3, [9, 71, 20, 27], 10, [-5, 63, 49, 78]]}
{25:[2, [25, 99, 16, 28], 3, [13, 71, 17, 26], 10, [-1, 64, 46, 77]]}
{26:[2, [29, 101, 17, 28], 3, [17, 70, 16, 25], 10, [-1, 65, 45, 77]]}

I compare them using difflib and print the lines that differ. What I am trying to do is print the minimum and maximum frame values that share the same id.

The frame is the key on every line, so the frames in this case range from 16 to 26. The id is the value that precedes every list of 4 values. So the id on the first line is 3. The second line has two ids, 2 and then 3.

So an example of what I'd like to write out is:

17 - 36

given that one of the frames sharing the id 3 differs from the file that I am comparing with.

For every difference like that, I need to write out a new file that contains only the start frame and the end frame; then I'll work on concatenating additional strings to each file.

This is the current difflib usage, which prints out each line that has a difference:

import difflib

def compare(f1, f2):
    with open(f1+'.txt', 'r') as fin1, open(f2+'.txt', 'r') as fin2:
        diff = difflib.ndiff(fin1.readlines(), fin2.readlines())
        outcome = ''.join(x[2:] for x in diff if x.startswith('- '))
        print outcome

How could I achieve what I described above by tweaking this block?

Note that both files have the same number of frames but not the same ids, so I would need to write two different files for each difference, possibly into a folder. So if the two files have 20 differences, I need two main folders, one for each original file, each containing text files for every start and end frame of the same id.

Solution

Suppose your list of differences is the file content you give at the beginning of your post (below, s is assumed to hold that content as a single string). I proceeded in two steps. First, get the list of frames per id:

>>> from collections import defaultdict
>>> diffs = defaultdict(list)
>>> for line in s.split('\n'):
    d = eval(line) # We have a dict
    for k in d: # Only one value, k is the frame
        # Only get even values for ids
        for i in range(0, len(d[k]), 2):
            diffs[d[k][i]].append(k)


>>> diffs # We now have a dict with ids as keys :
defaultdict(<type 'list'>, {10: [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36], 2: [17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33], 3: [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36], 29: [31, 32, 33, 34, 35, 36]})
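
For reference, the same first step can be sketched in Python 3. This is only a sketch under assumptions: the function name frames_per_id and the two sample lines are made up for illustration, and ast.literal_eval is swapped in for eval because it only accepts literals, which is safer when the lines come from a file.

```python
import ast
from collections import defaultdict

def frames_per_id(lines):
    """Map each id to the list of frames whose line mentions it."""
    frames = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = ast.literal_eval(line)  # e.g. {16: [3, [-7, 87, 20, 32]]}
        for frame, values in record.items():
            # ids sit at the even positions, each followed by its 4-value list
            for id_ in values[0::2]:
                frames[id_].append(frame)
    return dict(frames)

# Hypothetical sample: the first two lines of the first file in the question
sample = [
    "{16:[3, [-7, 87, 20, 32]]}",
    "{17:[2, [-3, 88, 16, 28], 3, [-6, 84, 20, 32]]}",
]
print(frames_per_id(sample))  # {3: [16, 17], 2: [17]}
```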

Now we get the ranges per id, using a technique from another SO post for extracting runs of consecutive values from a list of indexes:

>>> from operator import itemgetter
>>> from itertools import groupby
>>> for id_ in diffs:
    diffs[id_].sort()
    for k, g in groupby(enumerate(diffs[id_]), lambda (i, x): i - x):
        group = map(itemgetter(1), g)
        print 'id {0} : {1} -> {2}'.format(id_, group[0], group[-1])


id 10 : 23 -> 36
id 2 : 17 -> 33
id 3 : 16 -> 36
id 29 : 31 -> 36

You then have, for each id, the range of differences. I guess that with a little adaptation you can get what you want.
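
The grouping trick relies on the fact that, in a sorted run of consecutive integers, index minus value stays constant. A minimal Python 3 sketch of just that step follows; the helper name consecutive_ranges and the sample list are assumptions for illustration, not part of the code above.

```python
from itertools import groupby

def consecutive_ranges(frames):
    """Return (start, end) pairs, one per run of consecutive frames."""
    ordered = sorted(set(frames))
    ranges = []
    # index - value is constant inside a run of consecutive integers,
    # so groupby splits the list exactly where a gap occurs
    for _, run in groupby(enumerate(ordered), key=lambda pair: pair[0] - pair[1]):
        run = [frame for _, frame in run]
        ranges.append((run[0], run[-1]))
    return ranges

print(consecutive_ranges([17, 18, 19, 20, 26, 22, 23]))
# [(17, 20), (22, 23), (26, 26)]
```

A frame that stands alone comes out as a range whose start and end are equal.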

EDIT: here is the final answer, using the same kind of block:

>>> def compare(f1, f2):
    # 2 embedded 'with' because I'm on Python 2.5 :-)
    with open(f1+'.txt', 'r') as fin1:
        with open(f2+'.txt', 'r') as fin2:
            lines1 = fin1.readlines()
            lines2 = fin2.readlines()
            # Do not forget the strip function to remove unnecessary '\n'
            diff_lines = [l.strip() for l in lines1 if l not in lines2]
            # Ok, we have our differences (very basic)
            diffs = defaultdict(list)
            for line in diff_lines:
                d = eval(line) # We have a dict
                for k in d: # Only one value, k is the frame
                    # ids sit at the even indexes of the list
                    for i in range(0, len(d[k]), 2):
                        diffs[d[k][i]].append(k)
            for id_ in diffs:
                diffs[id_].sort()
                for k, g in groupby(enumerate(diffs[id_]), lambda (i, x): i - x):
                    group = map(itemgetter(1), g)
                    print 'id {0} : {1} -> {2}'.format(id_, group[0], group[-1])

>>> compare(r'E:\CFM\Dev\Python\test\f1', r'E:\CFM\Dev\Python\test\f2')
id 2 : 17 -> 24
id 2 : 26 -> 26
id 3 : 16 -> 24
id 3 : 26 -> 26
id 10 : 23 -> 24
id 10 : 26 -> 26
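
To get the two folders mentioned in the question, one possible adaptation in Python 3 is sketched below. The function names changed_ranges and write_ranges and the file naming pattern are hypothetical choices. Like the version above, it compares whole lines, so every id on a changed line is counted, even when only one of that line's ids actually changed.

```python
import ast
import os
from collections import defaultdict
from itertools import groupby

def changed_ranges(lines_a, lines_b):
    """Frame ranges per id, taken from lines of lines_a absent from lines_b."""
    other = {line.strip() for line in lines_b}
    frames = defaultdict(list)
    for line in lines_a:
        line = line.strip()
        if not line or line in other:
            continue  # blank or unchanged line
        for frame, values in ast.literal_eval(line).items():
            for id_ in values[0::2]:  # ids sit at the even positions
                frames[id_].append(frame)
    result = {}
    for id_, found in frames.items():
        found.sort()
        spans = []
        # index - value is constant within a run of consecutive frames
        for _, run in groupby(enumerate(found), key=lambda p: p[0] - p[1]):
            run = [frame for _, frame in run]
            spans.append((run[0], run[-1]))
        result[id_] = spans
    return result

def write_ranges(ranges, folder):
    """Write one text file per (id, range) into folder; return the paths."""
    os.makedirs(folder, exist_ok=True)
    paths = []
    for id_, spans in sorted(ranges.items()):
        for start, end in spans:
            # Hypothetical naming scheme: id<id>_<start>-<end>.txt
            name = "id{0}_{1}-{2}.txt".format(id_, start, end)
            path = os.path.join(folder, name)
            with open(path, "w") as out:
                out.write("{0} - {1}\n".format(start, end))
            paths.append(path)
    return paths
```

Calling changed_ranges(lines1, lines2) and changed_ranges(lines2, lines1), then passing each result to write_ranges with a different folder name, should give one folder per original file.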
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow