Question

I have a huge input file, for example:

con1    P1  140 602
con1    P2  140 602
con2    P5  642 732
con3    P8  17  348
con3    P9  17  348

I want to iterate over the lines of each con, drop the lines whose values in line[2] and line[3] repeat, and write the result to a new .txt file so that my output file looks like this. (Note: the values in the second column may differ within each con.)

con1    P1  140 602
con2    P5  642 732
con3    P8  17  348

My attempted script (not sure how to finish):

from collections import defaultdict
start = defaultdict(int)
end = defaultdict(int)
o1=open('result1.txt','w')
o2=open('result2.txt','w')
with open('example.txt') as f:
    for line in f:
        line = line.split()
        start[line[2]]
        end[line[3]]
        if start.keys() == 1 and end.keys() ==1:
            o1.writelines(line)
        else:
            o2.write(line)

UPDATE: Additional example

con20   EMT20540    951 1580
con20   EMT14935    975 1655
con20   EMT24081    975 1655
con20   EMT19916    975 1652
con20   EMT23831    975 1655
con20   EMT19915    975 1652
con20   EMT09010    975 1649
con20   EMT29525    975 1655
con20   EMT19914    975 1652
con20   EMT19913    975 1652
con20   EMT23832    975 1652
con20   EMT09009    975 1637
con20   EMT16812    975 1649

Expected output:

con20   EMT20540    951 1580
con20   EMT14935    975 1655
con20   EMT19916    975 1652
con20   EMT09010    975 1649
con20   EMT09009    975 1637

Solution

You can use itertools.groupby here:

from itertools import groupby

with open('input.txt') as f1, open('f_out', 'w') as f2:
    # First, group the lines by the first column. Note that groupby only
    # groups consecutive lines, so lines with the same first column must be
    # adjacent in the input.
    for k, g in groupby(f1, key=lambda x: x.split()[0]):
        # Within each group, keep only the lines whose 3rd and 4th columns
        # have not been seen yet. A set() of (3rd, 4th) tuples tracks the
        # column pairs already written, so repeats are skipped.

        seen = set()
        for line in g:
            columns = tuple(line.rsplit(None, 2)[-2:])
            if columns not in seen:
                # This pair of 3rd and 4th columns is new, so record it
                # as seen and write the line to the file.
                seen.add(columns)
                f2.write(line.rstrip() + '\n')
                print(line.rstrip())

Output:

con20   EMT20540    951 1580
con20   EMT14935    975 1655
con20   EMT19916    975 1652
con20   EMT09010    975 1649
con20   EMT09009    975 1637
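
If the lines of one con are not guaranteed to be adjacent in the file, groupby would split them into separate groups. A minimal sketch of an alternative, assuming Python 3 and whitespace-separated columns, keeps one set keyed on the first, third and fourth columns together. The function name unique_by_columns and the sample data are made up for illustration and are not part of the answer above:

```python
def unique_by_columns(lines):
    """Yield each line whose (1st, 3rd, 4th) column combination is new."""
    seen = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        key = (fields[0], fields[2], fields[3])
        if key not in seen:
            seen.add(key)
            yield line.rstrip()


# Hypothetical sample; in practice, pass an open file object instead.
sample = [
    "con1 P1 140 602\n",
    "con1 P2 140 602\n",
    "con2 P5 642 732\n",
    "con3 P8 17 348\n",
    "con3 P9 17 348\n",
]
result = list(unique_by_columns(sample))
for row in result:
    print(row)
```

Because the function is a generator, it reads the input one line at a time, although the set of seen keys still grows with the number of distinct combinations.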

Other tips

I said:

f = open('example.txt','r').readlines()
array = []

for line in f:
  array.append(line.rstrip().split())


def func(array, j):
  offset = []
  if j < len(array):
    firstRow = array[j-1]
    for i in range(j, len(array)):
      if (firstRow[3] == array[i][3] and firstRow[2] == array[i][2]
        and firstRow[0] == array[i][0]):
        offset.append(i)

    for item in offset[::-1]:# Q. Why offset[::-1] and not offset?
      del array[item]

    return func(array, j=j+1)

func(array, 1)

for e in array:
  print('%s\t\t%s\t\t%s\t%s' % (e[0], e[1], e[2], e[3]))

The box said:

con20   EMT20540    951 1580
con20   EMT14935    975 1655
con20   EMT19916    975 1652
con20   EMT09010    975 1649
con20   EMT09009    975 1637
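
On the question in the code comment (why offset[::-1] and not offset): deleting an item shifts every later item one position to the left, so indices collected beforehand are only safe to use from the highest to the lowest. A small sketch with made-up data, assuming Python 3, shows the difference:

```python
rows = ["a", "b", "c", "d", "e"]
to_delete = [1, 3]  # indices of "b" and "d", collected before deleting

# Ascending order: after rows[1] is removed, "d" moves to index 2,
# so deleting index 3 removes "e" instead.
shifted = list(rows)
for i in to_delete:
    del shifted[i]

# Descending order: removing a later item never moves an earlier one,
# so every stored index still points at the intended item.
safe = list(rows)
for i in reversed(to_delete):
    del safe[i]

print(shifted)
print(safe)
```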

You can simply do it as follows:

my_list = list(set(open(file_name, 'r')))

and then write that list to your other file. Note that this removes only lines that are identical in full.

Simple example

>>> a = [1,2,3,4,3,2,3,2]
>>> my_list = list(set(a))

>>> print(my_list)
[1, 2, 3, 4]
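
A set does not promise to keep the original line order. If order matters, one possible variation, assuming Python 3.7 or later where a dict preserves insertion order, is dict.fromkeys. The sample lines are made up for illustration:

```python
lines = [
    "con1 P1 140 602\n",
    "con2 P5 642 732\n",
    "con1 P1 140 602\n",
]

# dict keys are unique and keep first-seen order, so this drops the
# repeated line while leaving the remaining lines in file order.
unique_lines = list(dict.fromkeys(lines))

for line in unique_lines:
    print(line.rstrip())
```

Like the set version, this compares whole lines only.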
License: CC-BY-SA with attribution
Not affiliated with StackOverflow