Question

I have a text file that looks like this (output of cat path_to_file in IPython):

0   0.25    truth fact 
1   0.25    train home find travel
........
199 0.25    video box store office

I also have another list:

vec = [(76, 0.04334748761500331),
 (128, 0.03697806086341099),
 (81, 0.03131634819532892),
 (1, 0.03131634819532892)]

Now I want to keep only the entries of vec whose first column matches the first column of the text file, and output the first and second columns of vec together with the third column of the text file.

If the text file were in the same format as vec, I could have used set(a) & set(b). But the values in the text file are tab-separated (that's what it looks like when I do the following):

with open(path_to_file) as f:
    lines = f.read().splitlines()

The output is:

['0\t0.25\ttruth fact lie',
 .........................
 '198\t0.25\tfan genre bit enjoy ',
 '199\t0.25\tvideo box store office  ']
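The set(a) & set(b) idea can still work once each line is split on tabs, because the intersection only needs the first column. A minimal sketch, assuming a few of the rows shown above as hypothetical in-memory sample data instead of reading the real file; the names text_by_index, common and result are illustrative, not from the question:

```python
# Hypothetical sample data: rows in the same tab-separated form as the splitlines() output.
lines = [
    "0\t0.25\ttruth fact lie",
    "1\t0.25\ttrain home find travel",
    "199\t0.25\tvideo box store office  ",
]

vec = [(76, 0.04334748761500331),
       (128, 0.03697806086341099),
       (81, 0.03131634819532892),
       (1, 0.03131634819532892)]

# Split each line on the first two tabs into index, score and text; the score is ignored.
text_by_index = {}
for line in lines:
    index, _score, text = line.split("\t", 2)
    text_by_index[int(index)] = text.strip()

# Intersect the first columns, as set(a) & set(b) would for two lists of keys.
common = set(text_by_index) & {index for index, _ in vec}

# Keep vec's order and attach the text column to each matching (index, score) pair.
result = [(index, score, text_by_index[index]) for index, score in vec if index in common]
print(result)
```

Once the dict exists, the explicit intersection is optional: index in text_by_index performs the same membership test.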

Solution

Using NumPy:

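import numpy as np
import numpy.lib.recfunctions as rfn

# Read only the index (column 0) and the text (column 2) from the tab-separated file
dtype = [('index', int), ('text', object)]
table = np.loadtxt(path_to_file, dtype=dtype, usecols=(0, 2), delimiter='\t')

# Turn vec into a structured array with a matching 'index' field
dtype = [('index', int), ('score', float)]
array = np.array(vec, dtype=dtype)

# Inner join on 'index': keeps only the indices present in both arrays
joined = rfn.join_by('index', table, array)

for row in joined:
    print(row['index'], row['score'], row['text'])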

If you care a lot about performance, you can also use np.savetxt() to write the output, but I thought the loop above was easier to understand.
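For the np.savetxt() route, a minimal sketch follows. It assumes hypothetical in-memory rows instead of the file, a fixed-width unicode text field ("U64") in place of object, and io.StringIO standing in for the output file. Passing the vec array to join_by first places its score field right after the key, so one format string covers index, score and text in that order:

```python
import io

import numpy as np
import numpy.lib.recfunctions as rfn

# Hypothetical sample rows, already split into (index, text).
rows = [(0, "truth fact lie"), (1, "train home find travel"), (199, "video box store office")]
vec = [(76, 0.04334748761500331), (128, 0.03697806086341099),
       (81, 0.03131634819532892), (1, 0.03131634819532892)]

# "U64" is an assumed maximum text length, chosen to keep the dtype simple.
table = np.array(rows, dtype=[("index", int), ("text", "U64")])
array = np.array(vec, dtype=[("index", int), ("score", float)])

# Inner join on 'index'; usemask=False returns a plain structured ndarray.
# Field order is the key, then the first array's fields, then the second's.
joined = rfn.join_by("index", array, table, jointype="inner", usemask=False)

# savetxt applies the format string to each row as a tuple (index, score, text).
buf = io.StringIO()
np.savetxt(buf, joined, fmt="%d %f %s")
print(buf.getvalue(), end="")
```

Writing to a real path works the same way: pass the file name instead of buf.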

Other tips

Converting vec to a dict and splitting the lines using "\t" as the delimiter should work:

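vecdict = dict(vec)  # maps each index in vec to its score

output = []
with open(path_to_file) as f:
    for line in f:
        # Drop the trailing newline, then split on tabs: index, score, text
        words = line.rstrip('\n').split('\t')
        key = int(words[0])
        if key in vecdict:
            output.append("%s %f %s" % (words[0], vecdict[key], ' '.join(words[2:])))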

output should then be a list of strings, one per matching line.

PS: If you have multiple delimiters, or are not sure which one is used, you could either use repeated calls to split() or the re module, e.g.:

import re
print(re.findall(r"\w+", "this has    multiple delimiters\tHere"))

>> ['this', 'has', 'multiple', 'delimiters', 'Here']
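For this particular file, the choice of splitter matters, because the score column contains a decimal point that \w does not match. A short comparison on one row from the question (the variable names are illustrative):

```python
import re

line = "0\t0.25\ttruth fact lie"

# str.split() with no argument splits on any run of whitespace (tabs or spaces).
by_whitespace = line.split()

# \w+ skips the '.', so the score 0.25 comes back as two tokens.
by_regex = re.findall(r"\w+", line)

# Splitting on tabs with a limit of 2 keeps the multi-word text column in one piece.
by_tab = line.split("\t", 2)

print(by_whitespace)
print(by_regex)
print(by_tab)
```

Only the last form yields the three columns the question needs; the other two are fine for free text without numbers or for whitespace-only tokenizing.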
License: CC-BY-SA with attribution
Not affiliated with StackOverflow