Question

I have two files of exactly the same size and with the same number of columns. I want to add the i^th column of the 1st file to the i^th column of the 2nd file. Is there a neat way of doing this in Python?

file1
a a a a a a
a a a a a a
a a a a a a

file2
b b b b b b
b b b b b b
b b b b b b

I want:

(a+b) (a+b) (a+b) (a+b) (a+b) (a+b)
(a+b) (a+b) (a+b) (a+b) (a+b) (a+b)
(a+b) (a+b) (a+b) (a+b) (a+b) (a+b)

EDIT: The above is just a simplification of a more complicated problem of mine. Each file has thousands of rows and I have many files (~100) to perform this kind of operation on.

Solution

with open("file1") as f1:
    with open("file2") as f2:
        for line1, line2 in zip(f1, f2):
            items1 = line1.split()
            items2 = line2.split()
            sums = ["({}+{})".format(i1, i2) for i1, i2 in zip(items1, items2)]
            print(" ".join(sums))
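
If the real files hold numbers rather than the placeholder letters above, the same line-by-line loop can do the arithmetic and write the result to a new file. A minimal sketch, assuming whitespace-separated numeric columns; the function name add_files, the temporary file names, and the sample values are made up for illustration:

```python
import os
import tempfile


def add_files(path1, path2, out_path):
    """Write the element-wise sum of two whitespace-separated numeric files.

    Hypothetical helper: assumes both files have the same shape.
    """
    with open(path1) as f1, open(path2) as f2, open(out_path, "w") as out:
        for line1, line2 in zip(f1, f2):
            sums = [float(x) + float(y)
                    for x, y in zip(line1.split(), line2.split())]
            out.write(" ".join(str(s) for s in sums) + "\n")


# Demonstration on two small temporary files (made-up sample data).
tmpdir = tempfile.mkdtemp()
path1 = os.path.join(tmpdir, "file1")
path2 = os.path.join(tmpdir, "file2")
out_path = os.path.join(tmpdir, "sum")

with open(path1, "w") as f:
    f.write("1 2 3\n4 5 6\n")
with open(path2, "w") as f:
    f.write("10 20 30\n40 50 60\n")

add_files(path1, path2, out_path)

with open(out_path) as f:
    result = f.read()
print(result)
```

For the ~100 files mentioned in the question, the function could be called in a loop over the file pairs. Since it reads one line at a time, memory use stays small even with thousands of rows.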

OTHER TIPS

A pandas DataFrame can be a good choice for this kind of operation. It lets you operate on whole data frames (matrices) at once, e.g. df_one.add(df_two).

Step 1: read the data from the files into data frames: http://pandas.pydata.org/pandas-docs/version/0.13.1/generated/pandas.DataFrame.from_csv.html (example: http://www.econpy.org/tutorials/general/csv-pandas-dataframe). Note that DataFrame.from_csv has since been removed from pandas; pandas.read_csv is its replacement.

Step 2: add the two data frames, as shown in this SO answer: Adding two pandas dataframes
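
The two steps above can be sketched roughly as follows. This assumes whitespace-separated numeric files without a header row, and it uses pandas.read_csv rather than from_csv. The io.StringIO buffers and their values are made-up stand-ins for the real files:

```python
import io

import pandas as pd

# Stand-ins for the two files; real code would pass file paths instead.
file1 = io.StringIO("1 2 3\n4 5 6\n")
file2 = io.StringIO("10 20 30\n40 50 60\n")

# header=None: the files have no header row.
# sep=r"\s+": columns are separated by whitespace.
df_one = pd.read_csv(file1, sep=r"\s+", header=None)
df_two = pd.read_csv(file2, sep=r"\s+", header=None)

# Element-wise sum; equivalent to df_one + df_two.
total = df_one.add(df_two)

# Without a path argument, to_csv returns the text instead of writing a file.
result_text = total.to_csv(sep=" ", header=False, index=False)
print(result_text)
```

Passing a file name as the first argument to to_csv would write the sums to disk instead of returning them as a string.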

I think this will help you. It assumes the files contain integers:

with open("file1") as a, open("file2") as b:
    # Parse each line into a list of integers
    x = [[int(i) for i in u.split()] for u in a]
    y = [[int(i) for i in v.split()] for v in b]

n = len(x)     # number of rows
m = len(x[0])  # number of columns
for i in range(n):
    # Add matching entries and print the row, space-separated
    print(" ".join(str(x[i][j] + y[i][j]) for j in range(m)))

You can use numpy.loadtxt():

import numpy as np

a = np.loadtxt('a.txt', dtype=object)
b = np.loadtxt('b.txt', dtype=object)

The resulting object arrays support the element-wise string concatenation that you want, and more:

print('('+a+'+'+b+')')
#array([['(a+b)', '(a+b)', '(a+b)', '(a+b)'],
#       ['(a+b)', '(a+b)', '(a+b)', '(a+b)'],
#       ['(a+b)', '(a+b)', '(a+b)', '(a+b)']], dtype=object)

print(a+b)
#array([['ab', 'ab', 'ab', 'ab'],
#       ['ab', 'ab', 'ab', 'ab'],
#       ['ab', 'ab', 'ab', 'ab']], dtype=object)

print(3*a)
#array([['aaa', 'aaa', 'aaa', 'aaa'],
#       ['aaa', 'aaa', 'aaa', 'aaa'],
#       ['aaa', 'aaa', 'aaa', 'aaa']], dtype=object)
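
For numeric data, the same idea works with the default float dtype, and numpy.savetxt can write the result back out. A minimal sketch, assuming whitespace-separated numbers; the io.StringIO buffers and their values are made-up stand-ins for the real files:

```python
import io

import numpy as np

# Stand-ins for the two files; real code would pass file names instead.
a = np.loadtxt(io.StringIO("1 2 3\n4 5 6\n"))
b = np.loadtxt(io.StringIO("10 20 30\n40 50 60\n"))

# Element-wise sum of the two arrays.
total = a + b

# Write the result as text; fmt="%g" drops the trailing ".0" on whole numbers.
out = io.StringIO()
np.savetxt(out, total, fmt="%g")
print(out.getvalue())
```

Passing a file name instead of the buffer to np.savetxt would store the sums on disk.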

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow