Question

I want to read a file that contains correlation matrix values and write a new tab-delimited file, like this.

Input_data) (9 rows, 2 columns)

A_A   1
A_B   2
A_C   3
B_A   2
B_B   4
B_C   5
C_A   3
C_B   5
C_C   6

Output_data) (3 rows, 5 columns)

A  B  2  C  3
B  A  2  C  5
C  A  3  B  5

That is, Output_data has 3 rows because Input_data represents a 3*3 matrix.
Looking closely at Output_data, the first row, for example, contains the values of A_B and A_C (A_A is excluded). The real data I'd like to parse contains about 200 rows and 2 columns.
How can I read the correlation matrix file and write it out in this format?


Solution

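Try this. It groups the values by the first letter of each pair, skips the diagonal entries (A_A, B_B, C_C), and writes tab-delimited rows:

dct = {}
with open('input', 'r') as f:
    for line in f:
        name, value = line.split()
        key, name = name.split('_')
        # setdefault creates the list for this row the first time the key is seen
        lst = dct.setdefault(key, [])
        # skip diagonal entries such as A_A, which the desired output omits
        if key != name:
            lst.extend([name, value])

# write one tab-delimited row per key, in the order the keys were first read
with open('result', 'w') as f:
    for k, v in dct.items():
        f.write("\t".join([k] + v) + "\n")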
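
The same idea can be sketched as a small reusable function, which makes it easier to check against the sample data from the question. This is only a sketch under some assumptions: the names reshape_pairs and write_tsv are made up for illustration, each input line is assumed to look like "X_Y value", and row labels are assumed not to contain an underscore themselves.

```python
import csv
from collections import defaultdict


def reshape_pairs(lines):
    """Group 'X_Y value' lines by X, dropping diagonal entries (X_X)."""
    rows = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the input
        pair, value = line.split()
        row, col = pair.split('_', 1)
        cells = rows[row]  # creates the row even if only its diagonal is present
        if row != col:
            cells.extend([col, value])
    # dicts keep insertion order (Python 3.7+), so rows follow the input order
    return [[row, *cells] for row, cells in rows.items()]


def write_tsv(records, path):
    """Write each record as one tab-delimited line."""
    with open(path, 'w', newline='') as f:
        csv.writer(f, delimiter='\t', lineterminator='\n').writerows(records)


sample = [
    "A_A   1", "A_B   2", "A_C   3",
    "B_A   2", "B_B   4", "B_C   5",
    "C_A   3", "C_B   5", "C_C   6",
]
records = reshape_pairs(sample)
```

For a real file, something like write_tsv(reshape_pairs(open('input')), 'result') should produce the tab-delimited output. Values are kept as strings, so they are written back exactly as they were read.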
Licensed under: CC-BY-SA with attribution