Question

I have a text file called test.txt which contains data in this format

....
a|b|c|d|e
a1|b2|c3|d4|e5
a3|b5|c2|d1|e3
....

I want to get the values of each column into lists: something like this

list1=[a,a1,a3]
list2=[b,b2,b5]

I managed to get this done by doing this:

list1,list2,list3,list4,list5 = ([] for i in range(5))

for line in open('test.txt','r'):
    temp=line.split('|')
    list1.append(temp[0])
    list2.append(temp[1])
    list3.append(temp[2])
    list4.append(temp[3])
    list5.append(temp[4].strip())

Is there a shorter way to append the values to each list? I can only think of using one line per list, as above.


Solution

zip() is your friend here:

list1, list2, list3, list4, list5 = zip(
    *(line.strip().split('|') for line in open('test.txt')))

As an added bonus, you can use this even if you don't know how many columns there are: assign the result to a single variable instead, and you get a list in which each item holds the values for one column (as a tuple). In Python 3, zip() returns an iterator, so wrap it in list() if you want to index into it:

column_values = list(zip(*(line.strip().split('|') for line in open('test.txt'))))
# column_values[0] is ('a', 'a1', 'a3') ...

Let's step through this a little bit. First, we'll take a look at what happens with just the zip() bit:

list1, list2, list3, list4, list5 = zip(
    [0,1,2,3,4], [0,1,2,3,4], [0,1,2,3,4])

results in list1 = (0,0,0), list2 = (1,1,1), and so on, because zip() takes the first element from each list and groups them into a tuple as the first element of the result, then does the same for the second elements, and so on.


Now, how do we get to zip(a,b,c) from a sequence [a,b,c]? Simple: we use the * positional argument expansion operator. zip(*L) is the same as zip(L[0], L[1], ...).
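
To make the equivalence concrete, here is a minimal sketch. The rows variable is hypothetical sample data standing in for the parsed lines of the file; it is not part of the original question.

```python
# Hypothetical sample data: one inner list per row of the file.
rows = [
    ['a', 'b', 'c'],
    ['a1', 'b2', 'c3'],
    ['a3', 'b5', 'c2'],
]

# zip(*rows) unpacks the outer list into separate positional arguments,
# so both calls below receive exactly the same three lists.
unpacked = list(zip(*rows))
explicit = list(zip(rows[0], rows[1], rows[2]))

# Each item of the result is a tuple holding one column.
first_column = unpacked[0]
```

Both calls produce the same result, and each column comes back as a tuple rather than a list.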


Finally, how do we get the list of lists we need to pass in? We use a generator expression:

(line.strip().split('|') for line in open('test.txt'))

creates a generator that yields a list of the items in each line, one line at a time (strip() removes the trailing newline and any other surrounding whitespace from the line before it is split). This is exactly what we need to feed to zip() to get the result we want.
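
Putting the pieces together, the following is one possible sketch rather than the only way to do it. The helper name read_columns, its sep parameter, and the skipping of blank lines are assumptions added for illustration. io.StringIO stands in for the real file so the example runs on its own.

```python
import io


def read_columns(lines, sep='|'):
    """Return one list per column from an iterable of delimited lines."""
    # Skipping blank lines is an assumption; the original snippet does not do this.
    rows = (line.strip().split(sep) for line in lines if line.strip())
    # zip(*rows) yields tuples; convert each one to a list.
    return [list(column) for column in zip(*rows)]


# Stand-in for the contents of test.txt.
sample = io.StringIO("a|b|c|d|e\na1|b2|c3|d4|e5\na3|b5|c2|d1|e3\n")
columns = read_columns(sample)
list1, list2, list3, list4, list5 = columns
```

With a real file, you could pass the handle from a with open('test.txt') as handle: block to read_columns(handle), which also ensures the file is closed. Note that zip() stops at the shortest row, so a line with fewer fields silently truncates every column; itertools.zip_longest is an alternative if rows may be ragged.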

OTHER TIPS

You can use a list of lists:

table = [[] for i in range(5)]

with open('test.txt', 'r') as handle:
    for line in handle:
        for index, value in enumerate(line.strip().split('|')):
            table[index].append(value)

So instead of having list1, list2, etc., you just access the cells by table[0][0], table[2][1], etc.
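
If the number of columns is not known in advance, the same idea can be adapted so the table grows as needed. This is a rough sketch under that assumption; build_table is a hypothetical helper name, and io.StringIO again stands in for the file.

```python
import io


def build_table(lines, sep='|'):
    """Collect values column by column, adding columns as they appear."""
    table = []
    for line in lines:
        for index, value in enumerate(line.strip().split(sep)):
            # enumerate counts up from 0, so a new column is always
            # the next index after the last existing one.
            if index == len(table):
                table.append([])
            table[index].append(value)
    return table


# Stand-in for the contents of test.txt.
sample = io.StringIO("a|b|c|d|e\na1|b2|c3|d4|e5\na3|b5|c2|d1|e3\n")
table = build_table(sample)
```

Here table[0] is the first column, so table[0][0] is 'a' and table[2][1] is 'c3'.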

Licensed under: CC-BY-SA with attribution