Question

I have a file with lines of data. Each line starts with an id, followed by a fixed set of attributes separated by commas.

123,2,kent,...,
123,2,bob,...,
123,2,sarah,...,
123,8,may,...,

154,4,sheila,...,
154,4,jeff,...,

175,3,bob,...,

249,2,jack,...,
249,5,bob,...,
249,3,rose,...,

I would like to get an attribute when a condition is met: if 'bob' appears within an id, get the value of the second attribute from each row that follows within the same id.

For example:

id: 123
values returned: 2, 8

id: 249
values returned: 3

In Java I could use a double loop, but I would like to try this in Python. Any suggestions would be great.

Solution

I came up with a (perhaps) more Pythonic solution that uses groupby and dropwhile. It yields the same result as the first solution further down, but I think it's prettier. :) Flags, "curr_id" and similar bookkeeping are not very Pythonic and should be avoided where possible!

import csv
from itertools import groupby, dropwhile

goal = 'bob'
ids = {}

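with open('my_data.csv') as ifile:
    reader = csv.reader(ifile)
    # groupby yields one (id, rows) pair per run of consecutive rows sharing an id
    for key, rows in groupby(reader, key=lambda r: r[0]):
        # drop rows until the first match; that row then becomes matched_rows[0]
        matched_rows = list(dropwhile(lambda r: r[2] != goal, rows))
        if len(matched_rows) > 1:
            ids[key] = [row[1] for row in matched_rows[1:]]

print(ids)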
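
The same groupby/dropwhile idea can be tried without a file on disk. This is a minimal sketch for Python 3, assuming rows of one id are adjacent (groupby only groups consecutive items) and that blank separator lines like those in the question may occur. The function name values_after and the sample list are made up for illustration:

```python
import csv
from itertools import dropwhile, groupby


def values_after(lines, goal):
    """Map each id to the second-column values of the rows after `goal`."""
    # csv.reader yields [] for blank lines, so filter those out first.
    rows = (row for row in csv.reader(lines) if row)
    result = {}
    for key, group in groupby(rows, key=lambda r: r[0]):
        # Skip rows up to the first match; the match itself comes first.
        matched = list(dropwhile(lambda r: r[2] != goal, group))
        if len(matched) > 1:
            result[key] = [r[1] for r in matched[1:]]
    return result


# Hypothetical sample data, shortened from the question.
sample = [
    "123,2,kent", "123,2,bob", "123,2,sarah", "123,8,may",
    "",
    "175,3,bob",
    "",
    "249,2,jack", "249,5,bob", "249,3,rose",
]
print(values_after(sample, "bob"))
```

Because csv.reader accepts any iterable of strings, an open file object can be passed in place of the list.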

(first solution below)

from collections import defaultdict
import csv

curr_id = None
found = False
goal = 'bob'
ids = defaultdict(list)

with open('my_data.csv') as ifile:
    for row in csv.reader(ifile):
        # a new id starts: reset the flag
        if row[0] != curr_id:
            found = False
            curr_id = row[0]
        if found:
            # goal was seen earlier within this id, so keep this row's value
            ids[curr_id].append(row[1])
        elif row[2] == goal:
            found = True

print(dict(ids))

Output:

{'123': ['2', '8'], '249': ['3']}
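
Both versions rely on rows with the same id sitting next to each other. If that cannot be guaranteed, one option is to sort by id first; Python's sort is stable, so the original order within each id survives. A rough sketch under that assumption, with ids compared as strings (the shuffled rows and the name collect are hypothetical):

```python
from itertools import dropwhile, groupby


def collect(rows, goal):
    """Group rows by id after a stable sort, then keep what follows `goal`."""
    by_id = sorted(rows, key=lambda r: r[0])  # stable: keeps per-id order
    result = {}
    for key, group in groupby(by_id, key=lambda r: r[0]):
        # Everything after the first matching row; empty if there is none.
        after = list(dropwhile(lambda r: r[2] != goal, group))[1:]
        if after:
            result[key] = [r[1] for r in after]
    return result


# Hypothetical rows with the two ids interleaved.
rows = [
    ["249", "2", "jack"], ["123", "2", "kent"], ["123", "2", "bob"],
    ["249", "5", "bob"], ["123", "2", "sarah"], ["249", "3", "rose"],
    ["123", "8", "may"],
]
print(collect(rows, "bob"))
```

Sorting reads the whole input into memory, so this trades the streaming behaviour of the versions above for tolerance of ungrouped input.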

OTHER TIPS

Just set a flag as you loop through:

name = 'bob'
wanted_id = '123'
found = False

with open('my_data.csv') as f:
    for line in f:
        fields = line.split(',')
        if fields[0] == wanted_id:
            if found:
                # name was seen on an earlier row of this id
                print(fields[1])
            elif fields[2] == name:
                found = True
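
A variant of the flag idea as a function, which returns only the values from rows after the match and works on any iterable of lines. This is a sketch only; values_for_id and the sample lines are illustrative names, not part of the original answer:

```python
def values_for_id(lines, wanted_id, name):
    """Return second-column values of rows after `name` for one id."""
    values = []
    found = False
    for line in lines:
        fields = line.strip().split(",")
        if fields[0] != wanted_id:
            continue  # also skips blank separator lines
        if found:
            values.append(fields[1])
        elif fields[2] == name:
            found = True
    return values


# Hypothetical sample lines, shaped like lines read from a file.
lines = ["123,2,kent\n", "123,2,bob\n", "123,2,sarah\n", "123,8,may\n", "175,3,bob\n"]
print(values_for_id(lines, "123", "bob"))
```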

Another option is to collect the values in a defaultdict:

import collections
import csv
import io

s = '''123,2,kent,...,
123,2,bob,...,
123,2,sarah,...,
123,8,may,...,
154,4,sheila,...,
154,4,jeff,...,
175,3,bob,...,
249,2,jack,...,
249,5,bob,...,
249,3,rose,...,'''

filelikeobject = io.StringIO(s)  # io.StringIO replaces cStringIO on Python 3
dd = collections.defaultdict(list)
cr = csv.reader(filelikeobject)
for line in cr:
    if line[2] == 'bob':
        dd[line[0]]  # touching the key creates an empty list for this id
        continue
    if line[0] in dd:
        dd[line[0]].append(line[1])

Result:

>>> dd
defaultdict(<class 'list'>, {'123': ['2', '8'], '175': [], '249': ['3']})
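
Note that '175' shows up with an empty list because 'bob' is the last row for that id. If only ids with at least one following value are wanted, as in the question's example, the empty entries can be filtered out. A small sketch, with a plain dictionary literal standing in for the dd built above:

```python
# Stand-in for the defaultdict result shown above.
dd = {'175': [], '123': ['2', '8'], '249': ['3']}

# Keep only ids that actually collected values after the match.
non_empty = {key: values for key, values in dd.items() if values}
print(non_empty)
```
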
Licensed under: CC-BY-SA with attribution