Question

I'm attempting to turn .csv data into a dictionary in Python but I appear to be getting duplicate dictionary entries.

This is an example of what the .csv data looks like:

ticker,1,2,3,4,5,6
XOM,10,15,17,11,13,20
AAPL,12,11,12,13,11,22

My intention is to use the first column as the key and the remaining columns as the values. Ideally I should have 3 entries: ticker, XOM, and AAPL. But instead I get this:

{'ticker': ['1', '2', '3', '4', '5', '6']}
{'ticker': ['1', '2', '3', '4', '5', '6']}
{'XOM': ['10', '15', '17', '11', '13', '20']}
{'ticker': ['1', '2', '3', '4', '5', '6']}
{'XOM': ['10', '15', '17', '11', '13', '20']}
{'AAPL': ['12', '11', '12', '13', '11', '22']}

So it looks like I'm getting row 1, then row 1 & 2, then row 1, 2 & 3.

This is the code I'm using:

def data_pull():
    #gets data out of a .csv file
    datafile = open("C:\sample.csv")
    data = [] #blank list
    dict = {} #blank dictionary
    for row in datafile:
            data.append(row.strip().split(",")) #removes whitespace and commas
            for x in data: #organizes data from list into dictionary
                k = x[0]
                v = x[1:]
                dict = {k:v for x in data}
                print dict

data_pull()

I'm trying to figure out why the duplicate entries are showing up.

Solution

You have too many loops. Each time you append a row to data, you then loop over the whole data list, which holds every row gathered so far:

for row in datafile:
    data.append(row.strip().split(",")) #removes whitespace and commas
    for x in data:
        # will loop over all entries parsed so far

So you append the first row to data, then loop over a list that holds one item:

data = [['ticker', '1', '2', '3', '4', '5', '6']]

Then you read the next line and append it to data, so the inner loop now processes two items:

data = [
    ['ticker', '1', '2', '3', '4', '5', '6'],
    ['XOM', '10', '15', '17', '11', '13', '20'],
]

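So the inner loop runs once after the first line, twice after the second, three times after the third, and so on. On top of that, dict = {k:v for x in data} builds a brand-new dictionary with a single key each time, which is why every printed dictionary has only one entry.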
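
To make the repetition concrete, here is a minimal sketch that replays the question's nested loop on an in-memory copy of the sample data. It is written for Python 3 (the question uses Python 2's print statement), and the names lines and printed are only illustrative, not part of the original code:

```python
# In-memory stand-in for the CSV file from the question.
lines = [
    "ticker,1,2,3,4,5,6",
    "XOM,10,15,17,11,13,20",
    "AAPL,12,11,12,13,11,22",
]

data = []
printed = []  # collects what the original code would have printed

for row in lines:
    data.append(row.strip().split(","))
    for x in data:  # re-walks every row gathered so far
        k = x[0]
        v = x[1:]
        # {k: v for x in data} uses the same k and v for every x,
        # so it always collapses to this one-key dictionary.
        printed.append({k: v})

for entry in printed:
    print(entry)
```

With three input lines the inner loop body runs 1 + 2 + 3 = 6 times, which matches the six lines of output shown in the question.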

You could simplify this to:

for row in datafile:
    x = row.strip().split(",")
    dict[x[0]] = x[1:]
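
As a quick check, the same single loop can be wrapped in a small function and run on the sample rows. This is only a sketch in Python 3: the helper name rows_to_dict and the in-memory lines list are assumptions for illustration, and the result is called result rather than dict to avoid shadowing the built-in:

```python
def rows_to_dict(lines):
    """Map the first field of each comma-separated line to the remaining fields."""
    result = {}
    for row in lines:
        x = row.strip().split(",")
        result[x[0]] = x[1:]
    return result

# In-memory stand-in for the file contents shown in the question.
lines = [
    "ticker,1,2,3,4,5,6\n",
    "XOM,10,15,17,11,13,20\n",
    "AAPL,12,11,12,13,11,22\n",
]

result = rows_to_dict(lines)
print(len(result))    # 3
print(result["XOM"])  # ['10', '15', '17', '11', '13', '20']
```

Each line is handled exactly once, so the dictionary ends up with one entry per row.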

You can save yourself some work by using the csv module:

import csv

def data_pull():
    results = {}

    # 'rb' is the mode Python 2's csv module expects; the raw string keeps the backslash literal
    with open(r"C:\sample.csv", 'rb') as datafile:
        reader = csv.reader(datafile)
        for row in reader:
            results[row[0]] = row[1:]

    return results
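
The version above targets Python 2. For Python 3, a rough equivalent might look like the sketch below, where the file is opened in text mode with newline="" as the csv documentation recommends. The path parameter, the temporary file, and the blank-row check are additions for the sake of a self-contained example, not part of the original answer:

```python
import csv
import os
import tempfile

def data_pull(path):
    """Read a CSV file into a dict keyed by the first column."""
    results = {}
    with open(path, newline="") as datafile:
        for row in csv.reader(datafile):
            if row:  # skip completely empty lines
                results[row[0]] = row[1:]
    return results

# Write the sample data to a temporary file so the example runs anywhere.
sample = (
    "ticker,1,2,3,4,5,6\n"
    "XOM,10,15,17,11,13,20\n"
    "AAPL,12,11,12,13,11,22\n"
)
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline=""
) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    result = data_pull(path)
finally:
    os.remove(path)

print(result["AAPL"])  # ['12', '11', '12', '13', '11', '22']
```

Compared with splitting on commas by hand, csv.reader also copes with quoted fields that contain commas.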

OTHER TIPS

Use the built-in csv module:

import csv

output = {}

with open("C:\sample.csv") as f:
    freader = csv.reader(f)
    for row in freader:
        output[row[0]] = row[1:]

The loop for x in data should be outside of the loop for row in datafile:

for row in datafile:
    data.append(row.strip().split(",")) #removes whitespace and commas
for x in data: #organizes data from list into dictionary
    k = x[0]
    v = x[1:]
    dict[k] = v #adds one entry per row instead of rebuilding the dictionary

Or the csv module can be your friend:

import csv

with open("text.csv") as lines:
    print {row[0]: row[1:] for row in csv.reader(lines)}

A side note: it's always a good idea to use raw strings for Windows paths:

open(r"C:\sample.csv")

If your file were named, e.g., C:\text.csv, then \t would be interpreted as a tab character.
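
The difference is easy to see by comparing the two literals directly. A small sketch (the variable names plain and raw are just for illustration):

```python
plain = "C:\text.csv"   # \t is an escape sequence: a single tab character
raw = r"C:\text.csv"    # raw string: backslash and 't' stay two separate characters

print(repr(plain))           # 'C:\text.csv'  (repr shows the tab as \t)
print(repr(raw))             # 'C:\\text.csv'
print(len(plain), len(raw))  # 10 11
print("\t" in plain)         # True
print("\t" in raw)           # False
```

The path in the question, C:\sample.csv, happens to work without the r prefix only because \s is not a recognized escape sequence.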

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow