Question

I'm using the Python 2.7 shell on OS X Lion. The .csv file has 12 columns and 892 rows.

import csv as csv
import numpy as np
# Open up csv file into a Python object
csv_file_object = csv.reader(open('/Users/scdavis6/Documents/Kaggle/train.csv', 'rb'))
header = csv_file_object.next()
data=[]
for row in csv_file_object:
    data.append(row)
    data = np.array(data)

# Convert to float for numerical calculations
number_passengers = np.size(data[0::,0].astype(np.float))

And this is the error I get:

Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    number_passengers = np.size(data[0::,0].astype(np.float))
TypeError: list indices must be integers, not tuple 

What am I doing wrong?


Solution

Don't use csv to read the data into a NumPy array. Use numpy.genfromtxt; using dtype=None will cause genfromtxt to make an intelligent guess at the dtypes for you. By doing it this way you won't have to manually convert strings to floats.

data[0::, 0] just gives you the first column of data. data[:, 0] would give you the same result.
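As a quick check of that equivalence, here is a minimal sketch on a small made-up array (the values are hypothetical stand-ins for rows read from a CSV, not your actual data). It uses the built-in float rather than np.float, since np.float is no longer available in recent NumPy releases.

```python
import numpy as np

# Hypothetical 3x2 array of strings, standing in for rows read from a CSV.
data = np.array([['1', 'a'], ['2', 'b'], ['3', 'c']])

first_col_a = data[0::, 0]  # rows 0 through the end, column 0
first_col_b = data[:, 0]    # all rows, column 0

# Both slices select the same elements.
same = np.array_equal(first_col_a, first_col_b)

# The strings in the first column can then be converted to floats.
as_float = first_col_a.astype(float)
```

Either spelling works only once data is an ndarray; a plain list does not accept a tuple index.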

The error message

TypeError: list indices must be integers, not tuple 

suggests that for some reason your data variable is holding a list rather than an ndarray. For example, the same exception can be produced like this:

In [73]: data = [1,2,3]

In [74]: data[1,2]
TypeError: list indices must be integers, not tuple

I don't know why that is happening, but if you post a sample of your CSV we should be able to help fix that.
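One possible explanation, and this is only a guess, is that the np.array conversion never replaced the list before the indexing line ran. The sketch below uses hypothetical in-memory rows instead of your file: it reproduces the TypeError on the plain list, then converts once after the loop has finished, at which point the original indexing expression works.

```python
import numpy as np

# Hypothetical rows, standing in for what csv.reader would yield.
rows = [['1', '0', '3'], ['2', '1', '1']]

data = []
for row in rows:
    data.append(row)

# data is still a plain list here, so a tuple index raises TypeError.
try:
    data[0::, 0]
except TypeError as exc:
    list_error = str(exc)

# Convert once, after the loop has finished (not inside it).
data = np.array(data)

# Now the tuple index is valid and the first column can be converted.
number_passengers = np.size(data[0::, 0].astype(float))
```

If the conversion sits inside the loop instead, data stops being a list after the first row and the next data.append call fails with a different error, so the placement of that line matters.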

Using np.genfromtxt, your current code could be simplified to:

import numpy as np
filename = '/Users/scdavis6/Documents/Kaggle/train.csv'
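# skip_header=1 skips the header row (genfromtxt's older skiprows argument was removed from NumPy)
data = np.genfromtxt(filename, delimiter=',', skip_header=1, dtype=None)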
number_passengers = np.size(data, axis=0)
Licensed under: CC-BY-SA with attribution