Read Values from .csv file and convert them to float arrays

Question

Boy, have I got a treat for you. numpy.genfromtxt has a converters parameter, which allows you to specify a function for each column as the file is parsed. The function is fed the CSV string value. Its return value becomes the corresponding value in the numpy array.

Morever, the dtype = None parameter tells genfromtxt to make an intelligent guess as to the type of each column. In particular, numeric columns are automatically cast to an appropriate dtype.

For example, suppose your data file contains

2011-06-19 17:29:00.000,72,44,56

Then

import numpy as np
import datetime as DT

def make_date(datestr):
    return DT.datetime.strptime(datestr, '%Y-%m-%d %H:%M:%S.%f')

arr = np.genfromtxt(filename, delimiter = ',',
                    converters = {'Date':make_date},
                    names =  ('Date', 'Stock', 'Action', 'Amount'),
                    dtype = None)
print(arr)
print(arr.dtype)

yields

(datetime.datetime(2011, 6, 19, 17, 29), 72, 44, 56)
[('Date', '|O4'), ('Stock', '<i4'), ('Action', '<i4'), ('Amount', '<i4')]

Your real csv file has more columns, so you'd want to add more items to names, but otherwise, the example should still stand.

If you don't really care about the extra columns, you can assign a fluff-name like this:

arr = np.genfromtxt(filename, delimiter=',',
                    converters={'Date': make_date},
                    names=('Date', 'Stock', 'Action', 'Amount') +
                    tuple('col{i}'.format(i=i) for i in range(22)),
                    dtype = None)

yields

(datetime.datetime(2011, 6, 19, 17, 29), 72, 44, 56, 0.4772, 0.3286, 0.8497, 31.3587, 0.3235, 0.9147, 28.5751, 0.3872, 0.2803, 0, 0.2601, 0.2073, 0.1172, 0, 0.0, 0, 5.8922, 1, 0, 0, 0, 1.2759)

You might also be interested in checking out the pandas module which is built on top of numpy, and which takes parsing CSV to an even higher level of luxury: It has a pandas.read_csv function whose parse_dates = True parameter will automatically parse date strings (using dateutil).

Using pandas, your csv could be parsed with

df = pd.read_csv(filename, parse_dates = [0,1], header = None,
                    names=('Date', 'Stock', 'Action', 'Amount') +
                    tuple('col{i}'.format(i=i) for i in range(22)))

Note there is no need to specify the make_date function. Just to be clear --pands.read_csvreturns aDataFrame, not a numpy array. The DataFrame may actually be more useful for your purpose, but you should be aware it is a different object with a whole new world of methods to exploit and explore.