Question

I'm using the following code to split a dataset into training and test sets and save each one to a file:

import numpy as np
from sklearn.cross_validation import train_test_split

a = (np.genfromtxt(open('dataset.csv','r'), delimiter=',', dtype='int')[1:])
a_train, a_test = train_test_split(a, test_size=0.33, random_state=0)

c1 = open('trainfile.csv', 'w')
arr1 = str(a_train)
c1.write(arr1)
c1.close

c2 = open('testfile.csv', 'w')
arr2 = str(a_test)
c2.write(arr2)
c2.close

However, I get the following output in the file:

trainfile.csv:
[[ 675847       0       0 ...,       0       0       3]
 [  74937       0       0 ...,       0       0       3]
 [  65212       0       0 ...,       0       0       3]
 ..., 
 [  18251       0       0 ...,       0       0       1]
 [1131828       0       0 ...,       0       0       1]
 [  14529       0       0 ...,       0       0       1]]

That is the entire content of trainfile.csv. I'm facing the same issue with testfile.csv as well. What I want is for the entire training and test data to be stored in the files, instead of periods standing in for the omitted data. Any suggestions?

Solution

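This happens because you are calling str on the NumPy array. For large arrays, str returns an abbreviated summary with ellipses in place of most of the data, not the full contents. Use the NumPy function numpy.savetxt instead. It would look something like: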

with open('testfile.csv', 'w') as FOUT:
    np.savetxt(FOUT, a_test)

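Note that the output would not necessarily be readable by a CSV reader: by default, np.savetxt separates values with spaces rather than commas and writes numbers in scientific notation. If you need CSV output, you can use the csv module: https://docs.python.org/2/library/csv.html.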
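
As a rough sketch of both routes to comma-separated output, assuming Python 3 and integer data like in the question: the small a_test array and the file name testfile_csv_module.csv below are made up for illustration and are not from the original post.

```python
import csv
import numpy as np

# Hypothetical stand-in for the a_test array from the question.
a_test = np.arange(12, dtype=int).reshape(4, 3)

# Option 1: np.savetxt with an explicit comma delimiter and integer format.
np.savetxt('testfile.csv', a_test, delimiter=',', fmt='%d')

# Option 2: the csv module, writing one CSV row per array row.
# newline='' is the Python 3 convention for files handed to csv.writer.
with open('testfile_csv_module.csv', 'w', newline='') as fout:
    csv.writer(fout).writerows(a_test.tolist())

# Reading either file back should reproduce the original array.
restored = np.loadtxt('testfile.csv', delimiter=',', dtype=int)
restored_csv = np.loadtxt('testfile_csv_module.csv', delimiter=',', dtype=int)
```

Either way, every row ends up in the file, since neither approach goes through the abbreviated string form of the array.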

Licensed under: CC-BY-SA with attribution