문제

I have a text file, data.txt, which contains:

5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica

How do I load this data using numpy.loadtxt() so that I get a NumPy array after loading such as [['5.1' '3.5' '1.4' '0.2' 'Iris-setosa'] ['4.9' '3.0' '1.4' '0.2' 'Iris-setosa'] ...]?

I tried

np.loadtxt(open("data.txt"), 'r',
           dtype={
               'names': (
                   'sepal length', 'sepal width', 'petal length',
                   'petal width', 'label'),
               'formats': (
                   np.float, np.float, np.float, np.float, np.str)},
           delimiter= ',', skiprows=0)
도움이 되었습니까?

해결책

If you use np.genfromtxt, you could specify dtype=None, which will tell genfromtxt to intelligently guess the dtype of each column. Most conveniently, it relieves you of the burder of specifying the number of bytes required for the string column. (Omitting the number of bytes, by specifying e.g. np.str, does not work.)

In [58]: np.genfromtxt('data.txt', delimiter=',', dtype=None, names=('sepal length', 'sepal width', 'petal length', 'petal width', 'label'))
Out[58]: 
array([(5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
       (4.9, 3.0, 1.4, 0.2, 'Iris-setosa'),
       (5.8, 2.7, 4.1, 1.0, 'Iris-versicolor'),
       (6.2, 2.2, 4.5, 1.5, 'Iris-versicolor'),
       (6.4, 3.1, 5.5, 1.8, 'Iris-virginica'),
       (6.0, 3.0, 4.8, 1.8, 'Iris-virginica')], 
      dtype=[('sepal_length', '<f8'), ('sepal_width', '<f8'), ('petal_length', '<f8'), ('petal_width', '<f8'), ('label', 'S15')])

If you do want to use np.loadtxt, then to fix your code with minimal changes, you could use:

np.loadtxt("data.txt",
   dtype={'names': ('sepal length', 'sepal width', 'petal length', 'petal width', 'label'),
          'formats': (np.float, np.float, np.float, np.float, '|S15')},
   delimiter=',', skiprows=0)

The main difference is simply changing np.str to |S15 (a 15-byte string).

Also note that open("data.txt"), 'r' should be open("data.txt", 'r'). But since np.loadtxt can accept a filename, you don't really need to use open at all.

다른 팁

It seems that keeping the numbers and text together has been causing you so much trouble - if you end up deciding to separate them, my workaround is:

values = np.loadtxt('data', delimiter=',', usecols=[0,1,2,3])
labels = np.loadtxt('data', delimiter=',', usecols=[4])
라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top