Question

Here's an MWE of my code:

import numpy as np

# Load data from file.
data = np.genfromtxt('data_input', dtype=None, unpack=True)

print data

Here's a sample of the data_input file:

01_500_aa_1000    990.0    990.0   112.5      0.2       72  0  0  1  0  0  0  0  0  0   0   0   0   1
02_500_aa_0950    990.0    990.0   112.5      0.2       77  0  0  1  0  0  0  0  0  0   0   0   0   1
03_500_aa_0600    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1
04_500_aa_0700    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1

The unpack argument does not appear to work since it always prints:

[ ('01_500_aa_1000', 990.0, 990.0, 112.5, 0.2, 72, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
 ('02_500_aa_0950', 990.0, 990.0, 112.5, 0.2, 77, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
 ('03_500_aa_0600', 990.0, 990.0, 112.5, 0.18, 84, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
 ('04_500_aa_0700', 990.0, 990.0, 112.5, 0.18, 84, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)]

Can anybody reproduce this? What am I doing wrong?

Solution 4

I'm posting my own answer since this is what I ended up using.

import numpy as np

# Load data from file.
data = np.genfromtxt('data_input', dtype=None)

# Transpose the rows into columns (list() is needed on Python 3, where zip is lazy).
data = list(zip(*data))

This actually works and it's pretty easy to understand and use.
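
The transpose step relies only on Python's built-in zip, so it can be illustrated without NumPy. This is a minimal sketch, assuming Python 3; the `rows` sample is shortened, made-up data shaped like the records in the question, and the names `rows`, `columns`, `names`, `values` and `counts` are illustrative only:

```python
# Each tuple mimics one record (row) as returned by genfromtxt; the values are
# shortened, illustrative stand-ins for the real 19-field records.
rows = [
    ("01_500_aa_1000", 990.0, 72),
    ("02_500_aa_0950", 990.0, 77),
    ("03_500_aa_0600", 990.0, 84),
]

# zip(*rows) groups the i-th item of every row together, which transposes
# rows into columns. list() materialises the lazy iterator on Python 3.
columns = list(zip(*rows))

# One tuple per column.
names, values, counts = columns
print(names)
print(counts)
```

Each column comes back as a plain tuple rather than a NumPy array, so wrap it in np.array(...) if array operations are needed afterwards.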

Other tips

You're getting this because genfromtxt is returning a NumPy structured (record) array with one record per row, not a list. It only looks like a list of tuples when you print it to the console.

from cStringIO import StringIO
from numpy import genfromtxt
raw = """01_500_aa_1000    990.0    990.0   112.5      0.2       72  0  0  1  0  0  0  0  0  0   0   0   0   1
02_500_aa_0950    990.0    990.0   112.5      0.2       77  0  0  1  0  0  0  0  0  0   0   0   0   1
03_500_aa_0600    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1
04_500_aa_0700    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1"""
sio = StringIO(raw)
data = genfromtxt(sio, dtype=None, unpack=False)
print data
print
print data.dtype

gives:

[ ('01_500_aa_1000', 990.0, 990.0, 112.5, 0.2, 72, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
 ('02_500_aa_0950', 990.0, 990.0, 112.5, 0.2, 77, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
 ('03_500_aa_0600', 990.0, 990.0, 112.5, 0.18, 84, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
 ('04_500_aa_0700', 990.0, 990.0, 112.5, 0.18, 84, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)]

[('f0', 'S14'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<i8'), ('f6', '<i8'), ('f7', '<i8'), ('f8', '<i8'), ('f9', '<i8'), ('f10', '<i8'), ('f11', '<i8'), ('f12', '<i8'), ('f13', '<i8'), ('f14', '<i8'), ('f15', '<i8'), ('f16', '<i8'), ('f17', '<i8'), ('f18', '<i8')]

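unpack=True and unpack=False return the same thing here because, in this version of NumPy, unpack only transposes the result, and the result is a 1-D array of records: transposing a 1-D array changes nothing. I would suggest you try pandas and forget about recarrays altogether. You can pass a recarray to pandas.DataFrame and work with named columns directly. For example,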

from pandas import DataFrame

df = DataFrame(data)
print df
print
print df.f0

yields:

               f0         f1         f2         f3         f4  f5  f6  f7  f8  \
0  01_500_aa_1000     990.00     990.00     112.50       0.20  72   0   0   1   
1  02_500_aa_0950     990.00     990.00     112.50       0.20  77   0   0   1   
2  03_500_aa_0600     990.00     990.00     112.50       0.18  84   0   0   1   
3  04_500_aa_0700     990.00     990.00     112.50       0.18  84   0   0   1   

   f9  f10  f11  f12  f13  f14  f15  f16  f17  f18  
0   0    0    0    0    0    0    0    0    0    1  
1   0    0    0    0    0    0    0    0    0    1  
2   0    0    0    0    0    0    0    0    0    1  
3   0    0    0    0    0    0    0    0    0    1  

0    01_500_aa_1000
1    02_500_aa_0950
2    03_500_aa_0600
3    04_500_aa_0700
Name: f0, dtype: object

As mentioned by @Phillip Cloud, you are getting a recarray because the file mixes data types (strings and numbers): the strings in column 0 are the cause.

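You could get around this by importing column 0 separately. Note that usecols=range(1,18) selects columns 1 through 17, so the final column of the file is missing from the output shown; use range(1,19) to include it: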

>>> np.genfromtxt('data_input', usecols=range(1,18))
array([[  9.90000000e+02,   9.90000000e+02,   1.12500000e+02,
          2.00000000e-01,   7.20000000e+01,   0.00000000e+00,
          0.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00],
       [  9.90000000e+02,   9.90000000e+02,   1.12500000e+02,
          2.00000000e-01,   7.70000000e+01,   0.00000000e+00,
          0.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00],
       [  9.90000000e+02,   9.90000000e+02,   1.12500000e+02,
          1.80000000e-01,   8.40000000e+01,   0.00000000e+00,
          0.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00],
       [  9.90000000e+02,   9.90000000e+02,   1.12500000e+02,
          1.80000000e-01,   8.40000000e+01,   0.00000000e+00,
          0.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00]])
>>> np.genfromtxt('data_input', usecols=0,dtype=None)
array(['01_500_aa_1000', '02_500_aa_0950', '03_500_aa_0600',
   '04_500_aa_0700'], 
  dtype='|S14')

Or, you could access the columns of the recarray by field name, like this:

>>> data['f0']
array(['01_500_aa_1000', '02_500_aa_0950', '03_500_aa_0600',
       '04_500_aa_0700'], 
      dtype='|S14')
>>> data['f5']
array([72, 77, 84, 84])
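
Building on the field-name access above, every column can be pulled out in one go, which gives the effect the question expected from unpack=True. This is a minimal sketch, assuming Python 3 and a NumPy version that accepts the `encoding` argument (1.14 or later); the shortened three-column sample and the names `raw`, `names`, `values` and `counts` are illustrative only:

```python
import io

import numpy as np

# Shortened, illustrative sample: one string column and two numeric ones.
raw = """01_500_aa_1000    990.0    72
02_500_aa_0950    990.0    77
03_500_aa_0600    990.0    84
04_500_aa_0700    990.0    84"""

# dtype=None lets genfromtxt infer a type per column, which yields a
# 1-D structured array with auto-generated field names f0, f1, f2.
data = np.genfromtxt(io.StringIO(raw), dtype=None, encoding="utf-8")

# Manual "unpack": one ordinary array per field, in file order.
names, values, counts = [data[field] for field in data.dtype.names]

print(data.dtype.names)
print(counts)
```

Unlike the zip approach, each column here stays a NumPy array with its own dtype, so numeric columns can be used in array arithmetic straight away.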

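I can reproduce this. However, if you change dtype to float, unpack works and I get one row per column (the string column cannot be parsed as a float, so it becomes nan):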

[[             nan              nan              nan              nan]
 [  9.90000000e+02   9.90000000e+02   9.90000000e+02   9.90000000e+02]
 [  9.90000000e+02   9.90000000e+02   9.90000000e+02   9.90000000e+02]
 [  1.12500000e+02   1.12500000e+02   1.12500000e+02   1.12500000e+02]
 [  2.00000000e-01   2.00000000e-01   1.80000000e-01   1.80000000e-01]
 [  7.20000000e+01   7.70000000e+01   8.40000000e+01   8.40000000e+01]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00]
 ...

I got the idea from this mailing list question.
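
The nan row in that output comes from the string column, which cannot be parsed as a float. Here is a hedged sketch of the same effect on a shortened, illustrative sample, assuming Python 3 and NumPy 1.14 or later for the `encoding` argument. With a single float dtype the result is an ordinary 2-D array, so unpack=True has something to transpose:

```python
import io

import numpy as np

# Shortened, illustrative sample: one string column and two numeric ones.
raw = """01_500_aa_1000    990.0    72
02_500_aa_0950    990.0    77"""

# dtype=float makes genfromtxt try to parse every column as a float;
# entries that cannot be converted are filled with nan.
cols = np.genfromtxt(io.StringIO(raw), dtype=float, unpack=True, encoding="utf-8")

# After unpacking there is one row per file column.
print(cols.shape)
print(np.isnan(cols[0]).all())
```

The price of this approach is that the labels in the first column are lost, so they would have to be read separately if they are needed.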

Look at an answer given here. np.genfromtxt() returns an ndarray, and a plain 2-D ndarray cannot be heterogeneous, so columns of mixed types come back as a 1-D array of records instead.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow