Question

I am a Python and pandas newbie. I have a text block with data arranged in columns. The first six columns hold integers and the rest hold floating-point values. I tried to create two DataFrames that I could then concatenate:

sect1 = DataFrame(dtype=int)
sect2 = DataFrame(dtype=float)
i = 0
# The first 26 lines are header text
for line in txt[26:]:
    colmns = line.split()
    sect1[i] = colmns[:6]  # Columns with integers
    sect2[i] = colmns[6:]  # Columns with floating point
    i += 1

This causes an AssertionError: Length of values does not match length of index

Here are two lines of data

2013 11 15  0000   56611      0   1.36e+01  3.52e-01  7.89e-02  4.33e-02  3.42e-02  1.76e-02  2.89e+04  5.72e+02 -1.00e+05
2013 11 15  0005   56611    300   1.08e+01  5.50e-01  2.35e-01  4.27e-02  3.35e-02  1.70e-02  3.00e+04  5.50e+02 -1.00e+05

Thanks in advance for the help.


Solution

You can use the pandas CSV parser (read_csv) together with StringIO. The pandas documentation has an example.

For your sample, that would be:

>>> import pandas as pd
>>> from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO
>>> data = """2013 11 15  0000   56611      0   1.36e+01  3.52e-01  7.89e-02  4.33e-02  3.42e-02  1.76e-02  2.89e+04  5.72e+02 -1.00e+05
... 2013 11 15  0005   56611    300   1.08e+01  5.50e-01  2.35e-01  4.27e-02  3.35e-02  1.70e-02  3.00e+04  5.50e+02 -1.00e+05"""

Load data

>>> df = pd.read_csv(StringIO(data), sep=r'\s+', header=None)
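
The question mentions 26 header lines and wants integers kept apart from floats. A minimal sketch of how that might be handled in the same read_csv call follows. The header text here is a made-up stand-in for the real one, and the names text, ints and floats are only illustrative.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the real text block: 26 header lines, then the data rows.
header = "".join("header line {}\n".format(n) for n in range(26))
rows = (
    "2013 11 15  0000   56611      0   1.36e+01  3.52e-01  7.89e-02  4.33e-02  3.42e-02  1.76e-02  2.89e+04  5.72e+02 -1.00e+05\n"
    "2013 11 15  0005   56611    300   1.08e+01  5.50e-01  2.35e-01  4.27e-02  3.35e-02  1.70e-02  3.00e+04  5.50e+02 -1.00e+05\n"
)
text = header + rows

# skiprows=26 drops the header block; the regex separator splits on runs of whitespace.
df = pd.read_csv(StringIO(text), sep=r"\s+", header=None, skiprows=26)

# read_csv infers a dtype per column, so integer-looking columns and
# floating-point columns can live in one DataFrame.
print(df.dtypes)

# If two separate frames are still wanted, slice by column position.
ints = df.iloc[:, :6]
floats = df.iloc[:, 6:]
```

Because each column carries its own dtype, building two frames and concatenating them is probably unnecessary here.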

Convert the first three columns to datetime (optional)

>>> df[0] = df.iloc[:,:3].apply(lambda x:'{}.{}.{}'.format(*x), axis=1).apply(pd.to_datetime)
>>> del df[1]
>>> del df[2]
>>> df
                   0   3      4    5     6      7       8       9       10  \
0 2013-11-15 00:00:00   0  56611    0  13.6  0.352  0.0789  0.0433  0.0342
1 2013-11-15 00:00:00   5  56611  300  10.8  0.550  0.2350  0.0427  0.0335

       11     12   13      14
0  0.0176  28900  572 -100000
1  0.0170  30000  550 -100000
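
In the output above, the fourth field (0000, 0005) was parsed as an integer, so it is not part of the timestamp. If that field is a time of day in HHMM form, which is an assumption about the data rather than something the question states, one possible sketch is to read the first four columns as strings and parse them together with an explicit format. The column name timestamp is only illustrative.

```python
from io import StringIO

import pandas as pd

data = """2013 11 15  0000   56611      0   1.36e+01  3.52e-01  7.89e-02  4.33e-02  3.42e-02  1.76e-02  2.89e+04  5.72e+02 -1.00e+05
2013 11 15  0005   56611    300   1.08e+01  5.50e-01  2.35e-01  4.27e-02  3.35e-02  1.70e-02  3.00e+04  5.50e+02 -1.00e+05"""

# Read the first four columns as strings so that "0005" keeps its leading zeros.
df = pd.read_csv(
    StringIO(data),
    sep=r"\s+",
    header=None,
    dtype={0: str, 1: str, 2: str, 3: str},
)

# Join year, month, day and HHMM (assumed meaning of column 3) into one string per row,
# then parse with an explicit format.
stamp = df[0] + " " + df[1] + " " + df[2] + " " + df[3]
df["timestamp"] = pd.to_datetime(stamp, format="%Y %m %d %H%M")

# The four source columns are no longer needed once the timestamp exists.
df = df.drop(columns=[0, 1, 2, 3])
print(df["timestamp"])
```

Passing an explicit format avoids relying on pandas to guess how the joined string should be read.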
Licensed under: CC-BY-SA with attribution