python pandas text block to data frame mixed types
-
21-12-2019 - |
Question
I am a python and pandas newbie. I have a text block that has data arranged in columns. The data in the first six columns are integers and the rest are floating point. I tried to create two DataFrames that I could then concatenate:
sect1 = DataFrame(dtype=int)
sect2 = DataFrame(dtype=float)
i = 0
# The first 26 lines are header text
for line in txt[26:]:
colmns = line.split()
sect1[i] = colmns[:6] # Columns with integers
sect2[i] = colmns[6:] # Columns with floating point
i +=
This causes an AssertionError: Length of values does not match length of index
Here are two lines of data
2013 11 15 0000 56611 0 1.36e+01 3.52e-01 7.89e-02 4.33e-02 3.42e-02 1.76e-02 2.89e+04 5.72e+02 -1.00e+05
2013 11 15 0005 56611 300 1.08e+01 5.50e-01 2.35e-01 4.27e-02 3.35e-02 1.70e-02 3.00e+04 5.50e+02 -1.00e+05
Thanks in advance for the help.
Solution
You can use Pandas csv parser along with StringIO. An example in pandas documentation.
For you sample that will be:
>>> import pandas as pd
>>> from StringIO import StringIO
>>> data = """2013 11 15 0000 56611 0 1.36e+01 3.52e-01 7.89e-02 4.33e-02 3.42e-02 1.76e-02 2.89e+04 5.72e+02 -1.00e+05
... 2013 11 15 0005 56611 300 1.08e+01 5.50e-01 2.35e-01 4.27e-02 3.35e-02 1.70e-02 3.00e+04 5.50e+02 -1.00e+05"""
Load data
>>> df = pd.read_csv(StringIO(data), sep=r'\s+', header=None)
Convert first three rows to datetime (optional)
>>> df[0] = df.iloc[:,:3].apply(lambda x:'{}.{}.{}'.format(*x), axis=1).apply(pd.to_datetime)
>>> del df[1]
>>> del df[2]
>>> df
0 3 4 5 6 7 8 9 10 \
0 2013-11-15 00:00:00 0 56611 0 13.6 0.352 0.0789 0.0433 0.0342
1 2013-11-15 00:00:00 5 56611 300 10.8 0.550 0.2350 0.0427 0.0335
11 12 13 14
0 0.0176 28900 572 -100000
1 0.0170 30000 550 -100000
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow