pandas create data frame, floats are objects, how to convert?

https://stackoverflow.com/questions/23591267

20-07-2023

Question

I have a text file:

sample    value1    value2
A    0.1212    0.2354
B    0.23493    1.3442

I import it:

import numpy as np
import pandas as pd

with open('file.txt', 'r') as fo:
    notes = next(fo)  # first line is read separately as notes
    headers, *raw_data = [row.strip('\r\n').split('\t') for row in fo]  # column headers and data rows
    names = [row[0] for row in raw_data]  # first column: sample names
    data = np.array([row[1:] for row in raw_data], dtype=float)  # remaining columns as floats

If I then convert it:

s = pd.DataFrame(data,index=names,columns=headers[1:])

the data is recognized as floats. I could get the sample names back as a column with s = s.reset_index().

If I do

s = pd.DataFrame(raw_data,columns=headers)

the floats are stored as objects and I cannot perform standard calculations.

How would you build the data frame? Is it better to import the data as a dict?

BTW, I am using Python 3.3.

Solution

You can parse your data file directly into a DataFrame as follows:

df = pd.read_csv('file.txt', sep='\t', index_col='sample')

Which will give you:

         value1  value2
sample                 
A       0.12120  0.2354
B       0.23493  1.3442

[2 rows x 2 columns]

Then, you can do your computations.
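
As a quick check that the columns really are numeric, here is a minimal sketch. It reads the sample data from an in-memory string rather than file.txt (an assumption, so the snippet runs on its own) and assumes the columns are tab-separated, as in the question's own parsing code.

```python
import io
import pandas as pd

# Sample data from the question, held in memory instead of file.txt.
# Assumption: fields are separated by single tab characters.
text = """sample\tvalue1\tvalue2
A\t0.1212\t0.2354
B\t0.23493\t1.3442
"""

df = pd.read_csv(io.StringIO(text), sep="\t", index_col="sample")

print(df.dtypes)            # dtype of each value column

col_means = df.mean()       # column-wise means
row_sums = df.sum(axis=1)   # row-wise sums, indexed by sample name
print(col_means)
print(row_sums)
```

Because read_csv infers the column types while parsing, no separate conversion step is needed before doing arithmetic.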

OTHER TIPS

To parse such a file, use pandas' read_csv function.

Below is a minimal example that uses read_csv with sep=r'\s+', which splits on any run of whitespace. The older parameter delim_whitespace=True has the same effect but is deprecated in recent pandas releases.

import pandas as pd
from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO

data = """sample    value1    value2
A    0.1212    0.2354
B    0.23493    1.3442"""

# Create the DataFrame; columns are split on runs of whitespace
df = pd.read_csv(StringIO(data), sep=r'\s+')
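
If the frame has already been built from raw strings, as in the question's second attempt, the value columns can also be converted after the fact. The following is a minimal sketch under that assumption. The raw_data and headers lists are typed in by hand to stand in for the parsed file, and value_cols is a name introduced here for illustration.

```python
import pandas as pd

# Raw strings, as produced by splitting each line of the file by hand.
headers = ["sample", "value1", "value2"]
raw_data = [["A", "0.1212", "0.2354"],
            ["B", "0.23493", "1.3442"]]

s = pd.DataFrame(raw_data, columns=headers)
print(s.dtypes)  # non-numeric dtypes, because every cell is a Python string

# Convert only the value columns; 'sample' stays as text.
value_cols = headers[1:]
s[value_cols] = s[value_cols].astype(float)
# Alternative that turns unparseable cells into NaN instead of raising:
# s[value_cols] = s[value_cols].apply(pd.to_numeric, errors="coerce")

print(s.dtypes)              # value columns are now numeric
print(s[value_cols].mean())  # arithmetic works as expected
```

astype(float) raises an error if any cell is not a valid number, which is usually what you want for clean data. The commented pd.to_numeric line is the more forgiving option for files with missing or malformed values.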

Licensed under: CC-BY-SA with attribution
