Question

I have a Pandas dataframe with a single column of strings. I want to convert the column data to float. Some of the values cannot be converted to float due to their format. I want to omit these "illegal strings" from the result and only extract values that can be legally re-cast as floats. The starting data:

test=pd.DataFrame()
test.loc[0,'Value']='<3'
test.loc[1,'Value']='10'
test.loc[2,'Value']='Detected'
test.loc[3,'Value']=''

The desired output contains only strings that could be re-cast as floats (in this case, 10):

cleanDF=test['Value'].astype(float)
cleanDF
0    10
Name: Value, dtype: float64

Of course, this throws an error as expected on the illegal string for float conversion:

ValueError: could not convert string to float: <3

Is there a simple way to solve this if the dataframe is large and contains many illegal strings in 'Value'?

Thanks.


Solution

You could try using the apply method of the column (a Series). Write a function that includes an exception handler and apply it to test['Value']; values that fail to convert come back as missing and can then be dropped.

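def test_apply(x):
    # Return the value as a float, or NaN if it cannot be parsed.
    try:
        return float(x)
    except (ValueError, TypeError):
        return float('nan')

# Convert every entry, then drop the ones that failed.
cleanDF = test['Value'].apply(test_apply).dropna()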
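
For reference, here is a self-contained sketch of the same idea that can be run as-is. The sample frame mirrors the one in the question; the names to_float_or_nan, converted and clean are made up for illustration. It keeps the intermediate result so you can see which rows were rejected before dropna removes them.

```python
import pandas as pd

# Sample data from the question, built in one step.
test = pd.DataFrame({"Value": ["<3", "10", "Detected", ""]})

def to_float_or_nan(x):
    """Return float(x), or NaN when x cannot be parsed (illustrative helper)."""
    try:
        return float(x)
    except (ValueError, TypeError):
        return float("nan")

# NaN marks every row that could not be converted.
converted = test["Value"].apply(to_float_or_nan)

# Only the rows that parsed cleanly remain; the original index is preserved.
clean = converted.dropna()
print(clean)
```

Note that this calls a Python function once per row, so on a very large frame it will probably be slower than a vectorized conversion.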

Other answers

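You can pass errors='coerce' to pd.to_numeric through the apply method. With that option, any value that cannot be parsed as a number becomes NaN instead of raising an error.

So first convert everything that can be converted to numeric, then drop the NaN values, and finally cast to float: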

cleanDF = test.apply(pd.to_numeric, errors = 'coerce').dropna().astype(float)

which returns only the values and the data type that you want:

>>> cleanDF['Value']
1    10.0
Name: Value, dtype: float64
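
If only one column needs converting, a variation is to call pd.to_numeric on that column directly and keep a boolean mask of the rows that parsed. This is a minimal sketch, not part of the answer above; the names numeric, is_valid, clean_values and rejected are made up for illustration.

```python
import pandas as pd

# Sample data from the question.
test = pd.DataFrame({"Value": ["<3", "10", "Detected", ""]})

# Convert just the one column; unparseable strings become NaN.
numeric = pd.to_numeric(test["Value"], errors="coerce")

# Boolean mask of the rows that parsed successfully.
is_valid = numeric.notna()

clean_values = numeric[is_valid]   # float Series holding only the valid values
rejected = test.loc[~is_valid]     # original rows that failed, kept for inspection

print(clean_values)
print(rejected)
```

Keeping the mask around makes it easy to check which strings were discarded before throwing them away.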

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow