Using Pandas to_numeric() in Azure Machine Learning Studio
Question
I am facing an issue: Azure Machine Learning Studio (AMLS) fails to find the to_numeric method in pandas.
After reading a .csv file in AMLS, I try to process it in a Python script. The line that throws the error is:
dataframe1['Monthly Debt'] = pd.to_numeric(dataframe1['Monthly Debt'])
Here pd is pandas and dataframe1 is my working dataframe. The error thrown is:
AttributeError: 'module' object has no attribute 'to_numeric'
Everything works on my local Python installation. Do you have any idea what AMLS might be complaining about?
Solution
DataFrame.convert_objects has been deprecated in favor of type-specific functions pd.to_datetime, pd.to_timestamp and pd.to_numeric (new in 0.17.0) (GH11133).
So for Pandas versions < 0.17.0 you can and should use: df.convert_objects(convert_numeric=True)
Demo (assuming numpy is imported as np and pandas as pd):
In [213]: x = pd.DataFrame({'a': ['11', 'aaa', '0', np.nan, '123']})
In [214]: x
Out[214]:
     a
0   11
1  aaa
2    0
3  NaN
4  123
In [215]: x.dtypes
Out[215]:
a    object
dtype: object
In [216]: x = x.convert_objects(convert_numeric=True)
In [217]: x
Out[217]:
       a
0   11.0
1    NaN
2    0.0
3    NaN
4  123.0
In [218]: x.dtypes
Out[218]:
a    float64
dtype: object
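If the same script has to run both locally (newer pandas) and in an AMLS environment with an older pandas, one option is to pick the conversion at runtime. The following is only a sketch: to_numeric_compat is a hypothetical helper name, the sample data is made up, and errors="coerce" is an assumption chosen so that pd.to_numeric turns unparseable values into NaN the way convert_objects(convert_numeric=True) does in the demo above.

```python
import pandas as pd


def to_numeric_compat(series):
    """Hypothetical helper: numeric conversion for old and new pandas."""
    if hasattr(pd, "to_numeric"):
        # pandas >= 0.17.0: errors="coerce" maps unparseable values to NaN,
        # mirroring convert_objects(convert_numeric=True).
        return pd.to_numeric(series, errors="coerce")
    # pandas < 0.17.0: pd.to_numeric does not exist yet, so fall back.
    return series.convert_objects(convert_numeric=True)


# Made-up sample data standing in for the real 'Monthly Debt' column.
dataframe1 = pd.DataFrame(
    {"Monthly Debt": ["11", "aaa", "0", None, "123"]}, dtype=object
)
dataframe1["Monthly Debt"] = to_numeric_compat(dataframe1["Monthly Debt"])
print(dataframe1["Monthly Debt"].tolist())
```

Note that the original line in the question calls pd.to_numeric with its default errors="raise", which raises on a value like 'aaa' instead of producing NaN; drop the errors argument if that stricter behavior is what you want.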
OTHER TIPS
OK, this is an issue with Azure Machine Learning Studio. I just confirmed this with one of their data scientists.
I was using the Anaconda 2.0/Python 2.7.7 runtime, and with that runtime the error appears. If you switch to Anaconda 4.0/Python 2.7.11, it works as intended.
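To see which versions a given runtime actually provides, a small diagnostic can be run inside the script. This is a sketch only: environment_report is a hypothetical name, and it simply reports the interpreter version, the pandas version, and whether pd.to_numeric is present.

```python
import sys

import pandas as pd


def environment_report():
    """Hypothetical helper: summarize the Python/pandas environment."""
    return {
        "python": sys.version.split()[0],
        "pandas": pd.__version__,
        # True on pandas >= 0.17.0, where to_numeric was introduced.
        "has_to_numeric": hasattr(pd, "to_numeric"),
    }


report = environment_report()
print(report)
```

If has_to_numeric comes back False, the runtime ships a pandas older than 0.17.0, and either the convert_objects fallback or a newer runtime is needed.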