How to automate ANOVA in Python
Question
I am at the dimensionality reduction phase of my model. I have a list of categorical columns, and I want to find the correlation between each of them and my continuous SalePrice column. Below is the list of column names:
categorical_columns = ['MSSubClass', 'MSZoning', 'LotShape', 'LandContour', 'LotConfig', 'Neighborhood', 'Condition1',
'Condition2', 'BldgType', 'HouseStyle', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd',
'Foundation', 'Heating', 'Electrical', 'Functional', 'GarageType', 'PavedDrive', 'Fence',
'MiscFeature', 'SaleType', 'SaleCondition', 'Street', 'CentralAir']
Because it's categorical vs. continuous, I've read that ANOVA is the best way to go, but I have never used it before and couldn't find a concise implementation of it in Python. I want to loop through the list and output the correlation between each column and the SalePrice column.
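A minimal sketch of that loop follows. It assumes the data sits in a pandas DataFrame and that SciPy is installed. It runs a one-way ANOVA per column with scipy.stats.f_oneway, which gives an F-statistic and p-value (significance, not a correlation). To get a correlation-like number, it also computes eta-squared, the share of SalePrice variance explained by the column's categories. Its square root is the correlation ratio η. The function name anova_by_column is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats


def anova_by_column(df, categorical_columns, target="SalePrice"):
    """Hypothetical helper: one-way ANOVA of `target` against each column.

    Returns one row per column with the F-statistic, p-value and
    eta-squared (between-group SS / total SS), sorted from the
    strongest to the weakest association.
    """
    rows = []
    for col in categorical_columns:
        # Keep only rows where both the category and the target are present.
        sub = df[[col, target]].dropna()
        # One array of target values per category level (skip unused levels).
        groups = [g[target].to_numpy()
                  for _, g in sub.groupby(col, observed=True)]
        if len(groups) < 2:
            # ANOVA needs at least two groups; skip constant columns.
            continue
        f_stat, p_value = stats.f_oneway(*groups)

        # Eta-squared: fraction of the target's variance explained by the groups.
        grand_mean = sub[target].mean()
        ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
        ss_total = ((sub[target] - grand_mean) ** 2).sum()
        eta_sq = ss_between / ss_total if ss_total > 0 else np.nan

        rows.append({"column": col, "F": f_stat,
                     "p_value": p_value, "eta_squared": eta_sq})

    result = pd.DataFrame(rows, columns=["column", "F", "p_value", "eta_squared"])
    return result.sort_values("eta_squared", ascending=False).reset_index(drop=True)
```

Assuming the training data has been loaded into a DataFrame called train (a hypothetical name), anova_by_column(train, categorical_columns) would return a ranked table. Sorting by eta-squared rather than by p-value is a design choice. With a sample the size of this dataset, many columns may come out "significant," so the effect size is more useful for deciding which columns to drop.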
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange