Question

I am at the dimensionality reduction phase of my model. I have a list of categorical columns and I want to find the correlation between each column and my continuous SalePrice column. Below is the list of column names:

categorical_columns = ['MSSubClass', 'MSZoning', 'LotShape', 'LandContour', 'LotConfig', 'Neighborhood', 'Condition1',
                       'Condition2', 'BldgType', 'HouseStyle', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd',
                       'Foundation', 'Heating', 'Electrical', 'Functional', 'GarageType', 'PavedDrive', 'Fence',
                       'MiscFeature', 'SaleType', 'SaleCondition', 'Street', 'CentralAir']

Because it's categorical vs. continuous, I've read that ANOVA is the best way to go, but I have never used it before and couldn't find a concise implementation of it in Python. I want to loop through the list and output the correlation between each column and the SalePrice column.
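
One way such a loop might look is sketched below, assuming the data lives in a pandas DataFrame and SciPy is installed. It runs a one-way ANOVA per column with scipy.stats.f_oneway. ANOVA itself yields an F statistic and a p-value rather than a correlation, so the sketch also computes eta squared (between-group sum of squares divided by total sum of squares) as a rough 0-to-1 measure of association. The function name anova_by_column and the output column names are hypothetical, chosen for illustration only. Rows with a missing category are dropped here, which is an assumption; if NaN means "none" in your data, fill it with a label first.

```python
import pandas as pd
from scipy import stats


def anova_by_column(df, categorical_columns, target="SalePrice"):
    """Run a one-way ANOVA of `target` against each categorical column.

    Hypothetical helper: returns one row per column with the F statistic,
    the p-value, and eta squared (SS_between / SS_total).
    """
    rows = []
    for col in categorical_columns:
        # Assumption: rows with a missing category or target are ignored.
        data = df[[col, target]].dropna()
        groups = [g[target].to_numpy(dtype=float) for _, g in data.groupby(col)]
        groups = [g for g in groups if len(g) > 0]
        if len(groups) < 2:
            continue  # ANOVA needs at least two categories to compare

        f_stat, p_value = stats.f_oneway(*groups)

        # Eta squared: share of the target's variance explained by the column.
        grand_mean = data[target].mean()
        ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
        ss_total = ((data[target] - grand_mean) ** 2).sum()
        eta_sq = ss_between / ss_total if ss_total > 0 else float("nan")

        rows.append({"column": col, "F": f_stat,
                     "p_value": p_value, "eta_squared": eta_sq})

    result = pd.DataFrame(rows, columns=["column", "F", "p_value", "eta_squared"])
    return result.sort_values("eta_squared", ascending=False).reset_index(drop=True)
```

Calling anova_by_column(df, categorical_columns) would then give a table sorted from the most to the least strongly associated column. Keep in mind that one-way ANOVA assumes roughly normal residuals and similar variances across groups, and that categories with very few rows can make the F statistic unreliable.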

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange