Question

I have a dataset with three columns, y, x1 and x2:

      y     x1     x2
0  0.25  -19.3  -25.1
1  0.24  -18.2  -26.7
2  0.81  -45.2  -31.4
...

I want to create more features based on the x columns. Until now I have just created arbitrary functions and checked their correlation with y, but my question is whether there is a proper way, or a set of common functions, for creating these new features. I have used PolynomialFeatures from scikit-learn, but as I understand it, it is not common to go beyond degree 3.

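import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# split X and y ...

# generate all polynomial terms of the X columns up to degree 3
poly = PolynomialFeatures(degree=3)
X_poly = pd.DataFrame(poly.fit_transform(X), columns=poly.get_feature_names_out())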

My end goal is to use these new columns in a random forest algorithm (I have more columns than x1 and x2, but those two are the ones that interest me, and I would like to investigate them and their relationship further).


Solution

You are going to have to do something: you can try combining them in different ways, such as multiplying them together, dividing one by the other, or subtracting one from the other. Without the context of what these features actually represent, it's difficult to say what would make sense. Ultimately, to make a new derived feature you are going to have to combine or transform them in some way.
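
As a rough illustration of the combinations mentioned above, here is a minimal sketch using pandas. The helper name `add_derived_features`, the new column names, and the `eps` threshold are assumptions made for this example and do not come from the question. The data is just the three rows shown in the question, so the resulting correlations are not meaningful in themselves.

```python
import numpy as np
import pandas as pd

def add_derived_features(df, a="x1", b="x2", eps=1e-9):
    """Return a copy of df with simple pairwise combinations of columns a and b.

    Hypothetical helper for illustration; the column names are arbitrary.
    """
    out = df.copy()
    out[f"{a}_times_{b}"] = out[a] * out[b]
    out[f"{a}_minus_{b}"] = out[a] - out[b]
    # Mask near-zero denominators with NaN so the ratio does not blow up
    denom = out[b].where(out[b].abs() > eps, np.nan)
    out[f"{a}_over_{b}"] = out[a] / denom
    return out

# The three rows shown in the question
df = pd.DataFrame({
    "y": [0.25, 0.24, 0.81],
    "x1": [-19.3, -18.2, -45.2],
    "x2": [-25.1, -26.7, -31.4],
})

features = add_derived_features(df)

# Rank candidate features by absolute Pearson correlation with y
ranking = (
    features.drop(columns="y")
    .corrwith(features["y"])
    .abs()
    .sort_values(ascending=False)
)
print(ranking)
```

Pearson correlation only measures linear association, so a feature that ranks low here may still be useful to a random forest. It is probably worth also comparing model performance with and without each candidate feature.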

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange