Question

I have a dataset with three columns, y, x1 and x2:

```
      y     x1     x2
0  0.25  -19.3  -25.1
1  0.24  -18.2  -26.7
2  0.81  -45.2  -31.4
...
```

I want to create more features based on the x columns. Until now I have just made up ad hoc functions and checked their correlation with y, but is there a proper way, or a set of common functions, for creating these new features? I have used scikit-learn's PolynomialFeatures, but as I understand it, going beyond degree 3 is not common.

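```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# split into X and y ...

# expand the x columns into all polynomial terms up to degree 3
poly = PolynomialFeatures(degree=3)
X_poly = pd.DataFrame(poly.fit_transform(X), columns=poly.get_feature_names_out(X.columns))
```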
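A likely reason degrees above 3 are uncommon is that the number of generated columns grows combinatorially: with n input columns and degree d, a full expansion (bias column included, as in `PolynomialFeatures`' default) has C(n + d, d) terms, and high-degree terms tend to overfit. The stdlib-only sketch below enumerates the same set of monomials `PolynomialFeatures` builds; the helper names `poly_terms` and `n_poly_terms` are made up for illustration, and scikit-learn labels the terms differently (e.g. `x1^2` rather than `x1 x1`).

```python
from itertools import combinations_with_replacement
from math import comb


def poly_terms(names, degree):
    """List every monomial of a full polynomial expansion up to `degree`
    (bias term "1" included), one string per generated column."""
    terms = ["1"]
    for d in range(1, degree + 1):
        # each multiset of d column names is one monomial, e.g. ("x1", "x2") -> x1*x2
        for combo in combinations_with_replacement(names, d):
            terms.append(" ".join(combo))
    return terms


def n_poly_terms(n_features, degree):
    # closed form for the column count: C(n + d, d), bias column included
    return comb(n_features + degree, degree)


print(poly_terms(["x1", "x2"], 3))
# 10 terms: 1, x1, x2, x1 x1, x1 x2, x2 x2, x1 x1 x1, x1 x1 x2, x1 x2 x2, x2 x2 x2

for n in (2, 10):
    print(n, [n_poly_terms(n, d) for d in range(1, 6)])
# 2 [3, 6, 10, 15, 21]
# 10 [11, 66, 286, 1001, 3003]
```

With only x1 and x2 the growth is mild, but applying the same expansion to all of the table's columns at degree 4 or 5 quickly produces thousands of mostly redundant features.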

My end goal is to use these new columns in a random forest. (I have more columns than x1 and x2, but these two are the ones that interest me, and I would like to investigate them and their relationship further.)


Solution

You will have to do some of this by hand: try combining the columns in different ways, such as multiplying them together, dividing one by the other, or subtracting one from the other. Without context about what these features actually represent, it is difficult to say which combinations would make sense. Ultimately, to create a new derived feature you have to combine or transform the existing ones in some way.
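The combinations suggested above (products, ratios, differences) can be generated systematically for every pair of columns and then screened, for example by correlation with `y` as the question already does. The sketch assumes pandas and NumPy; `pairwise_features` is a hypothetical helper, not a library function. The three rows are the sample rows from the question and serve only to show the mechanics, since correlations over three points are meaningless.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def pairwise_features(X):
    """Hypothetical helper: product, difference, sum and ratio columns
    for every pair of columns in X."""
    new = {}
    for a, b in combinations(X.columns, 2):
        new[f"{a}*{b}"] = X[a] * X[b]
        new[f"{a}-{b}"] = X[a] - X[b]
        new[f"{a}+{b}"] = X[a] + X[b]
        # ratio: NaN where the divisor is 0, instead of +/-inf
        new[f"{a}/{b}"] = X[a] / X[b].replace(0, np.nan)
    return pd.DataFrame(new, index=X.index)


# The sample rows from the question, used only to show the output shape
df = pd.DataFrame({"y": [0.25, 0.24, 0.81],
                   "x1": [-19.3, -18.2, -45.2],
                   "x2": [-25.1, -26.7, -31.4]})
derived = pairwise_features(df[["x1", "x2"]])

# Absolute Pearson correlation of each derived column with y, strongest first
ranking = derived.corrwith(df["y"]).abs().sort_values(ascending=False)
print(ranking)
```

Pearson correlation only captures linear relationships, whereas a random forest can exploit non-linear ones, so a low correlation does not by itself rule a feature out.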

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange