Question

I want to extract features in my program, estimate the optimal weight of each feature, and then compute the score of a new input record.

For example, I have a paraphrase dataset. Each record is a pair of sentences whose similarity is given as a value between 0 and 1. After extracting, e.g., 4 features, I create a new dataset from the feature values and the similarity scores. I want to use this new dataset to learn the weights:

Paraphrase dataset:

"A problem was solved by a mathematician"; "A mathematician was found a solution for a problem"; 0.9  
.  
.   

New dataset:

0.42; 0.61; 0.21; 0.73; 0.9
.  
.

I want to use regression to estimate the weight of each feature. I want to compute the similarity of the input sentences in the program with equation 1: S = W1*F1 + W2*F2 + W3*F3 + W4*F4

I know that regression can be used for this, but I don't know how. Could you guide me? Is there any paper or document that uses regression for this kind of task?


Solution

What you are looking for is linear regression (with several features, this is multiple linear regression). Incidentally, regression is not an algorithm but a data modeling approach; algorithms are used to find its parameters. You should also add a bias (intercept) term to your equation, so it becomes:

S = w1*f1 + w2*f2 + w3*f3 + w4*f4 + b

or in the vectorized format

s = <F,W> + b

where <F,W> is the inner product of your features and weights, and b is the bias (a real-valued variable).

To unify the two forms, you can add a constant feature f5 = 1 and use w5 in place of b, so it becomes

s = <F,W>
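The equivalence of the two forms can be checked with a few lines of Python. This is only a sketch: the weights and bias below are hypothetical values chosen for illustration, not learned ones; the feature values are the example row from the question.

```python
# Hypothetical weights and bias, chosen only to illustrate the algebra.
weights = [0.2, 0.3, 0.1, 0.4]
bias = 0.05

# Feature values f1..f4 from the example row in the question.
features = [0.42, 0.61, 0.21, 0.73]


def dot(u, v):
    """Inner product <u, v> of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


# Form with an explicit bias: s = <F,W> + b
score_with_bias = dot(features, weights) + bias

# Unified form: append f5 = 1 and w5 = b, then s = <F,W>
features_aug = features + [1.0]
weights_aug = weights + [bias]
score_unified = dot(features_aug, weights_aug)

print(score_with_bias, score_unified)
```

Both expressions give the same score, which is why the bias can be treated as just one more weight.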

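You can solve it using the ordinary least squares (OLS) method:

W = (F'F)^(-1)F's

where F is the matrix with one row of feature values per record (including the constant 1), F' is its transpose, and s is the vector of similarity scores. This gives the linear regression that is optimal in terms of the sum of squared residuals.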
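As a rough sketch of how this formula could be evaluated with NumPy: the data below is synthetic, generated from hypothetical weights and a hypothetical bias so that the result can be checked, and the helper name `similarity` is made up for this example. The code solves the linear system (F'F)W = F's instead of forming the inverse explicitly, which is usually the numerically safer choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the feature dataset: 50 records, 4 features in [0, 1).
F = rng.random((50, 4))

# Hypothetical "true" weights and bias, used only to generate target scores.
true_w = np.array([0.2, 0.3, 0.1, 0.4])
true_b = 0.05
s = F @ true_w + true_b

# Append the constant feature f5 = 1 so the bias is learned as w5.
F_aug = np.hstack([F, np.ones((F.shape[0], 1))])

# Normal equations: solve (F'F) W = F's rather than inverting F'F.
W = np.linalg.solve(F_aug.T @ F_aug, F_aug.T @ s)

# The same least-squares problem solved by NumPy's dedicated routine.
W_lstsq, *_ = np.linalg.lstsq(F_aug, s, rcond=None)


def similarity(features, W):
    """Score a new record: s = <F,W> with the constant 1 appended."""
    return float(np.dot(np.append(features, 1.0), W))


print(W)  # the first four entries are the weights, the last one is the bias
print(similarity([0.42, 0.61, 0.21, 0.73], W))
```

With real similarity scores the fit will not be exact, and nothing forces the predicted value to stay between 0 and 1, so you may want to clip it.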

Most programming languages have libraries for performing linear regression, so you do not have to implement it yourself. These libraries typically also take care of the bias term b, so there is no need to add it manually.
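For instance, in Python one option is scikit-learn's `LinearRegression`, which fits the intercept by default. The sketch below assumes scikit-learn and NumPy are installed and again uses synthetic data generated from hypothetical weights; in practice `X` would hold your extracted features and `y` the similarity scores.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic feature matrix (50 records, 4 features) and target scores
# generated from hypothetical weights and bias, for illustration only.
X = rng.random((50, 4))
y = X @ np.array([0.2, 0.3, 0.1, 0.4]) + 0.05

# fit_intercept=True is the default, so the bias b is estimated for us.
model = LinearRegression()
model.fit(X, y)

print(model.coef_)       # estimated weights w1..w4
print(model.intercept_)  # estimated bias b

# Score a new record (here, the feature row from the question).
new_record = np.array([[0.42, 0.61, 0.21, 0.73]])
predicted = model.predict(new_record)[0]
print(predicted)
```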

License: CC-BY-SA with attribution
Not affiliated with StackOverflow