Question

I have a list of documents each having a relevance score for a search query. I need older documents to have their relevance score dampened, to try to introduce their date in the ranking process. I already tried fiddling with functions such as 1/(1+date_difference), but the reciprocal function is too discriminating for close recent dates.

I was thinking maybe a mathematical function with range (0..1) and domain(0..x) to amplify their score, where the x-axis is the age of a document. It's best to explain what I further need from the function by an image:

Was it helpful?

Solution

Decaying behavior is often modeled well by an exponentional function (many decaying processes in nature also follow it). You would use 2 positive parameters A and B and get

y(x) = A exp(-B x)

Since you want a y-range [0,1] set A=1. Larger B give slower decays.

OTHER TIPS

If a simple 1/(1+x) decreases too quickly too soon, a sigmoid function like 1/(1+e^-x) or the error function might be better suited to your purpose. Let the current date be somewhere in the negative numbers for such a function, and you can get a value that is current for some configurable time and then decreases towards a base value.

log((x+1)-age_of_document)

Where the base of the logarithm is (x+1). Note the x is as per your diagram and is the "threshold". If the age of the document is greater than x the score goes negative. Multiply by the maximum possible score to introduce scaling.

E.g. Domain = (0,10) with a maximum score of 10: 10*(log(11-x))/log(11)

A bit late, but as thiton says, you might want to use a sigmoid function instead, since it has a "floor" value for your long tail data points. E.g.:

0.8/(1+5^(x-3)) + 0.2 - You can adjust the constants 5 and 3 to control the slope of the curve. The 0.2 is where the floor will be.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top