Question

I want to predict the change in hit counts for a website using the rate of change. For example, if a site's hits double in a day (from 5,000 to 10,000), the rate of change is 1 (a 100% increase). If the hits go up by a half (5,000 to 7,500), then the rate of change is 0.5. Samples and rates of change will be calculated daily. I am fine with using raw hits instead of the rate of change, but that seems more difficult to work with.

I want to use scikit-learn to predict future rates of change. Given a set of past data points, how would I predict the future changes? Should I use a logistic regression? A support vector machine? Something else?

Thanks for your help! I'm new to scikit-learn, so feel free to comment if you need any more information about the problem.

Solution

Edit: I forgot that you were looking for an sklearn solution, but I think a simple weighted moving average could be a good start. I usually start with something simple and move on to something more complicated only if that does not give me the desired results.

There are fancier approaches, but a simple one is to use a weighted moving average, where you give more weight to the most recent observations. For example:

import numpy as np

hits = np.array([100, 500, 300, 800, 900])

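def predict(hits, weights):
    # Weighted average of the last len(weights) observations.
    # Weights run oldest to newest, so the last weight applies to the latest value.
    return np.average(hits[-len(weights):], weights=weights)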

Result:

>>> predict(hits, [0.2, 0.3, 0.5])
750.0
>>> 900 * 0.5 + 800 * 0.3 + 300 * 0.2
750.0
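
The same idea can be applied to the rate of change instead of raw hits. Below is a minimal sketch, assuming "rate of change" means the relative day-over-day change, (today - yesterday) / yesterday, so that 5,000 to 7,500 gives 0.5. The helper names relative_changes and predict_next_change and the sample numbers are made up for illustration; they are not part of NumPy or scikit-learn.

```python
import numpy as np

def relative_changes(hits):
    # Day-over-day relative change: (today - yesterday) / yesterday.
    hits = np.asarray(hits, dtype=float)
    return np.diff(hits) / hits[:-1]

def predict_next_change(hits, weights):
    # Weighted moving average over the most recent relative changes.
    # Weights run oldest to newest, as in predict() above.
    changes = relative_changes(hits)
    return np.average(changes[-len(weights):], weights=weights)

# Illustrative data only (assumed, not from the question).
hits = [5000, 10000, 7500, 9000, 9900]
changes = relative_changes(hits)  # one value per pair of consecutive days
next_change = predict_next_change(hits, [0.2, 0.3, 0.5])

# Turn the predicted relative change back into a hit count.
next_hits = hits[-1] * (1 + next_change)
```

Working on relative changes keeps the forecast independent of the site's absolute traffic level, but a day with zero hits would make the division undefined, so such days would need separate handling.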
Licensed under: CC-BY-SA with attribution