Domanda

I have a few datapoints (x and y) that seem to have a logarithmic relationship.

> mydata
    x   y
1   0 123
2   2 116
3   4 113
4  15 100
5  48  87
6  75  84
7 122  77

> qplot(x, y, data=mydata, geom="line")

plot

Now I would like to find an underlying function that fits the graph and allows me to infer other datapoints (i.e. 3 or 82). I read about lm and nls but I'm not getting anywhere really.

At first, I created a function of which I thought it resembled the plot the most:

f <- function(x, a, b) {
    a * exp(b *-x)
}
x <- seq(0:100)
y <- f(seq(0:100), 1,1)
qplot(x,y, geom="line")

plot2

Afterwards, I tried to generate a fitting model using nls:

> fit <- nls(y ~ f(x, a, b), data=mydata, start=list(a=1, b=1))
   Error in numericDeriv(form[[3]], names(ind), env) :
   Missing value or an Infinity produced when evaluating the model

Can someone point me in the right direction on what to do from here?

Follow up

After reading your comments and googling around a bit further I adjusted the starting parameters for a, b and c and then suddenly the model converged.

fit <- nls(y~f(x,a,b,c), data=data.frame(mydata), start=list(a=1, b=30, c=-0.3))
x <- seq(0,120)
fitted.data <- data.frame(x=x, y=predict(fit, list(x=x))
ggplot(mydata, aes(x, y)) + geom_point(color="red", alpha=.5) + geom_line(alpha=.5) + geom_line(data=fitted.data)

plot3

È stato utile?

Soluzione

Maybe using a cubic specification for your model and estimating via lm would give you a good fit.

# Importing your data
dataset <- read.table(text='
    x   y
1   0 123
2   2 116
3   4 113
4  15 100
5  48  87
6  75  84
7 122  77', header=T)

# I think one possible specification would be a cubic linear model
y.hat <- predict(lm(y~x+I(x^2)+I(x^3), data=dataset)) # estimating the model and obtaining the fitted values from the model

qplot(x, y, data=dataset, geom="line") # your plot black lines
last_plot() + geom_line(aes(x=x, y=y.hat), col=2) # the fitted values red lines

# It fits good.

enter image description here

Altri suggerimenti

Try taking the log of your response variable and then using lm to fit a linear model:

fit <- lm(log(y) ~ x, data=mydata)

The adjusted R-squared is 0.8486, which at face value isn't bad. You can look at the fit using plot, for example:

plot(fit, which=2)

But perhaps, it's not such a good fit after all:

last_plot() + geom_line(aes(x=x, y=exp(fit$fitted.values)))

Check this document out: http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf

In brief, first you need to decide on the model to fit onto your data (e.g., exponential) and then estimate its parameters.

Here are some widely used distributions: http://www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top