Question

I am a relative newcomer to R, and a geneticist rather than a mathematician. I have many sets, each containing multiple pairs of data points. When plotted, each set yields a flattened S curve with most of the data points near zero. A minority of the data points fly far off, creating what are almost two J curves, one going down and one going up. I need to find the inflection points where the data sharply veer upward or downward.

This may be an issue with my math, but it seems to me that if I can smooth the data, fit a curve and get an equation, I could then take the second derivative of the curve and determine the inflection points from where the second derivative changes sign. I tried it in Excel and used its curve fitting to get an approximate fit and a starting formula, but the data have a bit of "wiggling" in them, so determining any one inflection point is not possible even if I wanted to do it all manually (which I don't).

Each of the hundreds of data sets in which I have to find these two inflection points will yield about the same curve but with slightly different inflection points, and determining those inflection points precisely is absolutely critical to the problem. So if I can set it up properly once in an equation, that should do it. For simplicity I would like to break each set into the positive curve and the negative curve and do each one separately. (Maybe there is some easier formula for S curves that makes that a bad idea?)

I have tried reading the manual, but it is hard to understand, likely because of my weak math skills. I have also been unable to find any similar examples I could study.

This is the head of my data set:

     x          y
[1,] 1 0.00000000
[2,] 2 0.00062360
[3,] 3 0.00079720
[4,] 4 0.00085100
[5,] 5 0.00129020

(x is just an index running from 1 to the number of data points, which varies a bit between individual sets.)

This is as far as I have gotten to resolve the curve fitting part:

pos_curve1 <- nls(curve_fitting ~ (scal*x^scal), data = cbind.data.frame(curve_fitting),
+                 start = list(x = 0, scal = -0.01))
Error in numericDeriv(form[[3L]], names(ind), env) :
  Missing value or an infinity produced when evaluating the model
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf

Am I just doing the math the hard way? What am I doing wrong with the nls? Any help would be much much appreciated.

Solution

Found it. The curve is exponential, not J-shaped, and the following worked.

fit <- nls(pos ~ a*tmin^b, 
             data = d, 
             start = list(a = .1, b = .1), 
             trace = TRUE)
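
As a minimal sketch of how a call of this shape behaves, the block below fits the same formula to made-up data. The data frame `d`, its columns `tmin` and `pos`, the true values 0.8 and 1.3, and the starting values are all assumptions chosen for illustration, not the author's data. One visible difference from the failing call in the question is that only the coefficients `a` and `b` are listed in `start`; there, `x` was listed in `start`, so `nls` treated it as a parameter to estimate rather than as a data column. Note also that `a*tmin^b` is a power-law form, while the exponential form the author finally settled on appears further down.

```r
# Synthetic stand-in data (assumption: not the author's real measurements).
set.seed(1)
d <- data.frame(tmin = 1:50)
d$pos <- 0.8 * d$tmin^1.3 + rnorm(50, sd = 1)

# Only the coefficients a and b go into `start`.
# tmin and pos are looked up as columns of `data`.
fit <- nls(pos ~ a * tmin^b,
           data  = d,
           start = list(a = 1, b = 1))

# Estimated coefficients; they should land near the values used to simulate.
print(coef(fit))
```

Adding `trace = TRUE`, as in the call above, prints the residual sum of squares and the current parameter values at each iteration, which helps when a fit refuses to converge.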

Thanks are due to Jorge I. Velez on R-help (Oct 26, 2009).

I also used "An Appendix to An R Companion to Applied Regression, second edition" by John Fox & Sanford Weisberg (last revision: 13 Dec 2010).

Final working settings for me were:

fit <- nls(y ~ a*log(10)^(x*b),
           data = pos_curve2,
           start = list(a = .01, b = .01),
           trace = TRUE)
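
The sketch below runs this model on simulated data to show what it describes. In R, `log(10)` is the natural logarithm of 10 (about 2.3026), so `a*log(10)^(x*b)` is an ordinary exponential `a*exp(rate*x)` with `rate = b*log(log(10))`. The data frame contents, the true values 0.02 and 0.03, and the noise level are assumptions made for illustration only.

```r
# Simulated data in the shape of the author's pos_curve2 (assumption: columns x and y).
set.seed(2)
pos_curve2 <- data.frame(x = 1:60)
pos_curve2$y <- 0.02 * log(10)^(0.03 * pos_curve2$x) + rnorm(60, sd = 0.002)

# Same formula and starting values as the call above.
fit <- nls(y ~ a * log(10)^(x * b),
           data  = pos_curve2,
           start = list(a = 0.01, b = 0.01))

est  <- coef(fit)
# Rewrite the fit as a * exp(rate * x); log() is the natural log in R.
rate <- est[["b"]] * log(log(10))
fitted_exp <- est[["a"]] * exp(rate * pos_curve2$x)

# The two forms give the same fitted values up to floating-point tolerance.
print(all.equal(fitted_exp, as.numeric(fitted(fit))))
```

Working with `rate` directly can make the fitted curve easier to compare across data sets, since it is the usual exponential growth constant.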

I figured out what the formula should be by using an OpenOffice spreadsheet and testing the various curve fit options until I was able to show that exponential was the best fit. I then got the structure of the equation from that. I used the Fox & Weisberg appendix to understand how to set the parameters.
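
One common way to get starting values without a spreadsheet, sketched here as an assumption rather than as the author's method, is to linearize the model: taking logs of `y = a*log(10)^(x*b)` gives `log(y) = log(a) + b*log(log(10))*x`, which `lm` can fit directly. Rows with `y <= 0` have to be dropped first, because `log(0)` is `-Inf` (the first row of the data shown in the question has `y = 0`). The simulated data and the name `start_vals` are hypothetical.

```r
# Simulated data with multiplicative noise so that y stays positive (assumption).
set.seed(3)
pos_curve2 <- data.frame(x = 1:60)
pos_curve2$y <- 0.02 * log(10)^(0.03 * pos_curve2$x) * exp(rnorm(60, sd = 0.05))

# Straight-line fit on the log scale; subset drops any non-positive y.
lin <- lm(log(y) ~ x, data = pos_curve2, subset = y > 0)

# Back-transform the intercept and slope into starting values for a and b.
start_vals <- list(a = exp(coef(lin)[[1]]),
                   b = coef(lin)[[2]] / log(log(10)))

# Refine with nls on the original scale.
fit <- nls(y ~ a * log(10)^(x * b),
           data  = pos_curve2,
           start = start_vals)

print(unlist(start_vals))
print(coef(fit))
```

Because the linearized estimates are usually already close, `nls` tends to converge in a few iterations from them, which matters when the same fit has to be repeated over hundreds of data sets.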

Maybe I am not alone in this, but I really found it hard to figure out the parameters, and there were few references or questions on it that helped me.

Licensed under: CC-BY-SA with attribution