Question

I have data from the USGS Nation Water Data website. I am currently trying to plot and fit curves to the data to use in prediction for different measurements taken within the dataset (dissolved oxygen, pH, gage height and temp) all in relation to discharge rate. I used the "nls" command and I am using a book of equations to find which curve to use...for this example I have specifically used Schumacher’s equation (p.48 in the book).

Find the link to data:

curve book: http://www.for.gov.bc.ca/hfd/pubs/docs/bio/bio04.htm

data I used: http://waterdata.usgs.gov/mi/nwis/uv?referred_module=qw&search_station_nm=River%20Rouge%20at%20Detroit%20MI&search_station_nm_match_type=anywhere&index_pmcode_00065=1&index_pmcode_00060=1&index_pmcode_00300=1&index_pmcode_00400=1&index_pmcode_00095=1&index_pmcode_00010=1&group_key=NONE&sitefile_output_format=html_table&column_name=agency_cd&column_name=site_no&column_name=station_nm&range_selection=date_range&begin_date=2013-11-18&end_date=2013-12-18&format=html_table&date_format=YYYY-MM-DD&rdb_compression=file&list_of_search_criteria=search_station_nm,realtime_parameter_selection

My problem is that I CANNOT get nls to predict new values once I picked a curve coded it... I also can't quite figure out how to plot it...I'm guessing this can be with the residuals? In the code I have used "aggregate" to extract means of the listed measurements and the corresponding discharge rates, now I just need to get R to predict for me. I got as far as getting what I think are fitted values... but I'm not sure and I hit a wall with "?nls."

##Create new dataframes with means given date for each constituent
ph <- aggregate(Discharge~pH, data=River.Data, mean)

##pH models
pH <- ph$pH
disch <- ph$Discharge
phm <- nls(disch~exp(a+(b/pH)), data=ph, trace=T, start=list(a=-47.06 ,b=400.2))
newph<- data.frame(ph=c(3.0,4.0,5.0,6.0,7.0,8.0,9.0))
predict(phm, newdata=newph)
Was it helpful?

Solution

Seems like you've already got your answer(??), but:

ph    <- aggregate(Discharge~pH, data=River.Data, mean)
phm   <- nls(Discharge~exp(a+(b/pH)), data=ph, trace=T, start=list(a=-47.06 ,b=400.2))
newph <- data.frame(pH=seq(3,9,by=0.1))
Discharge.pred <- predict(phm, newdata=newph)

plot(ph$pH, ph$Discharge, xlim=c(3,9), ylim=c(0,1000))
par(new=t)
plot(newph$pH,Discharge.pred, xlab="", ylab="", axes=F, xlim=c(3,9), ylim=c(0,1000), type="l")

The problem is that your data is for pH in [7.5,8.2] but you are trying to predict in [3,9]. The model you've chosen is not stable for pH that far outside the range.

OTHER TIPS

Jarrod, try this. Cheers, Robert.

#Try this
#pH <- ph$pH # you don't need this
#disch <- ph$Discharge # you don't need this
phm <- nls(Discharge~exp(a+(b/pH)), data=ph, trace=T, start=list(a=-47.06 ,b=400.2))
newph<- data.frame(pH=seq(3,9,0.1)) # it'll be smoother with a sequence in increments of 0.1
plot(newph,predict(phm, data=newph,type="l"))

This answer is based on @Carl Witthoft's comment. I'm highlighting this in a separate answer, because I skimmed over the comments and didn't really take in what Carl was proposing. Hoping it will help others who also find themselves here.

From the ?predict.nls file, the new data needs to be a "named list or dataframe". Which is where I have fallen down in the past; the list or dataframe column needs to have a name. And as Carl states, this needs to be exactly the same name as that used in the model. I think the OP's issue may be because they used pH in the model, but ph in the creation of new data.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top