Question

I searched SO but could not find code that applies to my problem. It is similar to this question: Linear Regression calculation several times in one dataframe

I obtained a data frame of linear regression (LR) coefficients using Andrie's code:

library(plyr)
Cddply <- ddply(test, .(sumtest), function(test) coef(lm(Area ~ Conc, data = test)))

sumtest  (Intercept)    Conc
1        -108589.2726   846.0713372
2        -49653.18701   811.3982918
3        -102598.6252   832.6419926
4        -72607.4017    727.0765558
5        54224.28878    391.256075
6        -42357.45407   357.0845661
7        -34171.92228   367.3962888
8        -9332.569856   289.8631555
9        -7376.448899   335.7047756
10       -37704.92277   359.1457617

My question is how to apply each of these LR models (1-10) to specific row intervals in another data frame, in order to put x, the independent variable (Conc), into a third column. For example, I would like to apply sumtest1 to Samples 6:29, sumtest2 to Samples 35:50, sumtest3 to Samples 56:79, and so on, in alternating intervals of 24 and 16 samples. The sample numbers repeat after 200, so sumtest9 will be for Samples 6:29 again.

Sample  Area
6   236211
7   724919
8   1259814
9   1574722
10  268836
11  863818
12  1261768
13  1591845
14  220322
15  608396
16  980182
17  1415859
18  276276
19  724532
20  1130024
21  1147840
22  252051
23  544870
24  832512
25  899457
26  285093
27  4291007
28  825922
29  865491
35  246707
36  538092
37  767269
38  852410
39  269152
40  971471
41  1573989
42  1897208
43  261321
44  481486
45  598617
46  769240
47  229695
48  782691
49  1380597
50  1725419

The resulting dataframe would look like this:

Sample  Area     Calc
6       236211   407.5312917
7       724919   985.1525288
8       1259814  1617.363812
9       1574722  1989.564693
10      268836   446.0919309
...
35      246707   365.2452551
36      538092   724.3591324
37      767269   1006.805521
38      852410   1111.736505
39      269152   392.9073207

Thank you for your assistance.

Solution

Is this what you want? I made up a slightly larger dummy data set of 'area' to make it easier to see how the code worked when I tried it out.

# create 400 rows of area data
set.seed(123)
df <- data.frame(area = round(rnorm(400, mean = 1000000, sd = 100000)))

# "sample numbers repeats after 200" -> add a sample nr 1-200, 1-200
df$sample_nr <- rep(1:200, length.out = nrow(df))  # recycle 1-200 explicitly

# create a factor which cuts the vector of sample_nr into pieces of length 16, 24, 16, 24...
# repeat until the piece lengths sum to 200,
# i.e. 5 repeats of (16, 24)
grp <- cut(df$sample_nr, breaks = c(-Inf, cumsum(rep(c(16, 24), 5))))

# add a numeric version of the chunks to data frame
# this number indicates the model from which coefficients will be used
# row 1-16 (16 rows): model 1; row 17-40 (24 rows): model 2;
# row 41-56 (16 rows): model 3; and so on. 
df$mod <- as.numeric(grp)

# read coefficients
coefs <- read.table(text = "intercept beta_conc
1   -108589.2726    846.0713372
2   -49653.18701    811.3982918
3   -102598.6252    832.6419926
4   -72607.4017 727.0765558
5   54224.28878 391.256075
6   -42357.45407    357.0845661
7   -34171.92228    367.3962888
8   -9332.569856    289.8631555
9   -7376.448899    335.7047756
10  -37704.92277    359.1457617", header = TRUE)

# add model number (converted to numeric so it has the same type as df$mod)
coefs$mod <- as.numeric(rownames(coefs))

head(df)
head(coefs)

# join area data and coefficients by model number
# (use 'join' instead of merge to avoid sorting)
library(plyr)
df2 <- join(df, coefs, by = "mod")

# calculate conc from area and model coefficients
# area = intercept + beta_conc * conc
# conc = (area - intercept) / beta_conc
df2$conc <- (df2$area - df2$intercept) / df2$beta_conc
head(df2, 41)
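
If the sample intervals do not start at row 1 or leave gaps (as in the question, where Samples 6:29 use model 1 and Samples 35:50 use model 2), an explicit lookup table of ranges may be easier to follow than cut(). The following is only a sketch in base R: the 'ranges' table and its column names 'first' and 'last' are my own assumptions, and only the first two models and ten of the question's rows are included. The results should land close to the expected Calc values in the question (roughly 407.5 for sample 6 and 365.2 for sample 35); small differences in the later decimals are possible because the printed coefficients are rounded.

```r
# Coefficients for the first two models, copied from the question.
coefs <- data.frame(
  mod       = 1:2,
  intercept = c(-108589.2726, -49653.18701),
  beta_conc = c(846.0713372, 811.3982918)
)

# Hypothetical lookup table (not in the original post):
# which range of sample numbers belongs to which model.
ranges <- data.frame(
  mod   = 1:2,
  first = c(6, 35),
  last  = c(29, 50)
)

# A few rows of the question's data.
dat <- data.frame(
  Sample = c(6, 7, 8, 9, 10, 35, 36, 37, 38, 39),
  Area   = c(236211, 724919, 1259814, 1574722, 268836,
             246707, 538092, 767269, 852410, 269152)
)

# For each sample, find the single range that contains it (NA if none does).
dat$mod <- vapply(dat$Sample, function(s) {
  hit <- which(s >= ranges$first & s <= ranges$last)
  if (length(hit) == 1) ranges$mod[hit] else NA_integer_
}, integer(1))

# Look up each row's coefficients and invert
# Area = intercept + beta_conc * Conc  ->  Conc = (Area - intercept) / beta_conc
idx <- match(dat$mod, coefs$mod)
dat$Calc <- (dat$Area - coefs$intercept[idx]) / coefs$beta_conc[idx]

print(dat)
```

Samples that fall outside every range get NA for both mod and Calc, which makes gaps such as Samples 30:34 easy to spot. Because the sample numbers repeat after 200, a real data set would also need a block counter (for example, which pass through 1-200 a row belongs to) in the lookup; that is left out here.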
Licensed under: CC-BY-SA with attribution