modification of code to bulk process in r [closed]

https://stackoverflow.com/questions/7507974

25-01-2021
|

Question

After being recommended to use adehabitat to calculate volume of intersection I have stumbled into a slight (hopefully simple) problem. In this library I am using the kerneloverlap command because I need to calculate the volume of intersection. I was wondering if you could help me with some programming questions. i need to modify the script to make it "bulk" processing friendly. I know enough of R to get myself into trouble and to lose hair because I know certain things should be possible, but can't figure out how to get it to work.

The command is quite simple:

kerneloverlap(loc[,c("X","Y")], loc$year, lev = 90, grid=30, meth="VI", conditional=TRUE)

where it takes from the data file loc the x, y coordinates, by year, and calculates the volume of intersection with a grid cell size of 30 in a utilization distribution of 90.

The input file (see below for excerpt) is anid, X, Y, year, and seasons. For this example there is only 1 season(keep in mind I have 3 seasons). For this example I want to compare within 1 season between years for each individual volume of intersection. So the test data have 2 years and 1 seasons and 2 individuals. What I would like to be able to say is "the volume of Intersection for animal 1 during calving season between year 2003 and 2004 is 0.8 which indicates a high level of overlap and fidelity to a location".

I would also like to then compare between seasons. Such that the volume of intersection for animal 1 during its summer and wintering seasons in 2003 is 0.04 which indicates a low level of overlap and no fidelity to the location".

Some thing to keep in mind: Not all individuals are present each year or were alive for each season. Therefore some sort of droplevel might be necessary.

This is my R script thus far (it doesn't work). Notice that the output is not being joined well together either and I can't seem to get a compiled file. Id like it to tell me what year, individual or season it is comparing things with.

IDNames= levels(loc$anid)
Year = unique(loc$year)
for (i in 1:(length(IDNames))){
vi90 = kerneloverlap(loc[,c("X","Y")], loc$year, lev = 90, grid=30, meth="VI", conditional=TRUE) 
    }
colnames(vi)= c(paste(IDNames[i],Year[n], sep =""),paste(IDNames[i], Year[n], sep =""))
}
write.csv(vi,"VolInter_indiv.csv")


    structure(list(anid = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("c_002", "c_104"), class = "factor"), X = c(276646.0514, 
276485.0397, 278102.4193, 278045.4716, 278993.8807, 274834.5677, 
278516.0218, 296741.8328, 299080.2451, 291874.5068, 168540.0024, 
168360.8211, 169538.2299, 164538.2592, 157321.7524, 148090.3478, 
140575.2442, 133369.7162, 134375.0805, 138763.5342, 232347.5137, 
231989.4609, 231793.1066, 234923.4012, 233374.4531, 232256.4667, 
233660.3445, 239317.3128, 246354.664, 145161.8922, 144148.7895, 
145154.7652, 145399.3515, 144581.4836, 143646.7295, 145055.3165, 
144613.1393, 145037.3035, 144701.2676), Y = c(2217588.648, 2216616.387, 
2219879.777, 2220818.804, 2216908.127, 2220423.322, 2216589.91, 
2234167.287, 2239351.696, 2232338.072, 2273737.333, 2273954.782, 
2269418.423, 2271308.607, 2264694.484, 2263710.512, 2254030.274, 
2253352.426, 2248644.946, 2262359.026, 2231404.821, 2229583.89, 
2231700.485, 2231598.882, 2237122.967, 2233302.185, 2240092.997, 
2237702.817, 2249213.958, 2261841.308, 2263064.156, 2262236.452, 
2264147.03, 2263214.877, 2263336.363, 2261417.946, 2256289.995, 
2256694.953, 2253352.576), year = c(2003L, 2003L, 2003L, 2003L, 
2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 
2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2004L, 2004L, 
2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 
2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L), season = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = "calving", class = "factor")), .Names = c("anid", 
"X", "Y", "year", "season"), class = "data.frame", row.names = c(NA, 
-39L))

Solution

Ok, I'll bite.

Your code had some typos (I hope) that make it un-runnable. Let's throw it out and start over. The function kerneloverlap returns a matrix of overlap values for each pair of items specified in its second parameter. In your first example you're comparing years.

Let's start by imagining what we'd do with the data on just one animal, and write a function that outputs the values we want just for that simple case:

kernMod <- function(x){
    #x is the data for a single animal
    rs <- kerneloverlap(x[,c("X","Y")],
                        x$year,lev = 90, 
                        grid = 30, 
                        meth = "VI", 
                        conditional = TRUE)
    #Assumes we're only comparing two years
    out <- data.frame(year = paste(colnames(rs),collapse="-"), val = rs[2,1])
    out
}

Now we can apply this to each animal separately:

kernMod(subset(loc,anid == 'c_002'))
       year val
1 2003-2004   0
> kernMod(subset(loc,anid == 'c_104'))
       year        val
1 2003-2004 0.06033966

or we can use ddply from the plyr package to apply it to each animal in turn:

ddply(loc,.(anid),.fun = kernMod)

  anid      year        val
1 c_002 2003-2004 0.00000000
2 c_104 2003-2004 0.06033966

To include multiple seasons, you would simply add that to the list of variables to split over in ddply (untested):

ddply(loc,.(anid,season),.fun = kernMod)

To compare between seasons within year, you'll need to modify kernMod to pass x$season as the second argument and then call something like (untested):

ddply(loc,.(anid,year),.fun = kernMod)

If your full data has multiple years in it, kernMod will require some more modification, since kerneloverlap returns an n x n matrix, where n is the number of years in your data. Perhaps something like this (untested)

kernMod <- function(x){
    #x is the data for a single animal
    rs <- kerneloverlap(x[,c("X","Y")],
                        x$year,lev = 90, 
                        grid = 30, 
                        meth = "VI", 
                        conditional = TRUE)
    rs[lower.tri(rs,diag = TRUE)] <- NA
    rs <- melt(rs)
    rs <- subset(rs, !is.na(value))
    out <- data.frame(year = paste(rs$X1,rs$X2,collapse="-"), val = rs$value)
    out
}

This approach should handle "missing" animals by only calculating values for which you have data.

Ok. I'd love to get 3rd-4th author for this, but I'd settle for an acknowledgement. ;)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow