Question

G'day All,

I am working in R. Sorry about this really basic question, but I am a bit stuck. I have a data set of presence/absence point count data with the date of each count and a site number (see below). Ultimately, I would like to create a data.frame that collates all counts by grid cell number and has each visit to a site as a new column (see below). I can't figure out how to do this, so I thought I would take an easier route and add a column that gives a visit number for each record, numbered by the date of the visit within each site group (see below). I can't figure out how to do this either! Any help would be great. Thank you in advance.

Kind regards, Adam

I have this:

Site    date
1   12/01/2000
1   24/02/2000
1   13/08/2001
2   14/01/2000
2   21/01/2002
3   1/01/1999
3   21/04/2000

Ultimately want this:

Site           visit1             visit2             visit3
1              12/01/2000         24/02/2000         13/08/2001
2              14/01/2000         21/01/2002         NA
3              01/01/1999         21/04/2000         NA

But this would be good:

Site   date        visit
1      12/01/2000  1
1      24/02/2000  2
1      13/08/2001  3
2      14/01/2000  1
2      21/01/2002  2
3      01/01/1999  1
3      21/04/2000  2

Solution

You want to reshape your data from long format to wide format, so that repeated observations from a Site all sit on a single row. The base R function reshape() was designed for just this task.

The only (slight) complication is that you first need to add a column (here called obsNum) that identifies which observation at a Site is the first, second, third, and so on. Setting timevar = "obsNum" then tells reshape() which output column each value of date belongs in. Note that the rle()-based numbering below assumes the rows are already grouped by Site and in date order within each Site, as they are in your example.

df <- read.table(text = "Site date
1 12/01/2000
1 24/02/2000
1 13/08/2001
2 14/01/2000
2 21/01/2002
3 1/01/1999
3 21/04/2000", header=TRUE, stringsAsFactors=FALSE)

# Number the observations 1, 2, 3, ... within each run of identical Site values
df$obsNum <- sequence(rle(df$Site)$lengths)
reshape(df, idvar="Site", timevar="obsNum", direction="wide")

#   Site     date.1     date.2     date.3
# 1    1 12/01/2000 24/02/2000 13/08/2001
# 4    2 14/01/2000 21/01/2002       <NA>
# 6    3  1/01/1999 21/04/2000       <NA>
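
The intermediate table from the question (one visit number per record) can also be built on its own. What follows is only a minimal sketch in base R, under the assumption that the rows may arrive in any order and that the dates are day/month/year strings. The sample data is the question's data, shuffled on purpose for illustration.

```r
# Sample data from the question, deliberately shuffled (an assumption made
# here to show that the initial row order does not matter).
df <- data.frame(
  Site = c(2, 1, 3, 1, 2, 1, 3),
  date = c("21/01/2002", "24/02/2000", "21/04/2000", "12/01/2000",
           "14/01/2000", "13/08/2001", "1/01/1999"),
  stringsAsFactors = FALSE
)

# Parse the day/month/year strings so that sorting is chronological
# rather than alphabetical.
df$date <- as.Date(df$date, format = "%d/%m/%Y")

# Sort by Site, then by date within each Site.
df <- df[order(df$Site, df$date), ]

# Within each Site, number the rows 1, 2, 3, ... in their sorted order.
df$visit <- ave(seq_along(df$Site), df$Site, FUN = seq_along)

df
```

After this runs, visit is 1, 2, 3 for Site 1 and 1, 2 for Sites 2 and 3. The visit column could then serve as the timevar in reshape() in place of obsNum.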

Other tips

Here is another solution using ddply (from plyr) and dcast (from reshape2).

library(plyr)      # provides ddply
library(reshape2)  # provides dcast
# Convert the date column into actual dates
df$date <- as.Date(df$date, format="%d/%m/%Y")
# Ensure that the data.frame is sorted
df <- df[ order(df$Site, df$date), ]

# Number the visits for each site
df$visit <- 1
d <- ddply(df, "Site", transform, visit=cumsum(visit))

# Convert to a wide format
# (Since dcast forgets the Date type, convert it to strings
# before and back to dates after.)
d$date <- as.character(d$date)
d <- dcast(d, Site ~ visit, value.var="date")
d[,-1] <- lapply(d[,-1], as.Date)
d

Here is another take on the solution using plyr, reshape2 and lubridate.

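library(plyr); library(reshape2); library(lubridate)
# Assumes df$date still holds the original day/month/year strings.
# ties.method = "first" keeps visit numbers whole even if a site has duplicate dates.
df <- ddply(df, .(Site), transform, visit = rank(dmy(date), ties.method = "first"))
dcast(df, Site ~ visit, value.var = 'date')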
Licensed under: CC-BY-SA with attribution