Question
I am new to R and do not know how to improve the following script. I have heard about the apply functions, but I did not manage to use them. Here is my problem:
I have two data frames, the first called `data` and the second called `eco`. `data` has more than 1 million rows and `eco` has 90,000. They share a common column named `id`. For a single `id` there are several rows in `data`, each corresponding to the presence of a botanical species.
I want to simplify this by assigning a value to each `id` in the data frame `eco`, depending on whether one specific species is present or missing for the same `id` in `data`. The information will appear in a column `sp` in `eco`.
My script with the for loop, which takes hours to run:
for (k in 1:nrow(data)) {
  if (data[k, "sp"] == 1) { # sp == 1 corresponds to one specific species
    eco[which(eco$id == data[k, "id"]), "sp"] = 1 # before this, the "sp" column is empty in eco
  }
}
How can I improve this?
Thank you very much for any help.
Solution 2
Is this what you are looking for?
Edit after comment by @Simon:
eco$sp <- 0 # create new column `sp` initialized with 0
eco[eco$id %in% data$id[data$sp == 1], "sp"] <- 1 # set sp to 1 for every id that has a row with data$sp == 1
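To check that the vectorised replacement gives the same result as the loop from the question, here is a minimal sketch on made-up toy data. The values, the extra id "d" and the names `ids_with_sp` and `eco_loop` are illustrative assumptions, not part of the original data.

```r
# Toy data (assumed for illustration): several rows per id in `data`,
# one row per id in `eco`; id "d" never appears in `data`
data <- data.frame(id = c("a", "a", "b", "b", "c"),
                   sp = c(2, 3, 1, 5, 4),
                   stringsAsFactors = FALSE)
eco <- data.frame(id = c("a", "b", "c", "d"), stringsAsFactors = FALSE)

# Vectorised version: flag every eco id that has a row with sp == 1 in data
eco$sp <- 0
ids_with_sp <- unique(data$id[data$sp == 1])
eco$sp[eco$id %in% ids_with_sp] <- 1

# Loop version from the question, run on a copy for comparison
eco_loop <- data.frame(id = eco$id, sp = 0, stringsAsFactors = FALSE)
for (k in seq_len(nrow(data))) {
  if (data[k, "sp"] == 1) {
    eco_loop[eco_loop$id == data[k, "id"], "sp"] <- 1
  }
}

# Both approaches should flag exactly the same ids
identical(eco$sp, eco_loop$sp)
```

The vectorised version does a single pass with `%in%` instead of one subset-and-assign per row of `data`, which is where the loop spends its time.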
Other tips
With 1,000,000 records I'd consider using `data.table`. You can do this with one of `data.table`'s compound join operations, which is just `data[sp==1,][eco]`, if you don't mind `NA` being returned when species 1 is not present. You have the perfect setup: two tables with a common key. You can easily do this like so:
# Some sample data
set.seed(123)
data <- data.frame( id = rep( letters[1:3] , each = 3 ) , sp = sample( 1:5 , 9 , TRUE ) )
eco <- data.frame( id = letters[1:3] , otherdat = rnorm(3) )
data
#    id sp
#1: a 2
#2: a 4
#3: a 3
#4: b 5
#5: b 5
#6: b 1 ===> species 1 is present at this id only
#7: c 3
#8: c 5
#9: c 3
eco
# id otherdat
#1: a -0.1089660
#2: b -0.1172420
#3: c 0.1830826
# All you need to do is turn your data.frames to data.tables, with a key, like so...
library(data.table)
data <- data.table( data , key = "id" )
eco <- data.table( eco , key = "id" )
# Join relevant records from data to eco by the common key
# This way you get 1 when species 1 is present and 0 otherwise
eco[ data[ , list( sp = as.integer( any( sp == 1 ) ) ) , by = id ] ]
# id otherdat sp
#1: a -0.1089660 0
#2: b -0.1172420 1
#3: c 0.1830826 0
# A more succinct way of doing this (and faster)
# is a compound join (but you get NA instead of 0)
data[sp==1,][eco]
# id sp otherdat
#1: a NA -0.1089660
#2: b 1 -0.1172420
#3: c NA 0.1830826
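If the `NA` values are unwanted, one possible follow-up is to fill `sp` in place with an update join. This is only a sketch and not part of the original answer: it assumes a `data.table` version that supports the `on=` argument (1.9.6 or later), and it rebuilds the sample tables with fixed values copied from the printed output above so that it runs on its own.

```r
library(data.table)

# Same shape as the sample data above, with fixed values instead of sample()/rnorm()
data <- data.table(id = rep(letters[1:3], each = 3),
                   sp = c(2L, 4L, 3L, 5L, 5L, 1L, 3L, 5L, 3L))
eco <- data.table(id = letters[1:3],
                  otherdat = c(-0.1089660, -0.1172420, 0.1830826))

# Start every id at 0, then set sp to 1 by reference for the ids
# that have at least one row with sp == 1 in data
eco[, sp := 0L]
eco[unique(data[sp == 1, .(id)]), on = "id", sp := 1L]

eco
```

Because `:=` modifies `eco` by reference, no copy of the table is made, and `unique()` keeps the lookup to one row per id even when a species is recorded several times.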