Frage

I would like to create a numeric indicator for a matrix such that for each unique element in one variable, it creates a sequence of the length based on the element in another variable. For example:

frame<- data.frame(x = c("a", "a", "a", "b", "b"), y = c(3,3,3,2,2))
frame
  x y
1 a 3
2 a 3
3 a 3
4 b 2
5 b 2

The indicator, z, should look like this:

  x y z
1 a 3 1
2 a 3 2
3 a 3 3
4 b 2 1
5 b 2 2

Any and all help greatly appreciated. Thanks.

War es hilfreich?

Lösung

No ave?

frame$z <- with(frame, ave(y,x,FUN=seq_along) )
frame

#  x y z
#1 a 3 1
#2 a 3 2
#3 a 3 3
#4 b 2 1
#5 b 2 2

A data.table version could be something like below (thanks to @mnel):

#library(data.table)
#frame <- as.data.table(frame)
frame[,z := seq_len(.N), by=x]

My original thought was to use:

frame[,z := .SD[,.I], by=x]

where .SD refers to each subset of the data.table split by x. .I returns the row numbers for an entire data.table. So, .SD[,.I] returns the row numbers within each group. Although, as @mnel points out, this is inefficient compared to the other method as the entire .SD needs to be loaded into memory for each group to run this calculation.

Andere Tipps

Another approach:

frame$z <- unlist(lapply(rle(as.numeric(frame[, "x"]))$lengths, seq_len))
library(dplyr)
frame %.%
  group_by(x) %.%
  mutate(z = seq_along(y))

You can split the data.frame on x, and generate a new id column based on that:

> frame$z <- unlist(lapply(split(frame, frame$x), function(x) 1:nrow(x)))
> frame
  x y z
1 a 3 1
2 a 3 2
3 a 3 3
4 b 2 1
5 b 2 2

Or even more simply using data.table:

library(data.table)
frame <- data.table(frame)[,z:=1:nrow(.SD),by=x]

Try this where x is the column by which grouping is to be done and y is any numeric column. if there are no numeric columns use seq_along(x), say, in place of y:

transform(frame, z = ave(y, x, FUN = seq_along))
Lizenziert unter: CC-BY-SA mit Zuschreibung
Nicht verbunden mit StackOverflow
scroll top