Question

Suppose we have a vector:

x <- c(1,1,1,2,2,2,2,2,4,4,2,2,2,2)

What is a function that can take x and return l, where l is equal to

[[1]]
[1] 1 1 1
[[2]]
[1] 2 2 2 2 2
[[3]]
[1] 4 4
[[4]]
[1] 2 2 2 2
Was it helpful?

Solution

Use rle, rep, and split:

a <- rle(x)
split(x, rep(seq_along(a$lengths), a$lengths))
# $`1`
# [1] 1 1 1
#
# $`2`
# [1] 2 2 2 2 2
#
# $`3`
# [1] 4 4
# 
# $`4`
# [1] 2 2 2 2

In this, rle computes the "run lengths" of the input vector. The result is a list with lengths and values. We only need the lengths, from which we can create a "grouping" variable on which we can split the original vector.


Update: Benchmarks on a larger vector

I didn't benchmark the while loop because it was taking too long to complete with this long vector.

library(microbenchmark)
set.seed(1)
x <- sample(1:5, 1e5, replace = TRUE)
fun1 <- function() {
  a <- rle(x)
  split(x, rep(seq_along(a$lengths), a$lengths))
}
fun2 <- function() {
  splits = which(diff(x) != 0)
  split.locs = rbind(c(1, splits+1), c(splits, length(x)))
  apply(split.locs, 2, function(y) x[y[1]:y[2]])
}
fun3 <- function() split(x, c(0, cumsum(as.logical(diff(x)))))

microbenchmark(fun1(), fun2(), fun3(), times = 20)
# Unit: milliseconds
#    expr      min       lq   median       uq      max neval
#  fun1() 142.0386 147.7061 154.2853 158.0239 196.4665    20
#  fun2() 363.5707 386.0575 423.1791 444.4695 543.9427    20
#  fun3() 305.5331 316.0356 320.5203 329.7177 376.3236    20

OTHER TIPS

Another possibility:

split(x, c(0, cumsum(as.logical(diff(x)))))

Here's a different approach, which relies on diff and apply instead of a while loop:

x <- c(1,1,1,2,2,2,2,2,4,4,2,2,2,2)

# Indices of ends of continuous regions (diff helps us find where neighboring elements differ)
splits = which(diff(x) != 0)

# Columns are ranges of continuous regions
split.locs = rbind(c(1, splits+1), c(splits, length(x)))

# Split based on ranges
apply(split.locs, 2, function(y) x[y[1]:y[2]])
# [[1]]
# [1] 1 1 1
# [[2]]
# [1] 2 2 2 2 2
# [[3]]
# [1] 4 4
# [[4]]
# [1] 2 2 2 2

Here's a hack

    pure <- x
    out <- list()
    while(length(pure) > 0) {
        matches <- which(pure==pure[1])
        matches2 <- list()
        matches2[[1]] <- matches[1]
        for(i in 2:length(matches)) {
            if(matches[i] - matches[i-1] > 1) {
                break;
            }
            matches2[[i]] <- matches[i]
        }
        matches2 <- unlist(matches2)

        out[[length(out) + 1]] <- pure[matches2]
        pure <- pure[-matches2]
    }
            out
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top