Equivalent of substr for raw vectors

https://stackoverflow.com/questions/17100518

r
cran

31-05-2022

Question

Is there an equivalent of substring for raw vectors in R?

Say that I have a large binary raw vector x, e.g. the result of reading a file with readBin, and that I have used grepRaw to find the index of a fragment inside it that I would like to access. A toy example:

x <- charToRaw("foobar")
n <- 2
m <- 5

Now I would like to extract the "substring" from position 2 to position 5. A naive way to do so is:

x[n:m]

However, this scales poorly for large fragments, because R first creates a large vector n:m and then iterates over this vector to extract the elements from x at these indices, one by one. Is there a more native method to extract a part of a raw vector, similar to substr for character vectors? I don't think I can use rawToChar because the files might contain non-text binary data.
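For concreteness, the workflow described above might look like the following minimal sketch. The pattern "oba", the 3-byte fragment length, and the variable names start, end and fragment are made up for illustration; real data would come from readBin rather than charToRaw.

```r
# Toy data standing in for bytes read with readBin().
x <- charToRaw("foobar")

# grepRaw() returns the 1-based start position of the first match.
start <- grepRaw("oba", x, fixed = TRUE)
end <- start + 2L   # hypothetical fragment length of 3 bytes

# The indexing approach from the question: build start:end, then subset.
fragment <- x[start:end]
fragment              # raw bytes 6f 62 61
rawToChar(fragment)   # "oba"; only safe here because the toy data is text
```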

Solution

This is a C implementation, compiled with the inline package:

library(inline)
subraw <- cfunction(c(x="raw", i="integer", j="integer"), "
    /* i and j are 1-based, inclusive bounds; n is the number of bytes to copy */
    int n = INTEGER(j)[0] - INTEGER(i)[0] + 1;
    SEXP result;
    if (n < 0)
        Rf_error(\"j < i - 1\");
    result = Rf_allocVector(RAWSXP, n);
    /* copy n bytes starting at the 0-based offset i - 1 */
    memcpy(RAW(result), RAW(x) + INTEGER(i)[0] - 1, n);
    return result;
")

The usual caveats about missing sanity checks apply (e.g., that i and j are scalar and not NA, i > 0, j <= length(x), etc.). In action:

> xx = readBin("~/bin/R-devel/lib/libR.so", raw(), 6000000)
> length(xx)
[1] 5706046
> length(subraw(xx, 1L, length(xx)))
[1] 5706046
> system.time(subraw(xx, 1L, length(xx)))
   user  system elapsed 
  0.000   0.000   0.001

subraw(xx, 10L, 9L) returns raw(0).
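The sanity checks listed as missing could be done in R before any bytes are copied. The following is only a sketch: subraw_checked and its extract argument are hypothetical names, not part of the answer's code, and the default extractor is a plain-R fallback so that the example runs without a compiler.

```r
# Hypothetical wrapper: validate the arguments in R, then hand off to an
# extractor function that does the actual copying.
subraw_checked <- function(x, i, j, extract = NULL) {
    stopifnot(
        is.raw(x),
        is.numeric(i), length(i) == 1L, !is.na(i),
        is.numeric(j), length(j) == 1L, !is.na(j)
    )
    i <- as.integer(i)
    j <- as.integer(j)
    # Allow j == i - 1 so that an empty range yields raw(0), as subraw() does.
    stopifnot(i >= 1L, j >= i - 1L, j <= length(x))
    if (is.null(extract)) {
        # Pure-R fallback (an assumption made so the sketch is self-contained).
        extract <- function(x, i, j) x[seq_len(j - i + 1L) + (i - 1L)]
    }
    extract(x, i, j)
}

x <- charToRaw("foobar")
rawToChar(subraw_checked(x, 2L, 5L))   # "ooba"
length(subraw_checked(x, 4L, 3L))      # 0
```

With the compiled function available, passing extract = subraw would route the validated arguments to the C code instead of the fallback.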

Licensed under: CC-BY-SA with attribution
