Looking at the C implementation a little, the logical and math operations implement their loops differently. The logical operations do something like (in logic.c:327)
library(inline)
or1 <- cfunction(c(x="logical", y="logical"), "
int nx = LENGTH(x), ny = LENGTH(y), n = nx > ny ? nx : ny;
SEXP ans = PROTECT(allocVector(LGLSXP, n));
int x1, y1;
for (int i = 0; i < n; i++) {
x1 = LOGICAL(x)[i % nx];
y1 = LOGICAL(y)[i % ny];
if ((x1 != NA_LOGICAL && x1) || (y1 != NA_LOGICAL && y1))
LOGICAL(ans)[i] = 1;
else if (x1 == 0 && y1 == 0)
LOGICAL(ans)[i] = 0;
else
LOGICAL(ans)[i] = NA_LOGICAL;
}
UNPROTECT(1);
return ans;
")
where there are two modulo operators %
each iteration. In contrast the arithmetic operations (in Itermacros.h:54) do something like
or2 <- cfunction(c(x="logical", y="logical"), "
int nx = LENGTH(x), ny = LENGTH(y), n = nx > ny ? nx : ny;
SEXP ans = PROTECT(allocVector(LGLSXP, n));
int x1, y1, ix=0, iy=0;
for (int i = 0; i < n; i++) {
x1 = LOGICAL(x)[ix];
y1 = LOGICAL(x)[iy];
if (x1 == 0 || y1 == 0)
LOGICAL(ans)[i] = 0;
else if (x1 == NA_LOGICAL || y1 == NA_LOGICAL)
LOGICAL(ans)[i] = NA_LOGICAL;
else
LOGICAL(ans)[i] = 1;
if (++ix == nx) ix = 0;
if (++iy == ny) iy = 0;
}
UNPROTECT(1);
return ans;
")
performing two tests for identity. Here's a version that skips the test for NA
or3 <- cfunction(c(x="logical", y="logical"), "
int nx = LENGTH(x), ny = LENGTH(y), n = nx > ny ? nx : ny;
SEXP ans = PROTECT(allocVector(LGLSXP, n));
int x1, y1, ix=0, iy=0;
for (int i = 0; i < n; ++i) {
x1 = LOGICAL(x)[ix];
y1 = LOGICAL(y)[iy];
LOGICAL(ans)[i] = (x1 || y1);
if (++ix == nx) ix = 0;
if (++iy == ny) iy = 0;
}
UNPROTECT(1);
return ans;
")
and then a version that avoids the LOGICAL macro
or4 <- cfunction(c(x="logical", y="logical"), "
int nx = LENGTH(x), ny = LENGTH(y), n = nx > ny ? nx : ny;
SEXP ans = PROTECT(allocVector(LGLSXP, n));
int *xp = LOGICAL(x), *yp = LOGICAL(y), *ansp = LOGICAL(ans);
for (int i = 0, ix = 0, iy = 0; i < n; ++i)
{
*ansp++ = xp[ix] || yp[iy];
ix = (++ix == nx) ? 0 : ix;
iy = (++iy == ny) ? 0 : iy;
}
UNPROTECT(1);
return ans;
")
Here are some timings
microbenchmark(my.or(a, b), a|b, or1(a, b), or2(a, b), or3(a, b), or4(a, b))
Unit: milliseconds
expr min lq median uq max neval
my.or(a, b) 8.002435 8.100143 10.082254 11.56076 12.05393 100
a | b 23.194829 23.404483 23.860382 24.30020 24.96712 100
or1(a, b) 17.323696 17.659705 18.069139 18.42815 19.57483 100
or2(a, b) 13.040063 13.197042 13.692152 14.09390 14.59378 100
or3(a, b) 9.982705 10.037387 10.578464 10.96945 11.48969 100
or4(a, b) 5.544096 5.592754 6.106694 6.30091 6.94995 100
The difference between a|b
and or1
reflects things not implemented here, like attributes and dimensions and special handling of objects. From or1
to or2
reflects the cost of different ways of recycling; I was surprised that there were differences here. From or2
to or3
is the cost of NA-safety. It's a little hard to know whether the additional speed-up in or4
would be seen in a base R implementation -- in user C code LOGICAL()
is a macro, but in base R it's an inlined function call.
The code was compiled with -O2
flags and
> system("clang++ --version")
Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0)
Target: x86_64-pc-linux-gnu
Thread model: posix
Times of my.or
were not particularly consistent between independent R sessions, sometimes taking quite a bit longer; I'm not sure why. The timings above were with R version 2.15.3 Patched (2013-03-13 r62579); current R-devel seemed about 10% faster.