Getting same output as cut() using speedier hist() or findInterval()?

Question 1

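Here is an implementation based on your findInterval suggestion. In the benchmark below it is roughly 6.5 times faster than the classical cut (comparing median timings). Note that findInterval uses left-closed intervals by default, whereas cut uses right-closed ones, so the two agree only when no value falls exactly on a break. Values below the smallest break must also be avoided, because they get index 0 and are silently dropped when subsetting the labels: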

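cut2 <- function(x, breaks) {
  # Build labels in the same "(a,b]" style that cut() uses by default
  labels <- paste0("(", breaks[-length(breaks)], ",", breaks[-1L], "]")
  # findInterval() returns the index of the interval containing each x
  factor(labels[findInterval(x, breaks)], levels = labels)
}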

library(microbenchmark)

set.seed(1)
data <- rnorm(1e4, mean=0, sd=1)

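# my_breaks is not defined in this answer; it is assumed to already exist
# (presumably the break vector from the question)
microbenchmark(cut.default(data, my_breaks), cut2(data, my_breaks))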

# Unit: microseconds
#                         expr      min        lq    median        uq      max neval
# cut.default(data, my_breaks) 3011.932 3031.1705 3046.5245 3075.3085 4119.147   100
#        cut2(data, my_breaks)  453.761  459.8045  464.0755  469.4605 1462.020   100

identical(cut(data, my_breaks), cut2(data, my_breaks))
# TRUE
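
If values can sit exactly on a break or fall outside the breaks, something along these lines may be safer. This is only a sketch: the name cut3 is made up for illustration, it assumes R 3.3.0 or later (where findInterval gained the left.open argument), and it has not been benchmarked here.

```r
# Hypothetical variant of cut2; the name cut3 is for illustration only.
# Assumes R >= 3.3.0, where findInterval() has the left.open argument.
cut3 <- function(x, breaks) {
  n <- length(breaks)
  labels <- paste0("(", breaks[-n], ",", breaks[-1L], "]")
  # left.open = TRUE gives right-closed intervals (a, b], like cut()'s default
  idx <- findInterval(x, breaks, left.open = TRUE)
  # 0 means x <= min(breaks), n means x > max(breaks); cut() gives NA for both
  idx[idx %in% c(0L, n)] <- NA_integer_
  factor(labels[idx], levels = labels)
}

# Small check with values on and outside the breaks
x_edge <- c(-1, 0, 0.5, 1, 2, 3)
b_edge <- c(0, 1, 2)
cut3(x_edge, b_edge)
cut(x_edge, b_edge)
```

The labels are still built with paste0, so for breaks with many significant digits the label text can differ from what cut prints, since cut formats the break values itself.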

Question 2

The hist function creates counts by bins in a similar way to a combination of table and cut. For example,

set.seed(1)
x <- rnorm(100)

hist(x, plot = FALSE)
## $breaks
##  [1] -2.5 -2.0 -1.5 -1.0 -0.5  0.0  0.5  1.0  1.5  2.0  2.5
## 
## $counts
##  [1]  1  3  7 14 21 20 19  9  4  2

table(cut(x, seq.int(-2.5, 2.5, 0.5)))
## (-2.5,-2] (-2,-1.5] (-1.5,-1] (-1,-0.5]  (-0.5,0]   (0,0.5]   (0.5,1]
##         1         3         7        14        21        20        19
##   (1,1.5]   (1.5,2]   (2,2.5] 
##         9         4         2
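
To check the equivalence programmatically rather than by eye, one option is to compare the bare counts. This is a small sketch; the variable names are arbitrary, and include.lowest = TRUE is passed to cut because that is what hist uses by default.

```r
set.seed(1)
x <- rnorm(100)
breaks <- seq.int(-2.5, 2.5, 0.5)

# hist() returns one count per bin in $counts
h <- hist(x, breaks = breaks, plot = FALSE)

# table(cut()) returns the same counts, named by interval label
tab <- table(cut(x, breaks, include.lowest = TRUE))

# Strip the names before comparing
counts_match <- all(h$counts == as.vector(tab))
counts_match
```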

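If you want the raw output from cut, that is, the bin each individual observation falls into, you can't use hist, because it only returns the count per bin.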

However, if the speed of cut is a problem (and you might want to double-check that it really is the slow part of your analysis; premature optimization is the root of all evil), then you can use the lower-level .bincode. This skips the input checking and label creation that cut performs and returns the integer bin index of each observation instead of a factor.

.bincode(x, seq.int(-2.5, 2.5, 0.5))
## [1]  4  6  4  9  6  4  6  7  7  5  9  6 ...
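
If you need counts or labels afterwards, the integer codes can be post-processed. A minimal sketch, assuming hand-built labels in the "(a,b]" style are acceptable (cut formats break values itself, so its label text can differ for breaks with many significant digits):

```r
set.seed(1)
x <- rnorm(100)
breaks <- seq.int(-2.5, 2.5, 0.5)

# Integer bin index per observation; NA for values outside the breaks
codes <- .bincode(x, breaks)

# Counts per bin without ever building a factor
bin_counts <- tabulate(codes, nbins = length(breaks) - 1L)

# Attach labels only if they are really needed
labels <- paste0("(", breaks[-length(breaks)], ",", breaks[-1L], "]")
f <- factor(labels[codes], levels = labels)
```

Here bin_counts plays the role of hist(x, plot = FALSE)$counts and f the role of cut(x, breaks) for this particular set of breaks.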