I wouldn't approach this task with regexes. They may work, but only in simple cases. Consider the following /tmp/test.R script:
x <- 1 # a comment
y <- "#######"
z <- "# not a comment \" # not \"" # a # comment # here
f <- # a function
function(n) {
for (i in seq_len(n))
print(i)} #...
As you can see, it is not trivial to tell where a comment really starts: a # inside a string literal does not begin one.
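To make the problem concrete, here is a minimal sketch of what a naive regex does to the first three lines of the script. The helper name strip_naive is made up for this illustration; it simply deletes everything from the first # to the end of the line.

```r
# strip_naive() is a hypothetical helper: it treats the first '#' on a line
# as the start of a comment, which is exactly the assumption that breaks.
strip_naive <- function(lines) sub("[[:space:]]*#.*$", "", lines)

lines <- c(
  'x <- 1 # a comment',
  'y <- "#######"',
  'z <- "# not a comment \\" # not \\"" # a # comment # here'
)

stripped <- strip_naive(lines)
writeLines(stripped)

# Only the first line survives intact; the other two are cut inside the
# string literal and no longer parse.
parses <- vapply(
  stripped,
  function(s) !inherits(try(parse(text = s), silent = TRUE), "try-error"),
  logical(1),
  USE.NAMES = FALSE
)
```

A smarter pattern can handle some of these cases, but it ends up re-implementing R's string and escape rules, which is the parser's job.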
If you don't mind reformatting your code (well, you stated that you want the smallest code possible), try the following:
writeLines(as.character(parse("/tmp/test.R")), "/tmp/out.R")
which will give /tmp/out.R with:
x <- 1
y <- "#######"
z <- "# not a comment \" # not \""
f <- function(n) {
    for (i in seq_len(n)) print(i)
}
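The parse() call above reformats the code. If you would rather keep the original layout, one option is to ask the parser where the comments are and cut only those. The following is a sketch of mine under stated assumptions, not a tested tool: strip_comments is a hypothetical helper name, the input has one line of code per element, and the source is plain ASCII without tab characters (so that the reported column numbers match character positions).

```r
# strip_comments() is a hypothetical helper, not part of base R or formatR.
# Assumes one line of code per element, ASCII text, and no tab characters.
strip_comments <- function(lines) {
  exprs <- parse(text = lines, keep.source = TRUE)
  tokens <- utils::getParseData(exprs)
  if (is.null(tokens)) return(lines)  # no parse data available
  comments <- tokens[tokens$token == "COMMENT", ]
  for (k in seq_len(nrow(comments))) {
    ln <- comments$line1[k]
    # A comment always runs to the end of its line, so cut at its first column.
    lines[ln] <- substr(lines[ln], 1, comments$col1[k] - 1)
  }
  # Drop trailing whitespace, e.g. the blanks that preceded a comment.
  sub("[[:space:]]+$", "", lines)
}

src <- c(
  'x <- 1 # a comment',
  'y <- "#######"',
  'z <- "# not a comment \\" # not \\"" # a # comment # here',
  'f <- # a function',
  'function(n) {',
  'for (i in seq_len(n))',
  'print(i)} #...'
)

cleaned <- strip_comments(src)
writeLines(cleaned)
```

For a file you would pass readLines("/tmp/test.R") instead of src. Lines that contain only a comment are left behind as empty lines, so filter those out afterwards if you want the shortest possible result.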
Alternatively, use a function from the formatR package:
library(formatR)
tidy_source(source="/tmp/test.R", keep.comment=FALSE)
## x <- 1
## y <- "#######"
## z <- "# not a comment \" # not \""
## f <- function(n) {
##     for (i in seq_len(n)) print(i)
## }
BTW, tidy_source has a blank argument, which might be of interest to you. But I can't get it to work with formatR 0.10 + R 3.0.2...