Efficient way to add up all numbers except those that precede an "I", using mclapply

StackOverflow https://stackoverflow.com/questions/19070574

  •  29-06-2022

Question

I have the following vector:

my.vector = c("4M1D5M15I1D10M", "3M", "4M2I3D")

And I'd like to transform it into the following vector:

my.result = c("21N", "3N", "7N")

The logic is as follows. For "4M1D5M15I1D10M", I add all the numbers except those that immediately precede an "I" character: 4+1+5+1+10 = 21 (15 is skipped because it precedes an "I"). I then append an "N" to the sum, giving "21N".

The same applies to "3M": there is no "I" character, so it simply becomes "3N". For the last one, 4+3 = 7 (2 is skipped because it precedes an "I"), giving "7N".
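
As a minimal sketch of the rule just described, one string can be broken into number-letter pairs and summed while skipping the pairs that end in "I". The names x, tokens, counts and letters.after are illustrative only, and the sketch assumes that every number is followed by exactly one uppercase letter.

```r
x <- "4M1D5M15I1D10M"

# Split the string into number-letter pairs: "4M" "1D" "5M" "15I" "1D" "10M"
tokens <- regmatches(x, gregexpr("[0-9]+[A-Z]", x))[[1]]

# Separate each pair into its number and its trailing letter
counts <- as.numeric(sub("[A-Z]$", "", tokens))
letters.after <- sub("^[0-9]+", "", tokens)

# Add up every number whose trailing letter is not "I", then append "N"
result <- paste0(sum(counts[letters.after != "I"]), "N")
print(result)
```

For this input the pairs ending in "I" contribute nothing, so only 4, 1, 5, 1 and 10 are summed.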

Note that my.vector is extremely large, so I want to use the parallel capabilities of the HPC server via mclapply. Ideally, I would run something like this to get my result:

my.result = unlist(mclapply(my.vector, my.adding.function, mc.cores = ncores))

To define my function, I tried the following:

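my.adding.function <- function(x)
{
   # Cut the string at every number that is immediately followed by "I"
   tmp <- unlist(strsplit(x, "\\d+I"))
   # Split the remaining pieces on the letters M, D, S and N, leaving only numbers
   tmp2 <- unlist(strsplit(tmp, "M|D|S|N"))
   # Sum the numbers and append "N"
   tmp3 <- sum(as.numeric(tmp2))
   return(paste0(tmp3, "N"))
}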

I'm not sure how efficient this function is, though.


Solution

Here is one solution without mclapply; please check whether it is feasible:

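# Extract each run of digits that is followed by a letter other than "I"
L <- regmatches(my.vector, gregexpr("(\\d+)(?=[A-HJ-Z])", my.vector, perl=TRUE))
# Sum the numbers found in each element and append "N"
sapply(L, function(x) paste0(sum(as.numeric(x)), "N"))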
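
If the vector is still too large for a single core, the same regular expression could be combined with mclapply by handing each core a whole chunk of strings rather than one string at a time. This is only a sketch under assumptions: the helper name sum.non.I, the chunking scheme and the choice of two cores are hypothetical and not part of the original answer. mclapply relies on forking, so the sketch falls back to one core on Windows.

```r
library(parallel)

my.vector <- c("4M1D5M15I1D10M", "3M", "4M2I3D")

# Hypothetical helper: processes a whole chunk of strings in one call
sum.non.I <- function(chunk) {
  # Each run of digits that is followed by a letter other than "I"
  hits <- regmatches(chunk, gregexpr("\\d+(?=[A-HJ-Z])", chunk, perl = TRUE))
  vapply(hits, function(x) paste0(sum(as.numeric(x)), "N"), character(1))
}

# Assumed core count; mc.cores > 1 is not supported on Windows
ncores <- if (.Platform$OS.type == "windows") 1L else 2L

# Assign consecutive elements to at most ncores chunks, preserving order
chunk.size <- ceiling(length(my.vector) / ncores)
chunk.id <- ceiling(seq_along(my.vector) / chunk.size)
chunks <- split(my.vector, chunk.id)

# Run the vectorized helper on each chunk and flatten the results
my.result <- unlist(mclapply(chunks, sum.non.I, mc.cores = ncores),
                    use.names = FALSE)
print(my.result)
```

Chunking keeps the number of tasks equal to the number of cores, so the per-task overhead of mclapply is paid a handful of times instead of once per string. Whether this beats the single-core version depends on the data size and the machine, so it is worth timing both.
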
Licensed under: CC-BY-SA with attribution