Question

I've been working on code to create a parallel lapply()-type function that uses Amazon's Elastic MapReduce engine as the 'grid' for processing (yes, it's a mapper with no reducer). Once the code is stable I'll abstract it as a foreach backend. But first I need to build tests for the code I have.

What would be some good test cases for this function?

My canonical test case right now is the following:

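# Build ten numeric vectors, each with 999 random draws and one NA
set.seed(1)
myList <- vector("list", 10)
for (i in seq_along(myList)) {
  myList[[i]] <- c(rnorm(999), NA)
}
# Compare the local result with the EMR result
outputLocal <- lapply(myList, mean, na.rm = TRUE)
outputEmr   <- emrlapply(myList, mean, myCluster, na.rm = TRUE)
all.equal(outputEmr, outputLocal)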

This test case makes sure the optional argument na.rm=T is passed properly to the remote machines. What are some other test cases that I could be using? I don't currently support simplify or USE.NAMES arguments, although I will in the future.

Solution

What happens if you pass emrlapply each of the following?

  • A list of character vectors
  • An empty list
  • A list whose elements are empty once all the NA values have been removed
  • NULL
  • A vector (lapply works with vectors)
  • A matrix
  • A data.frame
  • A list of lists
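
The inputs above could be collected into a small table of edge cases and compared against base lapply(). This is only a sketch: matches_lapply and edge_cases are hypothetical names, and the candidate argument defaults to lapply so the code runs without a cluster. In a real test you would presumably pass a wrapper around emrlapply instead.

```r
# Hypothetical helper: compares a candidate lapply-like function against
# base lapply() on one input. `candidate` stands in for a wrapper such as
# function(X, FUN, ...) emrlapply(X, FUN, myCluster, ...).
matches_lapply <- function(X, FUN, ..., candidate = lapply) {
  expected <- lapply(X, FUN, ...)
  actual   <- candidate(X, FUN, ...)
  isTRUE(all.equal(actual, expected))
}

# One entry per input type suggested in the list above
edge_cases <- list(
  character_vectors = list(c("a", "b"), c("c", NA)),
  empty_list        = list(),
  only_na_elements  = list(c(NA_real_, NA_real_), NA_real_),
  null_input        = NULL,
  plain_vector      = 1:5,
  matrix_input      = matrix(1:6, nrow = 2),
  data_frame        = data.frame(x = c(1, NA, 3), y = c(4, 5, NA)),
  list_of_lists     = list(list(1, 2), list(3, list(4, 5)))
)

# Apply a function that is defined for every input type
results <- vapply(
  edge_cases,
  function(x) matches_lapply(x, length),
  logical(1)
)

# The NA-only case is worth checking with the original na.rm test as well
na_only_ok <- matches_lapply(edge_cases$only_na_elements, mean, na.rm = TRUE)
```

Keeping the cases in a named list means a failing entry in results identifies the input type directly.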

You also need a test to see if your function gracefully handles EMR not being available or having required packages missing.
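
A failure test along these lines might look like the sketch below. It assumes that emrlapply signals an ordinary R error when the cluster cannot be reached, which the question does not confirm. unreachable_backend and fails_with are hypothetical names, and the package name is deliberately fake.

```r
# Hypothetical stand-in for a backend whose cluster cannot be reached
unreachable_backend <- function(X, FUN, ...) {
  stop("EMR cluster is not reachable")
}

# Returns TRUE when evaluating `expr` raises an error whose message
# matches `pattern`, and FALSE when it succeeds or the message differs.
fails_with <- function(expr, pattern) {
  tryCatch({
    expr   # forces the lazily evaluated argument inside tryCatch
    FALSE
  }, error = function(e) grepl(pattern, conditionMessage(e)))
}

backend_fails_cleanly <- fails_with(
  unreachable_backend(list(1, 2), mean),
  "not reachable"
)

# requireNamespace() checks for a package without attaching it and
# returns FALSE instead of raising an error when the package is missing.
missing_pkg_detected <- !requireNamespace("notARealPackage12345", quietly = TRUE)
```

Checking the error message, and not only that an error occurred, helps confirm the user sees a useful explanation.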

Licensed under: CC-BY-SA with attribution