Question

I am writing a utility function to do some data format conversion, and I am having trouble stating it correctly, so that it applies to the data I want it to apply to, and returns a result of the right shape.

I have a test data set called HiRawTiny, whose str output is shown below. The data in V1 is char. I have a test function called GetRank, whose job is to take all chars to the right of a ":" and coerce them to numeric. This is also shown below. The list-of-lists syntax I used in the fn to get at the output of strsplit is a bit opaque to me, and frankly I arrived at it by trial and error, but it appears to work ok when passed single values. When I pass it a vector (a data frame column), though, it doesn't give me a vector result of the same length as the vector I passed it, but only a single value.

What should I do to sort this out? I am new to R (though I used to use S many decades ago), and suspect I've got into a syntax muddle. Is my function syntax wrong given what I am trying to do? Should I be looking at using "apply" or one of its friends, to do this? Or should the fn be able to handle vector in/vector out natively?

str(HiRawTiny)

'data.frame':	10 obs. of  7 variables:
 $ V1: chr  "RANK:1" "RANK:2" "RANK:3" "RANK:4" ...
 $ V2: chr  "SOURCEID:CWC02001632398F4C" "SOURCEID:CWC020000F0D57DD6" "SOURCEID:CWC0200214C29872E" "SOURCEID:CWC0200163206B9F2" ...
 $ V3: chr  "TIME:01:04:2012-22:23:58" "TIME:01:04:2012-12:07:55" "TIME:01:04:2012-12:39:51" "TIME:02:04:2012-07:18:25" ...
 $ V4: chr  "SCORE:3142" "SCORE:3040" "SCORE:2911" "SCORE:2882" ...
 $ V5: chr  "TIEBREAK:4923864" "TIEBREAK:5787094" "TIEBREAK:766764" "TIEBREAK:1872936" ...
 $ V6: chr  "" "" "" "" ...
 $ V7: chr  "" "" "" "" ...

GetRank <- function(x) { as.numeric(strsplit(x, split=":")[[1]][2]) }

GetRank(HiRawTiny[1,1])
[1] 1
GetRank(HiRawTiny[2,1])
[1] 2
GetRank(HiRawTiny[,1])
[1] 1

# What I want is a vector: the result of GetRank applied to every element of column 1

Solution

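strsplit returns a list with one element per input string, and each element holds the pieces of that string. The [[1]] in your function picks only the first of those elements, which is why you get a single value back. You can instead turn the list into a matrix with do.call and rbind and then select the second column: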

GetRank <- function(x) {as.numeric(do.call(rbind, strsplit(x, split=":"))[, 2]) }

GetRank(HiRawTiny$V1)
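
To make the intermediate shapes visible, here is a minimal sketch on a made-up vector. The name `ranks` is an assumption, standing in for HiRawTiny$V1, and the approach relies on every string splitting into the same number of pieces.

```r
# Hypothetical stand-in for HiRawTiny$V1 (not the real data)
ranks <- c("RANK:1", "RANK:2", "RANK:3")

# strsplit gives a list: one character vector of pieces per input string
parts <- strsplit(ranks, split = ":")

# Stack the pieces row by row into a character matrix (3 rows, 2 columns)
m <- do.call(rbind, parts)

# Same function as in the answer: take column 2 and coerce it to numeric
GetRank <- function(x) {
  as.numeric(do.call(rbind, strsplit(x, split = ":"))[, 2])
}

result <- GetRank(ranks)
print(dim(m))
print(result)
```

The result has one entry per input string, which is the vector in/vector out behaviour the question asks for.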

OTHER TIPS

Just another way (Using @Stephan's foo):

# split by strsplit, results in a list with the 2nd element of 
# each element of the list always being the number you want.
# so pick it up using sapply with "[[" and convert it to numeric
> as.numeric(sapply(strsplit(foo, ":"), "[[", 2))
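
A possible variant of the same idea, sketched with vapply instead of sapply so that R checks each extracted piece is a single string. Here `foo` is built the same way as in the other answer; the use of vapply and fixed = TRUE is my own substitution, not part of the original tip.

```r
# Same test vector as in the other answer
foo <- paste("RANK:", 1:10, sep = "")

# Extract the 2nd piece of each split; character(1) declares the expected
# return type, so an unexpected result raises an error instead of
# silently changing the shape of the output
second <- vapply(strsplit(foo, ":", fixed = TRUE), `[[`, character(1), 2)

ranks <- as.numeric(second)
print(ranks)
```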

You will need to unlist the result of strsplit and then extract those entries that are of interest to you.

foo <- paste("RANK:",1:10,sep="")
GetRank <- function(x) {
  as.numeric(unlist(strsplit(x,":"))[seq(2,2*length(x),by=2)])
}
GetRank(foo)
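
Note that picking every second entry assumes exactly one ":" per string. Columns such as V3 contain several colons, so if the goal is "everything to the right of the first colon", a different technique may fit better. The sketch below uses sub with a regular expression rather than strsplit; `GetAfterColon` is a hypothetical helper name, and the sample values are copied from the str output in the question.

```r
# Hypothetical helper: drop everything up to and including the first ":"
GetAfterColon <- function(x) sub("^[^:]*:", "", x)

# Values with several colons keep everything after the first one
times <- c("TIME:01:04:2012-22:23:58", "TIME:01:04:2012-12:07:55")
after <- GetAfterColon(times)

# For purely numeric fields, coerce afterwards
scores <- as.numeric(GetAfterColon(c("SCORE:3142", "SCORE:3040")))

print(after)
print(scores)
```

Because sub is vectorised over its input, this returns a vector of the same length as the one passed in.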

Try feeding your data to the function bit-by-bit and tracing what happens in each successive step.
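
As a rough illustration of that kind of tracing, the steps of the original GetRank can be run one at a time on a small made-up vector (`x` and the `step` names are assumptions for the example):

```r
# Small made-up input standing in for the data frame column
x <- c("RANK:1", "RANK:2", "RANK:3")

step1 <- strsplit(x, split = ":")  # list with one element per input string
step2 <- step1[[1]]                # only the FIRST element: c("RANK", "1")
step3 <- step2[2]                  # the second piece of that element: "1"
step4 <- as.numeric(step3)         # a single number

print(length(step1))  # as many elements as input strings
print(length(step4))  # but only one value survives
```

The length drops from three to one at the [[1]] step, which is where the rest of the vector is discarded.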

Licensed under: CC-BY-SA with attribution