Question

My input file called "locaddr" has the following records:

"Shelbourne Road, Dublin, Ireland"                                     
"1 Hatch Street Upper, Dublin, Ireland"                               
"98 Haddington Road, Dublin, Ireland"       
"11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland"
"Winterstraße 17, 69190 Walldorf, Germany"

I applied the strsplit function in R to this file using the following code:

testmat <- strsplit(locaddr,split=",")
outmat <- matrix(unlist(testmat), nrow=nrow(locaddr), ncol=3, byrow=T)

The final output I get is:

      Street                        City                    Country          
 [1,] "Shelbourne Road"             " Dublin"               " Ireland"       
 [2,] "1 Hatch Street Upper"        " Dublin"               " Ireland"       
 [3,] "98 Haddington Road"          " Dublin"               " Ireland"       
 [4,] "11 Mount Argus Close"        " Harold's Cross"       " Dublin 6W"     
 [5,] " Co. Dublin"                 " Ireland"              "Winterstraße 17"
 [6,] " 69190 Walldorf"             " Germany"              "Caughley Road"  
 [7,] " Broseley"                   " Shropshire TF12 5AT"  " UK"            
 [8,] "Pappelweg 30"                " 48499 Salzbergen"     " Germany"       
 [9,] "60 Grand Canal Street Upper" " Dublin 4"             " Ireland"       
[10,] "Wieslocher Straße"           " 68789 Sankt Leon-Rot" " Germany"

As the output shows, I wanted the final three terms of each record, but instead the terms of different records are mixed together across rows.

My requirement: although the addresses vary in length, after strsplit I need to pick only the last three terms of each record and put them into Street, City and Country.

Your help and time are most appreciated.


Solution

Next time, please include some reproducible code with your question.

Here is how I would approach the problem.

x <- c("Shelbourne Road, Dublin, Ireland",                                     
       "1 Hatch Street Upper, Dublin, Ireland",                               
       "98 Haddington Road, Dublin, Ireland",      
       "11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland",
       "Winterstraße 17, 69190 Walldorf, Germany")

# split on ,
splitx <- strsplit(x, ",")

# for every list element (lapply walks the list element by element),
# keep the last 3 elements
last3 <- lapply(splitx, tail, n = 3)

# merge them together by row
do.call("rbind", last3)

     [,1]                   [,2]              [,3]      
[1,] "Shelbourne Road"      " Dublin"         " Ireland"
[2,] "1 Hatch Street Upper" " Dublin"         " Ireland"
[3,] "98 Haddington Road"   " Dublin"         " Ireland"
[4,] " Dublin 6W"           " Co. Dublin"     " Ireland"
[5,] "Winterstraße 17"      " 69190 Walldorf" " Germany"
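
Note that the matrix above still carries the space that followed each comma (" Dublin", " Ireland"). A minimal sketch of one way to avoid that, assuming every record separates its fields with a comma plus optional whitespace: split on the regular expression ",\\s*" instead of a bare comma, then label the columns with the names from the question. The sample vector is a subset of the question's data.

```r
x <- c("Shelbourne Road, Dublin, Ireland",
       "98 Haddington Road, Dublin, Ireland",
       "11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland")

# split on a comma followed by any amount of whitespace,
# so the pieces come back without leading spaces
parts <- strsplit(x, ",\\s*")

# keep the last three fields of each record and stack them row-wise
out <- do.call(rbind, lapply(parts, tail, n = 3))

# column names as requested in the question
colnames(out) <- c("Street", "City", "Country")
out
```

As before, a record with more than three fields keeps only its last three, so the third row starts at "Dublin 6W" rather than at the street.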

OTHER TIPS

This is basically a variant of Roman's answer, but it keeps the (potentially) multiple leading address parts instead of dropping them. It assumes that the last two comma-separated values are city and country, then pools everything before them into a single street field.

# read data
y <- c("Shelbourne Road, Dublin, Ireland",                                     
       "1 Hatch Street Upper, Dublin, Ireland",                               
       "98 Haddington Road, Dublin, Ireland",      
       "11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland",
       "Winterstraße 17, 69190 Walldorf, Germany")
# split and output
result <- lapply(y, function(x) {
    splitx <- strsplit(x, ", ")[[1]]
    rowtail <- tail(splitx, n = 2)
    if(length(splitx)>3)
        multi <- paste(splitx[1:(length(splitx)-2)],collapse=", ")
    else
        multi <- splitx[1]
    return(c(multi,rowtail))
    })
# rbind back together
do.call(rbind,result)

This produces:

     [,1]                                              [,2]             [,3]     
[1,] "Shelbourne Road"                                 "Dublin"         "Ireland"
[2,] "1 Hatch Street Upper"                            "Dublin"         "Ireland"
[3,] "98 Haddington Road"                              "Dublin"         "Ireland"
[4,] "11 Mount Argus Close, Harold's Cross, Dublin 6W" "Co. Dublin"     "Ireland"
[5,] "Winterstraße 17"                                 "69190 Walldorf" "Germany"
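
Both approaches rely on every record having at least three fields; with a shorter record, rbind would recycle values to fill the row. The following is only a sketch of one way to guard against that, not part of either answer: split_address is a hypothetical helper, and the two-field record "Dublin, Ireland" is made up for illustration. Missing parts are filled with NA so that Country always stays in the last column.

```r
y <- c("Shelbourne Road, Dublin, Ireland",
       "11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland",
       "Dublin, Ireland")  # made-up short record, not in the question's data

# hypothetical helper: pool everything before the last two fields into the
# street, and use NA for any part the record does not have
split_address <- function(address) {
  fields <- strsplit(address, ",\\s*")[[1]]
  n <- length(fields)
  street <- if (n > 2) paste(head(fields, n - 2), collapse = ", ") else NA_character_
  rest <- tail(fields, 2)
  # left-pad with NA so the last available field is always the country
  rest <- c(rep(NA_character_, 2 - length(rest)), rest)
  c(Street = street, City = rest[1], Country = rest[2])
}

# one named vector per record, stacked into a matrix, then a data frame
result <- do.call(rbind, lapply(y, split_address))
addresses <- as.data.frame(result, stringsAsFactors = FALSE)
addresses
```

Returning a named vector from the helper lets rbind pick up the column names, so no separate colnames() call is needed.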
Licensed under: CC-BY-SA with attribution