Question

I want to split a street address into street name and street number in R.

My input data has a column that reads, for example:

    Street.Addresses

    205 Cape Road
    32 Albany Street 
    cnr Kempston/Durban Roads

I want to split the street number and street name into two separate columns, so that it reads:

    Street Number    Street Name
    205              Cape Road
    32               Albany Street
                     cnr Kempston/Durban Roads

Is it in any way possible to split the numeric value from the non-numeric entries in a factor/string in R?

Thank you

Solution

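You can try:

    x <- c("205 Cape Road", "32 Albany Street", "cnr Kempston/Durban Roads")
    # Split on the space that follows a digit; entries without a number get an empty first field
    y <- lapply(strsplit(x, "(?<=\\d)\\b ", perl = TRUE), function(p) if (length(p) < 2) c("", p) else p)
    y <- do.call(rbind, y)
    colnames(y) <- c("Street Number", "Street Name")

Hope this helps.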
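
Since the question mentions that the column may be a factor, here is a rough sketch of the same split wrapped so it can be applied to a data frame column. The data frame `addr_df` and the helper `split_address()` are made up for illustration, and the sketch assumes the street number, when present, is a run of digits at the very start of the address:

```r
# Hypothetical example data; the column is deliberately stored as a factor
addr_df <- data.frame(
  Street.Addresses = factor(c("205 Cape Road", "32 Albany Street ",
                              "cnr Kempston/Durban Roads"))
)

# split_address() is an illustrative helper, not part of any package
split_address <- function(x) {
  x <- trimws(as.character(x))  # factor -> character, drop stray spaces
  # TRUE where the address starts with digits followed by whitespace
  has_num <- grepl("^[0-9]+[[:space:]]", x)
  number <- ifelse(has_num, sub("^([0-9]+)[[:space:]].*$", "\\1", x), "")
  name <- ifelse(has_num, sub("^[0-9]+[[:space:]]+", "", x), x)
  data.frame(Street.Number = number, Street.Name = name,
             stringsAsFactors = FALSE)
}

parts <- split_address(addr_df$Street.Addresses)
addr_df$Street.Number <- parts$Street.Number
addr_df$Street.Name <- parts$Street.Name
```

Anchoring the pattern at the start of the string means a digit later in the address (a route number, say) is left in the street name rather than causing an extra split.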

Other suggestions

I'm sure that someone is going to come along with a cool regex solution with lookaheads and so on, but this might work for you:

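    X <- c("205 Cape Road", "32 Albany Street", "cnr Kempston/Durban Roads")
    # Flag the entries that do not start with a digit
    nonum <- grepl("^[^0-9]", X)
    # Give those entries a blank first field, then a tab
    X[nonum] <- paste0(" \t", X[nonum])
    # Put a tab between the leading number and the rest of the address
    X[!nonum] <- gsub("(^[0-9]+ )(.*)", "\\1\t\\2", X[!nonum])
    read.delim(text = X, header = FALSE)
    #    V1                        V2
    # 1 205                 Cape Road
    # 2  32             Albany Street
    # 3  NA cnr Kempston/Durban Roads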

Here is another way:

    df <- data.frame(Street.Addresses = c("205 Cape Road", "32 Albany Street", "cnr Kempston/Durban Roads"),
                     stringsAsFactors = FALSE)

    new_df <- data.frame(Street.Number = character(),
                         Street.Name = character(),
                         stringsAsFactors = FALSE)

    # Split every address on spaces once, outside the loop
    parts <- strsplit(df[["Street.Addresses"]], " ")

    for (i in seq_len(nrow(df))) {
      p <- parts[[i]]
      # Treat the first token as the street number only if it is all digits
      has_num <- grepl("^[0-9]+$", p[1])
      new_df[i, "Street.Number"] <- if (has_num) p[1] else ""
      new_df[i, "Street.Name"] <- paste(if (has_num) p[-1] else p, collapse = " ")
    }

    > new_df
      Street.Number               Street.Name
    1           205                 Cape Road
    2            32             Albany Street
    3               cnr Kempston/Durban Roads
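
The same split can also be done without an explicit loop. Below is a small sketch using base R's `regexpr()` and `regmatches()`; the variable names are only illustrative, and it again assumes the number is a leading run of digits:

```r
addresses <- c("205 Cape Road", "32 Albany Street", "cnr Kempston/Durban Roads")

# Match a leading run of digits; regexpr() returns -1 where there is none
m <- regexpr("^[0-9]+", addresses)

# Start with empty numbers and fill in only the entries that matched
street_number <- rep("", length(addresses))
street_number[m > 0] <- regmatches(addresses, m)

# Remove the leading digits (if any) and the space left behind
street_name <- trimws(sub("^[0-9]+", "", addresses))

result <- data.frame(Street.Number = street_number,
                     Street.Name = street_name,
                     stringsAsFactors = FALSE)
```

Because every step is vectorised, this avoids growing a data frame row by row, which tends to matter once the address column is long.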
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow