Question

Objective: Pass R a single vector of street addresses and have a three-vector dataframe returned where the first vector is the street address ("Street.Address"), the second vector is the latitude ("Lat"), and the third vector is the longitude ("Lng"). For simplicity, I'm only using four addresses; that is, the length of the vector is 4.

Approach: I'm using Jitender Aswani's code to create a geocode function using Google Maps' API. The function works brilliantly, and I'm able to find the lat/long of any address I choose. The code:

getGeoCode <- function(address)
{ 
  #Load library
  library("RJSONIO")

  #Encode URL parameters
  address <- gsub(' ','%20',address)

  #Open connection
  connectStr <- paste('http://maps.google.com/maps/api/geocode/json?sensor=false&address=',address, sep="") 
  con <- url(connectStr)
  data.json <- fromJSON(paste(readLines(con), collapse=""))
  close(con)

  #Flatten the received JSON
  data.json <- unlist(data.json)
  lat <- data.json["results.geometry.location.lat"]
  lng <- data.json["results.geometry.location.lng"]
  gcodes <- c(lat, lng)
  names(gcodes) <- c("Lat", "Lng")
  return (gcodes)
}

geocodes <- getGeoCode("Palo Alto, California")
geocodes

        Lat            Lng 
"37.4418834" "-122.1430195" 

My difficulty comes when trying to call the function in subsequent code. Let's call the original one-column object "data.object". When I use the following code supplied by Aswani...

data.object <- with(data.object, data.frame(Street.Address, lapply(Street.Address, function(val){getGeoCode(val)})))

...I expect the function to return a three-column dataframe of length four, with column1 being the street address, column2 being the latitude, and column3 being the longitude:

    Street.Address                                  Lat            Lng
[1] 3625 1ST AVE S SEATTLE WA 98134           47.571010    -122.334447
[2] 2119 RAINIER AVE S SEATTLE WA 98144       47.584136    -122.302744
[3] 9660 16TH AVE SW SEATTLE WA 98106         47.516180    -122.355138
[4] 8300 RAINIER AVE S SEATTLE WA 98118       47.529750    -122.270010

Instead, I'm getting a five-column dataframe where the values in the second column alternate between the first address' latitude and the first address' longitude, the values in the third column alternate between the second address' latitude and the second address' longitude, and so on:

    Street.Address                           column2        column3      column4    column5
[1] 3625 1ST AVE S SEATTLE WA 98134        47.571010      47.584136    47.516180    47.529750
[2] 2119 RAINIER AVE S SEATTLE WA 98144  -122.334447    -122.302744  -122.355138  -122.270010
[3] 9660 16TH AVE SW SEATTLE WA 98106      47.571010      47.584136    47.516180    47.529750
[4] 8300 RAINIER AVE S SEATTLE WA 98118  -122.334447    -122.302744  -122.355138  -122.270010

I've tried rewriting the command using different combinations of the with(), within(), apply(), and lapply() functions, and I can't get R to return a simple three-column dataframe. I know I'm overlooking something obvious, but I can't seem to figure it out.


Solution

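lapply() returns a list, and data.frame() turns each of the four list elements into its own column, recycling the two values (latitude, longitude) down the four rows. That is where the five columns and the alternating values come from. sapply() is a user-friendly version of lapply() that by default simplifies the result to a vector or matrix where appropriate. Here it gives a matrix with one column per address, so transpose it with t() to get one row per address: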

data.object <- with(data.object, data.frame(Street.Address, t(sapply(Street.Address, function(val){getGeoCode(val)}))))
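To see the difference in shape without calling the web service, here is a minimal sketch. It assumes a hypothetical stand-in, fakeGeoCode(), that returns made-up coordinates in the same form as getGeoCode() (a named character vector), and the placeholder addresses are invented too.

```r
# Hypothetical offline stand-in for getGeoCode(): same return shape
# (a named character vector), but with made-up coordinates and no web request.
fakeGeoCode <- function(address) {
  c(Lat = "47.5", Lng = "-122.3")
}

# Placeholder addresses, invented for illustration.
Street.Address <- c("ADDRESS ONE", "ADDRESS TWO", "ADDRESS THREE", "ADDRESS FOUR")

# lapply() gives a list of four length-2 vectors. data.frame() makes each list
# element its own column and recycles its two values down the four rows.
# (R may warn that row names from the short vectors were discarded.)
wide <- suppressWarnings(
  data.frame(Street.Address, lapply(Street.Address, fakeGeoCode))
)
dim(wide)    # 4 5

# sapply() simplifies to a 2 x 4 matrix; t() flips it to one row per address.
coords <- t(sapply(Street.Address, fakeGeoCode))
long <- data.frame(Street.Address, coords, row.names = NULL)
dim(long)    # 4 3
names(long)  # "Street.Address" "Lat" "Lng"
```

Passing row.names = NULL is optional; it just keeps the addresses from also being used as row names.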

OTHER TIPS

There's a really great post explaining the differences between the apply family of functions: "R Grouping functions: sapply vs. lapply vs. apply vs. tapply vs. by vs. aggregate". In your case, the problem is that you want lapply to return rows of a dataframe, but it returns a list. You can use sapply, but that returns a vector or matrix rather than rows. The best you can do is use sapply and reshape the result into a matrix of the desired dimensions, or unlist the result of lapply and do the same. Let's try the first option.

addressmat=matrix(sapply(address, function(val){append(val,as.numeric(getGeoCode(val)))}),4,3, byrow=TRUE) 
addressmat
     [,1]                                  [,2]         [,3]          
[1,] "3625 1ST AVE S SEATTLE WA 98134"     "47.5698918" "-122.3360067"
[2,] "2119 RAINIER AVE S SEATTLE WA 98144" "47.583897"  "-122.30269"  
[3,] "9660 16TH AVE SW SEATTLE WA 98106"   "47.5159917" "-122.3551272"
[4,] "8300 RAINIER AVE S SEATTLE WA 98118" "47.5295467" "-122.2699776"

This doesn't return the column names, but that's an easy fix.

colnames(addressmat) <- c("Street.Address","Lat","Lng")
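Note that a matrix can hold only one type, so appending the address turns the coordinates into strings (hence the quotes in the output). If you need a dataframe with numeric Lat and Lng, one way, as a sketch, is to build it column by column. The matrix below just re-creates two rows of the output shown above so the snippet runs on its own; the name addressdf is my own.

```r
# Re-create two rows of the character matrix printed above.
addressmat <- matrix(
  c("3625 1ST AVE S SEATTLE WA 98134",     "47.5698918", "-122.3360067",
    "2119 RAINIER AVE S SEATTLE WA 98144", "47.583897",  "-122.30269"),
  nrow = 2, ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("Street.Address", "Lat", "Lng"))
)

# Build the dataframe one column at a time so each column gets its own type.
addressdf <- data.frame(
  Street.Address = addressmat[, "Street.Address"],
  Lat = as.numeric(addressmat[, "Lat"]),
  Lng = as.numeric(addressmat[, "Lng"]),
  stringsAsFactors = FALSE  # keep addresses as character on older R versions
)
str(addressdf)
```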

Another option is Vectorize:

getGeoCodes <- Vectorize(getGeoCode)
x <- c(
  "3625 1ST AVE S SEATTLE WA 98134", 
  "2119 RAINIER AVE S SEATTLE WA 98144", 
  "9660 16TH AVE SW SEATTLE WA 98106"
)
locations <- getGeoCodes(x) # a character matrix: rows "Lat" and "Lng", one column per address
result <- data.frame(
   Street.Address=x,
   Lat=as.numeric(locations["Lat",]),
   Lng=as.numeric(locations["Lng",])
)
rownames(result) <- NULL
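
If you want to check the orientation of the matrix without hitting the web service, here is a small sketch. It assumes a hypothetical stand-in, fakeGeoCode(), that returns made-up coordinates in the same form as getGeoCode(); the placeholder addresses are invented as well.

```r
# Hypothetical offline stand-in for getGeoCode(), with made-up coordinates.
fakeGeoCode <- function(address) {
  c(Lat = "47.5", Lng = "-122.3")
}

# Vectorize() wraps the function in mapply(), which simplifies the results.
fakeGeoCodes <- Vectorize(fakeGeoCode)

x <- c("ADDRESS ONE", "ADDRESS TWO", "ADDRESS THREE")  # placeholder addresses
locations <- fakeGeoCodes(x)

dim(locations)       # 2 3: one row per coordinate, one column per address
rownames(locations)  # "Lat" "Lng"
colnames(locations)  # the input addresses
```

Because the coordinates sit in rows, indexing with locations["Lat",] picks out one value per address, so no t() is needed here.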
Licensed under: CC-BY-SA with attribution