Question

I am having problems merging two data frames of different lengths.
To make it as easy as possible, here are the datasets:

Dataset A - Persons: http://pastebin.com/HbaeqACi
Dataset B - Waterfeatures: http://pastebin.com/UdDvNtHs
Dataset C - City: http://pastebin.com/nATnkMRk

I have some R code that is not directly relevant to my problem, but I will paste it in full so you can reproduce exactly the same situation:

library(fossil)  # earth.dist() and deg.dist()
library(plyr)    # ldply() comes from plyr, not from fossil or base R
#load data
persons = read.csv("person.csv", header = TRUE, stringsAsFactors=FALSE)
water = read.csv("water.csv", header =TRUE, stringsAsFactors=FALSE)
city = read.csv("city.csv", header =TRUE)

#### calculate distance
# Generate unique coordinates dataframe
UniqueCoordinates <- data.frame(unique(persons[,4:5]))
UniqueCoordinates$Id <- formatC((1:nrow(UniqueCoordinates)), width=3,flag=0)

# Function that finds the water feature closest to the coordinates with the given id
# and returns the distance together with those coordinates
NearestW <- function(id){
  tmp <- UniqueCoordinates[UniqueCoordinates$Id==id, 1:2]
  WaterFeatures <- rbind(tmp,water[,2:3])
  # the first nrow-1 entries of the dist object are the distances from tmp (row 1) to each water feature
  disnw <- earth.dist(WaterFeatures, dist=TRUE)[1:(nrow(WaterFeatures)-1)]
  disnw <- min(disnw)
  disnw <- data.frame(disnw, WaterFeature=tmp)
  return(disnw)
}

# apply the distance function to each id and then merge the result back into persons
CoordinatesWaterFeature <- ldply(UniqueCoordinates$Id, NearestW)
persons <- merge(persons, CoordinatesWaterFeature, by.x=c(4,5), by.y=c(2,3))
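The core of NearestW, the distance from one coordinate pair to the closest water feature, can be sketched in base R with the haversine formula. This is a hypothetical stand-in, not fossil's implementation: `haversine_km`, `nearest_km` and the toy data are illustrative, and the 6371 km mean Earth radius is an assumption, so values may differ slightly from earth.dist.

```r
# Great-circle distance (km) from one point to a vector of points,
# using the haversine formula. Inputs are decimal degrees.
# ASSUMPTION: mean Earth radius of 6371 km (fossil may use another value).
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Hypothetical helper: for each (x, y) point, the distance to its nearest
# water feature, without building a full pairwise distance matrix.
nearest_km <- function(x, y, water_x, water_y) {
  vapply(seq_along(x),
         function(i) min(haversine_km(x[i], y[i], water_x, water_y)),
         numeric(1))
}

# Toy data (made up for illustration)
pts       <- data.frame(POINT_X = c(0, 10), POINT_Y = c(0, 0))
toy_water <- data.frame(POINT_X = c(1, 10), POINT_Y = c(0, 1))
d <- nearest_km(pts$POINT_X, pts$POINT_Y, toy_water$POINT_X, toy_water$POINT_Y)
print(d)  # both points are 1 degree (about 111.19 km) from their nearest feature
```

Comparing each point only against the water features keeps the work at one row of distances per point, instead of recomputing a full matrix that includes water-to-water distances.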

Now I want to copy the calculated distance to the city dataset. I tried merge (both datasets have a city column, and persons only contains cities that appear in the city dataset):

city_all_parameters = city
city_all_parameters = merge(city_all_parameters, persons[,c("city", "disnw")], all=TRUE)

Unfortunately, this is not the outcome I want: I get 164 rows, but I only want the 5 city rows plus the disnw variable with its corresponding value.
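The extra rows come from merge() matching each city row against every person row that carries the same city, so a city with many persons appears many times. A small sketch with made-up toy tables (`persons_toy`, `city_toy` and the `pop` column are hypothetical) shows the effect and how collapsing to one row per city first avoids it:

```r
# Toy stand-ins (made up): several persons per city, one row per city.
persons_toy <- data.frame(city  = c("City A", "City A", "City B"),
                          disnw = c(1.66, 1.66, 0.99))
city_toy    <- data.frame(city = c("City A", "City B"),
                          pop  = c(100, 200))

# Merging against the person-level table repeats each city once per person
nrow(merge(city_toy, persons_toy, all = TRUE))   # 3 rows

# Reducing to one (city, disnw) row per city first gives one row per city
per_city <- unique(persons_toy[, c("city", "disnw")])
merge(city_toy, per_city, by = "city", all.x = TRUE)   # 2 rows
```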
I also tried rbind, but then I get the error:
"Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match"
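rbind() stacks rows and requires every argument to have the same columns, which is why it fails here; adding a new column such as disnw to existing rows is a join, which is what merge() does. A minimal sketch with hypothetical one-row tables:

```r
city_row <- data.frame(city = "City A", pop = 100)    # hypothetical city record
dist_row <- data.frame(city = "City A", disnw = 1.66) # hypothetical distance record
name_only <- data.frame(city = "City A")

# rbind() with a different number of columns reproduces the error
msg <- tryCatch(rbind(city_row, name_only),
                error = function(e) conditionMessage(e))
print(msg)

# merge() joins on the shared "city" column and adds disnw as a new column
joined <- merge(city_row, dist_row, by = "city")
print(joined)  # one row with columns city, pop, disnw
```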

Any tips on how to solve this problem?


Solution

Your code works as intended, but here is a more elegant way to do it in base R. I have commented the code:

library(fossil)
# If you want to use pastebin, you can make it easy to load in for us like this:
# But I recommend using dput(persons) and pasting the results in.
persons = read.csv("http://pastebin.com/raw.php?i=HbaeqACi", header = TRUE, stringsAsFactors=FALSE)
water = read.csv("http://pastebin.com/raw.php?i=UdDvNtHs", header =TRUE, stringsAsFactors=FALSE)
city = read.csv("http://pastebin.com/raw.php?i=nATnkMRk", header =TRUE)

# Use column names instead of column indices to clarify your code
UniqueCoordinates <- data.frame(unique(persons[,c('POINT_X','POINT_Y')]))
# I didn't understand why you wanted to format the Id,
# but you don't need the Id in this code
# UniqueCoordinates$Id <- formatC((1:nrow(UniqueCoordinates)), width=3,flag=0)

# Instead of calculating the pairwise distances between all
# the water points every time, use deg.dist with mapply:
UniqueCoordinates$disnw <- mapply(function(x,y) min(deg.dist(long1=x,lat1=y,
                                                             long2=water$POINT_X,
                                                             lat2=water$POINT_Y)),
                                  UniqueCoordinates$POINT_X,
                                  UniqueCoordinates$POINT_Y)

# merge() joins on the shared columns POINT_X and POINT_Y
persons <- merge(UniqueCoordinates, persons)
# I think this is what you wanted...
unique(persons[c('city','disnw')])

#       city     disnw
# 1   City E 6.4865635
# 20  City A 1.6604204
# 69  City B 0.9893909
# 113 City D 0.6001968
# 148 City C 0.5308953

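# If you want to merge the distances into the city table, reduce persons to
# one row per city first; merging the full persons table gives one row per person
merge(city, unique(persons[c('city','disnw')]), by='city')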
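unique(persons[c('city','disnw')]) returns exactly one row per city only because, in this data, every person in a city shares the same coordinates. If a city had persons at several coordinates, it would appear once per distinct disnw value. A sketch with made-up toy data collapses each city to a single value with aggregate(); taking the minimum is an assumption, and mean() or max() may suit other analyses:

```r
# Toy data (made up): City B has persons at two different coordinates,
# so it carries two distinct disnw values.
persons_toy <- data.frame(city  = c("City A", "City A", "City B", "City B"),
                          disnw = c(1.66, 1.66, 0.99, 0.75))
city_toy    <- data.frame(city = c("City A", "City B"))

nrow(unique(persons_toy[c("city", "disnw")]))  # 3: City B appears twice

# Collapse to one value per city (ASSUMPTION: the nearest distance, min)
per_city <- aggregate(disnw ~ city, data = persons_toy, FUN = min)
city_with_dist <- merge(city_toy, per_city, by = "city", all.x = TRUE)
print(city_with_dist)  # City A 1.66, City B 0.75
```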
Licensed under: CC-BY-SA with attribution