Updating a dataframe with function and sapply
Question
I am attempting to set a column in a dataframe to either 'US' or 'Foreign', depending on the country. I believe the proper way to do so is to write a function and then use sapply to actually update the dataframe. This is the first time I've attempted something like this in R; in SQL, I would have just written an UPDATE query.
Here is my dataframe:
str(clients)
'data.frame': 252774 obs. of 4 variables:
$ ClientID : Factor w/ 252774 levels "58187855","59210128",..: 19 20 21 22 23 24 25 26 27 28 ...
$ Country : Factor w/ 207 levels "Afghanistan",..: 196 60 139 196 196 40 40 196 196 196 ...
$ CountryType : chr "" "" "" "" ...
$ OrderSize : num 12.95 21.99 5.00 7.50 44.5 ...
head(clients)
ClientID Country CountryType OrderSize
1 58187855 United States 12.95
2 59210128 France 21.99
3 65729284 Pakistan 5.00
4 25819711 United States 7.50
5 62837458 United States 44.55
6 88379852 China 99.28
The function I attempted to write is this:
updateCountry <- function(x) {
  if (clients$Country == "United States") {
    clients$CountryType <- "US"
  } else {
    clients$CountryType <- "Foreign"
  }
}
I would then apply it like so:
sapply(clients, updateCountry)
When I run sapply against the head of the dataframe, I get this:
"US" "US" "US" "US" "US" "US"
Warning messages:
1: In if (clients$Country == "United States") { :
the condition has length > 1 and only the first element will be used
2: In if (clients$Country == "United States") { :
the condition has length > 1 and only the first element will be used
3: In if (clients$Country == "United States") { :
the condition has length > 1 and only the first element will be used
4: In if (clients$Country == "United States") { :
the condition has length > 1 and only the first element will be used
5: In if (clients$Country == "United States") { :
the condition has length > 1 and only the first element will be used
6: In if (clients$Country == "United States") { :
the condition has length > 1 and only the first element will be used
It appears that the function is classifying the Country correctly but is not updating the clients$CountryType column. What am I doing wrong? Also, is this the best way to update the dataframe?
Solution
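ifelse seems like what you actually want here. It is a vectorized version of the if/else construct: it tests every element of clients$Country and returns a result vector of the same length, whereas if only looks at a single TRUE/FALSE value, which is why you get the warning that only the first element will be used. (Your function would not have changed clients anyway: the assignment inside it modifies a local copy of the data frame, and sapply over a data frame loops over its columns, not its rows.)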
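As a minimal sketch of that element-by-element behaviour, here is ifelse applied to a small toy vector; the values are copied from the head(clients) output above, and the names country and country_type are hypothetical stand-ins, not part of the original data.

```r
# Toy vector standing in for clients$Country (values taken from head(clients))
country <- c("United States", "France", "Pakistan", "United States")

# ifelse() evaluates the test for every element and returns one result per element
country_type <- ifelse(country == "United States", "US", "Foreign")
print(country_type)

# By contrast, if (country == "United States") ... sees a condition of length 4:
# R versions before 4.2.0 warn and use only the first element; 4.2.0 and later raise an error.
```

The printed result has one label per input element, which is exactly what a column assignment needs.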
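Note that your Country column spells the country out as "United States", so compare against that rather than "US":
clients$CountryType <- ifelse(clients$Country == "United States", "US", "Foreign")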
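Since the question mentions SQL, here is a sketch of the same update written as R subset assignment, which maps fairly directly onto UPDATE ... WHERE. It assumes a small stand-in data frame rebuilt from the head(clients) output (the CountryType column is simply created by the first assignment), and the helper name is_us is hypothetical.

```r
# Small stand-in for clients, rebuilt from the head(clients) output in the question
clients <- data.frame(
  ClientID  = factor(c("58187855", "59210128", "65729284", "25819711", "62837458", "88379852")),
  Country   = factor(c("United States", "France", "Pakistan", "United States", "United States", "China")),
  OrderSize = c(12.95, 21.99, 5.00, 7.50, 44.55, 99.28)
)

# Roughly: UPDATE clients SET CountryType = 'Foreign';
clients$CountryType <- "Foreign"

# Roughly: UPDATE clients SET CountryType = 'US' WHERE Country = 'United States';
# which() keeps only the row positions where the comparison is TRUE
is_us <- which(clients$Country == "United States")
clients$CountryType[is_us] <- "US"

print(clients)
```

Both this and the ifelse one-liner operate on the whole column at once, so neither needs a helper function or sapply.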