Question

In my dataframe I have a column with the last names of parliament members in lowercase. I replaced the first letter with its uppercase form using (from this answer):

# vector with names
lastname <- c("wortmann-kool", "mulder", "nistelrooij", "camp", "schaake", "veld", "lange", "oomen-ruijten")
# substituting first letter with uppercase
lastname <- gsub("^(\\w)(\\w+)", "\\U\\1\\L\\2", lastname, perl = TRUE)

As you can see, some names contain a hyphen separating the two surnames of married women. How do I also replace the first letter after the hyphen with its uppercase form?

Solution

Why not simply uppercase the first letter after a word boundary?

> lastname <- c("wortmann-kool", "mulder", "nistelrooij", "camp", "schaake", "veld", "lange", "oomen-ruijten")
> gsub("\\b(\\w)", "\\U\\1", lastname, perl = TRUE)
[1] "Wortmann-Kool" "Mulder"        "Nistelrooij"   "Camp"         
[5] "Schaake"       "Veld"          "Lange"         "Oomen-Ruijten"
> 
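Note that a word boundary also occurs after spaces, apostrophes and other non-word characters, not only after hyphens. A minimal sketch with a few made-up names (the vector `extra` is illustrative and not part of the question's data) shows the effect:

```r
# Hypothetical names, not part of the original data
extra <- c("van der berg", "o'neil", "smith-jones")

# \\b matches at every transition from a non-word to a word character,
# so the first letter of every name part is uppercased
capitalized <- gsub("\\b(\\w)", "\\U\\1", extra, perl = TRUE)
print(capitalized)
# [1] "Van Der Berg" "O'Neil"       "Smith-Jones"
```

If name particles such as "van" should stay lowercase, a narrower pattern is probably the better choice.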

Quoting from the documentation (see ?gsub):

For perl = TRUE only, it can also contain "\U" or "\L" to convert the rest of the replacement to upper or lower case and "\E" to end case conversion.

## capitalizing
txt <- "a test of capitalizing"
gsub("(\\w)(\\w*)", "\\U\\1\\L\\2", txt, perl=TRUE)
gsub("\\b(\\w)",    "\\U\\1",       txt, perl=TRUE)
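The two documented patterns behave differently when the input is not already lowercase, and "\E" can limit the conversion to part of the replacement. A small sketch, assuming a mixed-case input (`mixed` and the result names are illustrative):

```r
mixed <- "a TEST of capiTALizing"

# \\U\\1\\L\\2: uppercase the first letter, lowercase the rest of each word
title_case <- gsub("(\\w)(\\w*)", "\\U\\1\\L\\2", mixed, perl = TRUE)
print(title_case)
# [1] "A Test Of Capitalizing"

# \\U\\1 alone: uppercase the first letter, leave the rest untouched
first_only <- gsub("\\b(\\w)", "\\U\\1", mixed, perl = TRUE)
print(first_only)
# [1] "A TEST Of CapiTALizing"

# \\E ends the case conversion, so only the first group is uppercased
partial <- gsub("(\\w+)-(\\w+)", "\\U\\1\\E-\\2", "wortmann-kool", perl = TRUE)
print(partial)
# [1] "WORTMANN-kool"
```

For names that are known to be all lowercase, as in the question, the shorter "\\U\\1" replacement is enough.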

OTHER TIPS

This works for any punctuation character that might separate the parts of a name:

gsub("(^|[[:punct:]])([[:alpha:]])", "\\1\\U\\2", lastname, perl=TRUE)

##[1] "Wortmann-Kool" "Mulder"        "Nistelrooij"   "Camp"         
##[5] "Schaake"       "Veld"          "Lange"         "Oomen-Ruijten"
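To see what this pattern does and does not treat as a separator, here is a short sketch with made-up names (`extra` and `by_punct` are illustrative). An apostrophe belongs to [[:punct:]], a space does not:

```r
# Hypothetical names, not part of the original data
extra <- c("o'brien", "smith-jones", "van dijk")

# uppercase a letter at the start of the string or right after punctuation
by_punct <- gsub("(^|[[:punct:]])([[:alpha:]])", "\\1\\U\\2", extra, perl = TRUE)
print(by_punct)
##[1] "O'Brien"     "Smith-Jones" "Van dijk"
```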

This works only for hyphens:

gsub("(^|-)([[:alpha:]])", "\\1\\U\\2", lastname, perl=TRUE)

##[1] "Wortmann-Kool" "Mulder"        "Nistelrooij"   "Camp"         
##[5] "Schaake"       "Veld"          "Lange"         "Oomen-Ruijten"
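
Since the names live in a data frame column, either pattern can be wrapped in a small helper and applied to that column. A minimal sketch, assuming a data frame called `members` with a `lastname` column (both names, and the helper `capitalize_lastname`, are hypothetical):

```r
capitalize_lastname <- function(x) {
  # uppercase the first letter at the start and after each hyphen
  gsub("(^|-)([[:alpha:]])", "\\1\\U\\2", x, perl = TRUE)
}

# Hypothetical data frame standing in for the one in the question
members <- data.frame(
  lastname = c("wortmann-kool", "mulder", "oomen-ruijten"),
  stringsAsFactors = FALSE
)

members$lastname <- capitalize_lastname(members$lastname)
print(members$lastname)
##[1] "Wortmann-Kool" "Mulder"        "Oomen-Ruijten"
```
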
Licensed under: CC-BY-SA with attribution