Question

I have a data.frame d with two character columns:

t1 <- c("vector, market", "phone34, fax", "material55, animal", "cave", "monday", "fast98")
t2 <- c("vector, market", "phone, fax", "summer, animal", "pan23", "monday", "fast98, ticket")

d <- data.frame(t1, t2, stringsAsFactors=FALSE)

d
                  t1             t2
1     vector, market vector, market
2       phone34, fax     phone, fax
3 material55, animal summer, animal
4               cave          pan23
5             monday         monday
6             fast98 fast98, ticket

I want to concatenate the two columns to a single column t3, without any duplication.

Using paste alone gives me duplicates.

d$t3 <- paste(d$t1, d$t2, sep=", ")

> d
                  t1             t2                                 t3
1     vector, market vector, market     vector, market, vector, market
2       phone34, fax     phone, fax           phone34, fax, phone, fax
3 material55, animal summer, animal material55, animal, summer, animal
4               cave          pan23                        cave, pan23
5             monday         monday                     monday, monday
6             fast98 fast98, ticket             fast98, fast98, ticket

The desired result is:

                  t1             t2                                 t3
1     vector, market vector, market                     vector, market
2       phone34, fax     phone, fax                phone34, phone, fax
3 material55, animal summer, animal         material55, animal, summer
4               cave          pan23                        cave, pan23
5             monday         monday                             monday
6             fast98 fast98, ticket                     fast98, ticket

How can I efficiently do this in R? Is there a vectorized solution?

Solution

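You need to strsplit each entry of both columns, take the union of the resulting vectors row by row, and paste each union back into a single string:

t1s <- strsplit(d$t1, split=", ")   ## list of character vectors, one per row
t2s <- strsplit(d$t2, split=", ")   ## list of character vectors, one per row

# take the union of the elements in each row and paste them into a single string
d$t3 <- sapply(seq_along(t1s), function(i) paste(union(t1s[[i]], t2s[[i]]), collapse=", "))

Note that union keeps elements in order of first appearance, so row 2 comes out as "phone34, fax, phone" rather than "phone34, phone, fax".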
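The same idea can also be written with mapply, which walks both split lists in parallel instead of indexing by position. This is only a sketch: the helper name merge_unique and its sep argument are hypothetical, not part of the original answer, and it assumes every item is separated by exactly ", ".

```r
# Hypothetical helper (not from the original answer): merge two character
# vectors of comma-separated items row by row, dropping duplicates.
merge_unique <- function(a, b, sep = ", ") {
  a_items <- strsplit(a, split = sep, fixed = TRUE)  # list, one vector per row
  b_items <- strsplit(b, split = sep, fixed = TRUE)
  # mapply pairs up the i-th elements of both lists; union() drops repeats
  mapply(function(x, y) paste(union(x, y), collapse = sep),
         a_items, b_items, USE.NAMES = FALSE)
}

# Data from the question
t1 <- c("vector, market", "phone34, fax", "material55, animal", "cave", "monday", "fast98")
t2 <- c("vector, market", "phone, fax", "summer, animal", "pan23", "monday", "fast98, ticket")
d <- data.frame(t1, t2, stringsAsFactors = FALSE)

d$t3 <- merge_unique(d$t1, d$t2)
```

Like sapply, mapply still loops over the rows at the R level, so this is a tidier spelling rather than a truly vectorized solution.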

I hope that helps.

Licensed under: CC-BY-SA with attribution