Question

I have a data.frame with 2 columns and thousands of rows of random strings, such as:

Column1                      Column2
"this is done in 1 hour"     "in 1 hour" 

I would like to get a new data.frame column like this:

Column3
"this is done" 

So basically, I want to find the Column2 string inside Column1 and keep the rest of Column1. How should I approach this?

EDIT:

Taking a fixed number of characters from the end would not solve this, because the string lengths vary, so I can't do:

# return the last n characters of x
substrRight <- function(x, n){
  substr(x, nchar(x)-n+1, nchar(x))
}

substrRight(x, 3)

So I would need something like grepl matching.


Solution

You can do it with a regular expression:

data <- data.frame(Column1 = "this is done in 1 hour", Column2 = "in 1 hour")
data$Column3 <- gsub(data$Column2, '', data$Column1) # Replace matches of the first argument (pattern) with the second (replacement) in the third (x).
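The call above treats Column2 as a regular expression and leaves behind the space that preceded the match, so Column3 becomes "this is done " rather than "this is done". Here is a minimal sketch that matches Column2 literally and trims the leftover whitespace. It assumes that leading and trailing whitespace in the result is never meaningful.

```r
data <- data.frame(Column1 = "this is done in 1 hour", Column2 = "in 1 hour",
                   stringsAsFactors = FALSE)

# fixed = TRUE matches Column2 literally instead of as a regular expression;
# trimws() removes the whitespace left where the phrase used to be
data$Column3 <- trimws(gsub(data$Column2, "", data$Column1, fixed = TRUE))
print(data$Column3)
```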

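EDIT: For more than one row, use mapply. gsub uses only the first element of its pattern argument (with a warning), so it cannot pair each row's Column2 with that row's Column1 on its own: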

data <- data.frame(Column1 = c("this is done in 1 hour", "this is a test"), Column2 = c("in 1 hour", "a test"))
data$Column3 <- mapply(gsub, data$Column2, '', data$Column1) # calls gsub(Column2[i], '', Column1[i]) for each row i
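If Column2 always appears at the end of Column1, as in the question's example, the substrRight idea from the question does work after all, provided the length is computed per row with nchar() instead of being fixed. The sketch below works under that assumption. stripSuffix is a hypothetical helper name. Rows where Column2 is not a suffix are returned unchanged.

```r
# Hypothetical helper: drop `suffix` from the end of `x`, row by row
stripSuffix <- function(x, suffix) {
  x <- as.character(x)            # also accepts factor columns
  suffix <- as.character(suffix)
  # endsWith() and substr() are vectorised, so each row uses its own lengths
  out <- ifelse(endsWith(x, suffix),
                substr(x, 1, nchar(x) - nchar(suffix)),
                x)
  # trim trailing spaces, e.g. the one left in front of the removed suffix
  trimws(out, which = "right")
}

data <- data.frame(Column1 = c("this is done in 1 hour", "this is a test"),
                   Column2 = c("in 1 hour", "a test"),
                   stringsAsFactors = FALSE)
data$Column3 <- stripSuffix(data$Column1, data$Column2)
print(data$Column3)
```

Unlike gsub(), this never removes a match from the middle of the string, and because endsWith() compares literally, no regex escaping is needed.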

Other tips

Here is an example of how you could do it:

# example data frame
testdata <- data.frame(colA=c("this is","a test"),colB=c("is","a"),stringsAsFactors=FALSE)

# adding the new column
newcol <- sapply(seq_len(nrow(testdata)),function(x) gsub(testdata[x,"colB"],"",testdata[x,"colA"],fixed=TRUE))
new.testdata <- transform(testdata,colC=newcol)

# result
new.testdata
#      colA | colB  | colC
# --------------------------
# 1 this is |   is  | th 
# 2  a test |    a  |   test
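The fixed=TRUE in the call above matters as soon as colB contains a regular-expression metacharacter. Here is a small illustration; the string is made up for the example.

```r
x <- "version 1.2"
# As a regular expression, "." matches any character, so every character is removed
regexResult <- gsub(".", "", x)
# With fixed = TRUE, only the literal dot is removed
fixedResult <- gsub(".", "", x, fixed = TRUE)
print(regexResult)  # ""
print(fixedResult)  # "version 12"
```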


EDIT: gsub(str1,'',str2,fixed=TRUE) deletes all occurrences of str1 within str2, whereas sub would delete only the first occurrence. Without fixed=TRUE, str1 is interpreted as a regular expression, so the result is wrong (or an error is raised) if str1 contains characters such as .\+?*{}[]. Setting fixed=TRUE makes str1 match literally.

To address the comment, the following replaces only the last occurrence of str1 in str2, which gives the desired output. It reverses colA and colB, uses sub to remove the first occurrence of the reversed colB from the reversed colA, and reverses the result back:

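# reverse each string: extract its characters from last to first, then paste them back together
revColA <- lapply(testdata[["colA"]],function(x) paste0(substring(x,nchar(x):1,nchar(x):1)))
revColA <- lapply(revColA,paste,collapse='')
revColB <- lapply(testdata[["colB"]],function(x) paste0(substring(x,nchar(x):1,nchar(x):1)))
revColB <- lapply(revColB,paste,collapse='')

# removing the first occurrence in the reversed strings removes the last occurrence in the originals
revNewCol <- sapply(seq_len(nrow(testdata)),function(x) sub(revColB[[x]],"",revColA[[x]],fixed=TRUE))
# reverse the results back to the original direction
newcol <- lapply(revNewCol,function(x) paste0(substring(x,nchar(x):1,nchar(x):1)))
newcol <- sapply(newcol,paste,collapse='')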

new.testdata <- transform(testdata,colC=newcol)

### output ###
#      colA | colB  | colC
# --------------------------
# 1 this is |   is  | this 
# 2  a test |    a  |   test
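Here is an alternative sketch that removes only the last occurrence without reversing any strings. It locates the literal matches with gregexpr(fixed = TRUE) and cuts out the last one with substr(). removeLast is a hypothetical helper name. gregexpr reports non-overlapping matches scanned from the left, which only makes a difference for self-overlapping patterns.

```r
# Hypothetical helper: remove only the last literal occurrence of `pattern` in `x`
removeLast <- function(x, pattern) {
  pos <- gregexpr(pattern, x, fixed = TRUE)[[1]]
  if (pos[1] == -1) return(x)       # no match: return x unchanged
  start <- pos[length(pos)]         # start position of the last match
  paste0(substr(x, 1, start - 1),
         substr(x, start + nchar(pattern), nchar(x)))
}

testdata <- data.frame(colA = c("this is", "a test"), colB = c("is", "a"),
                       stringsAsFactors = FALSE)
# apply row by row; USE.NAMES = FALSE keeps the result an unnamed character vector
newcol <- mapply(removeLast, testdata$colA, testdata$colB, USE.NAMES = FALSE)
testdata$colC <- newcol
print(testdata)
```

For the example data this gives the same colC values as the reversal approach ("this " and " test").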
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow