Question

I have a dataset with 2 columns, and I would like to clean it up using gsub, for example:

Data_edited_txt2 <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", Data_edited_txt2$text)
Data_edited_txt2 <- gsub("@\\w+", " ", Data_edited_txt2$text)
Data_edited_txt2 <- gsub("[[:punct:]]", "", Data_edited_txt2$text) 

I get the error "$ operator is invalid for atomic vectors" at the second gsub call, and I noticed that the 2nd column disappears after running the first gsub.

How can I run all of the gsub calls while keeping the 2nd column?

structure(list(text = structure(c(1L, 3L, 7L, 4L, 2L, 5L, 6L), .Label = c("@airasia im searching job", 
"@AirAsia no flight warning for cebu outbound?", "@shazzr1 @AirAsia never mind.. now everyone can fly.", 
"@TigerAir confirmed as having far nastier policies and uncaring customer service than @airasia who I will now fly every time in preference.", 
"@Wingmates Since your taxes is HIGHER than other airlines but your service is really BAD because always change and cancel the flight.", 
"hai MASwings @Wingmates . Bilakah tempoh promosi anda? Saya ingin terbang ke Palawan dengan bajet yang agak rendah :3", 
"One thing I \"like\" about @AirAsia is, DELAY."), class = "factor"), 
created = structure(c(3L, 2L, 1L, 7L, 6L, 4L, 5L), .Label = c("2/2/2014 11:30", 
"2/2/2014 11:32", "2/2/2014 12:18", "24/2/2014 4:03", "29/3/2014 8:21", 
"30/1/2014 16:02", "31/1/2014 8:13"), class = "factor")), .Names = c("text", 
"created"), class = "data.frame", row.names = c(NA, -7L))

Solution

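gsub returns a plain character vector, so assigning its result to Data_edited_txt2 overwrites the whole data frame instead of only the text column. That is why the 2nd column disappears after the first call, and why the second call fails: $ cannot be used on an atomic vector. Assign the result back to the column instead: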

Data_edited_txt2$text <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", Data_edited_txt2$text)
Data_edited_txt2$text <- gsub("@\\w+", " ", Data_edited_txt2$text)
Data_edited_txt2$text <- gsub("[[:punct:]]", "", Data_edited_txt2$text) 
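
To check that both columns survive, here is a minimal sketch on a two-row stand-in for the data in the question. The name df and the shortened data are assumptions made for illustration only; the three gsub patterns are the ones from the question.

```r
# Hypothetical two-row stand-in for the data frame in the question.
df <- data.frame(
  text = c("@airasia im searching job",
           "One thing I \"like\" about @AirAsia is, DELAY."),
  created = c("2/2/2014 12:18", "29/3/2014 8:21"),
  stringsAsFactors = TRUE
)

# gsub() coerces the factor column to character and returns a character
# vector. Assigning that vector to df$text replaces only that column.
df$text <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", df$text)
df$text <- gsub("@\\w+", " ", df$text)
df$text <- gsub("[[:punct:]]", "", df$text)

# Inspect the result: still a data frame with text and created.
str(df)
```

Note that after the first assignment the text column is a character vector rather than a factor, which is usually what you want for further text cleaning.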
Licensed under: CC-BY-SA with attribution