I am doing data preprocessing and am stuck on a problem. I have data like "Telma 2525 mg tablet", and I want to convert it to "Telma 25 mg tablet". Can this be done?

Thanks

Solution

gsub()

> x<-rep("Telma 2525 mg tablet",10)
> x
[1] "Telma 2525 mg tablet" "Telma 2525 mg tablet" "Telma 2525 mg tablet" "Telma 2525 mg tablet" "Telma 2525 mg tablet"
[6] "Telma 2525 mg tablet" "Telma 2525 mg tablet" "Telma 2525 mg tablet" "Telma 2525 mg tablet" "Telma 2525 mg tablet"

> gsub("Telma 2525 mg tablet","Telma 25 mg tablet",x)

[1] "Telma 25 mg tablet" "Telma 25 mg tablet" "Telma 25 mg tablet" "Telma 25 mg tablet" "Telma 25 mg tablet"
[6] "Telma 25 mg tablet" "Telma 25 mg tablet" "Telma 25 mg tablet" "Telma 25 mg tablet" "Telma 25 mg tablet"

where x is your data source

EDIT - UPDATED TO MAKE IT GENERIC

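d <- data.frame(t = c("blah blah 2525 mg", "blah blah 7272 mg"), stringsAsFactors = FALSE)

remdup <- function(s) {
  f <- regexec("[0-9]{4}", s)[[1]][1]  # start position of the first run of 4 digits (-1 if none)
  if (f < 0) return(s)                 # no 4-digit run: leave the string unchanged
  # drop the first 2 digits of that run by position, so an earlier
  # occurrence of the same 2 digits elsewhere in the string is not touched
  paste0(substr(s, 1, f - 1), substr(s, f + 2, nchar(s)))
}

lapply(d$t, remdup)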

#[[1]]
#[1] "blah blah 25 mg"
#  
#[[2]]
#[1] "blah blah 72 mg"
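
A possible alternative to the position-based function above, offered only as a sketch: if the bad number is always an exact repetition of a digit group (2525, 7272), a regex backreference can collapse it in one vectorized call. The helper name `collapse_repeat` is hypothetical, and the pattern assumes the doubled number stands as its own word and that the repeated group has at least two digits (so "22" is left alone). Note that a genuine value such as "1010" would also be collapsed to "10", so check the result against your data.

```r
# Hypothetical helper: collapse a number that is an exact doubled digit group
# (e.g. "2525" -> "25"). Numbers that are not exact repeats are left alone.
collapse_repeat <- function(s) {
  # \\b(\\d{2,})\\1\\b : a whole word made of a 2+ digit group immediately repeated
  sub("\\b(\\d{2,})\\1\\b", "\\1", s, perl = TRUE)
}

x <- c("Telma 2525 mg tablet", "blah blah 7272 mg", "Dolo 650 mg")
collapse_repeat(x)
# [1] "Telma 25 mg tablet" "blah blah 72 mg"    "Dolo 650 mg"
```

Because `sub()` is vectorized, this can be applied to a whole column directly, e.g. `d$t <- collapse_repeat(d$t)`.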

Other tips

Solution 1: Standardize the data by replacing each custom string with its valid value. For example, custom string: "Telma 2525 mg"; valid value: "Telma 25 mg".

Solution 2: Use a reference table that maps each raw value to its standardized value.
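
The reference-table idea could look roughly like the sketch below. The table `ref`, its column names `raw` and `valid`, and the sample values are all made up for illustration; in practice the table would come from your own list of known corrections.

```r
# Hypothetical reference table: raw value -> standardized value.
ref <- data.frame(
  raw   = c("Telma 2525 mg tablet", "Telma 4040 mg tablet"),
  valid = c("Telma 25 mg tablet",   "Telma 40 mg tablet"),
  stringsAsFactors = FALSE
)

x <- c("Telma 2525 mg tablet", "Telma 4040 mg tablet", "Dolo 650 mg tablet")

idx <- match(x, ref$raw)                          # row in ref for each value, NA if absent
cleaned <- ifelse(is.na(idx), x, ref$valid[idx])  # keep values with no entry unchanged
cleaned
# [1] "Telma 25 mg tablet" "Telma 40 mg tablet" "Dolo 650 mg tablet"
```

Unlike a regex rule, this only changes values that are listed explicitly, which is safer when some four-digit numbers in the data are legitimate.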

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow