Question

I am trying to count the transactions that start with AK, contain AK again somewhere inside the transaction, and do not end in AK.

Examples:

EXCLUDE: AK->se (no AK in between)

EXCLUDE: AK->gg->se->ll (no second AK within the transaction)

INCLUDE: AK->se->AK->gg

Sample data:

f<- data.frame(
id=c("A","A","A","A","C","C","D","D","E"),
Mode=c("AK->se","se->AK->gg, bishan->K","AK->se","se->gr->gg, bishan->AK","AK->se","se->gr->gg, bishan->AK","AK->se","se->gr->gg, bishan->AK","se->AK->df, hg->pp->sk")
)

I need to process a large amount of data, so performance is crucial.

Thanks in advance.

Edited

f<- data.frame(
id=c("A","A","A","A","C","C","D","D","E"),
Mode=c("AK->se","se->AK->gg, bishan->K","AK->se","se->gr->gg, bishan->AK","AK->se","AK->AK->gg, bishan->AK","AK->se->Ak->gg","se->gr->gg, bishan->AK","AK->AK->df, hg->pp->sk")
)

Solution

Using a regular expression:

f<- data.frame(
  id=c("A","A","A","A","C","C","D","D","E"),
  Mode=c("AK->se","se->AK->gg, bishan->K","AK->se","se->gr->gg, bishan->AK","AK->se","se->gr->gg, bishan->AK","AK->se->AK->gg","se->gr->gg, bishan->AK","se->AK->df, hg->pp->sk")
)

# TRUE where Mode starts with "AK->" and contains a later "AK->"
selection <- grepl(pattern = "^AK->.*AK->", x = f$Mode, perl = TRUE)
f$Mode[selection]
f$id[selection]
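
Note that this pattern only requires a later AK followed by "->"; it does not by itself reject a string whose very last token is AK (for example "AK->AK->gg, bishan->AK" from the edited data). A minimal sketch of a stricter variant, assuming a negative lookahead is acceptable and that tokens are always separated by "->" (the vector `x` and the names `loose` and `strict` are made up for illustration):

```r
# Hypothetical test strings, chosen to show where the two patterns differ
x <- c("AK->se->AK->gg", "AK->AK->gg, bishan->AK", "AK->se", "se->AK->gg")

# Pattern from the answer: starts with AK and has a later "AK->"
loose <- grepl("^AK->.*AK->", x, perl = TRUE)

# Same pattern, plus a negative lookahead that rejects strings ending in "->AK"
strict <- grepl("^(?!.*->AK$)AK->.*AK->", x, perl = TRUE)

loose   # TRUE  TRUE FALSE FALSE
strict  # TRUE FALSE FALSE FALSE
```

The lookahead needs perl = TRUE, since R's default regex engine does not support it.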

Using strsplit and sapply (may be slower when there are many strings):

f<- data.frame(
  id=c("A","A","A","A","C","C","D","D","E"),
  Mode=c("AK->se","se->AK->gg, bishan->K","AK->se","se->gr->gg, bishan->AK","AK->se","se->gr->gg, bishan->AK","AK->se->AK->gg","se->gr->gg, bishan->AK","se->AK->df, hg->pp->sk")
)

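# Split each transaction on "->", then keep rows whose first token is AK,
# whose last token is not AK, and which contain AK more than once.
# as.character() guards against Mode being a factor (the default before R 4.0).
parts <- strsplit(as.character(f$Mode), split = "->", fixed = TRUE)
selection <- sapply(parts, function(x) x[1] == "AK" && x[length(x)] != "AK" && sum(x == "AK") > 1)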
f$Mode[selection]
f$id[selection]
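
The question asks for a count, whereas the snippets above return the matching rows. A minimal sketch of turning the logical vector into counts, using the same sample data and the regular-expression selection (the names `total` and `per_id` are made up for illustration):

```r
f <- data.frame(
  id = c("A","A","A","A","C","C","D","D","E"),
  Mode = c("AK->se","se->AK->gg, bishan->K","AK->se","se->gr->gg, bishan->AK",
           "AK->se","se->gr->gg, bishan->AK","AK->se->AK->gg",
           "se->gr->gg, bishan->AK","se->AK->df, hg->pp->sk"),
  stringsAsFactors = FALSE
)

selection <- grepl("^AK->.*AK->", f$Mode, perl = TRUE)

# TRUE counts as 1, so summing the logical vector gives the number of matches
total <- sum(selection)

# Number of matching transactions for each id
per_id <- tapply(selection, f$id, sum)

total   # 1
per_id  # A = 0, C = 0, D = 1, E = 0
```

Both steps are vectorised, so they should add little cost on top of the matching itself.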
Licensed under: CC-BY-SA with attribution