Question

I am trying to write a function that returns the stem map of the words in a text after Porter stemming. When I ran an example, the code never finished, i.e. no output was produced. There was no error, but when I force-stopped it, it gave warnings like:

1: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
2: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
3: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
4: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
5: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
6: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
7: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
8: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
9: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length

My code is as follows:

stemMAP<-function(text){
  flatText<-unlist(strsplit(text," "))
  textLength<-length(flatText)

  stemList<-list(NULL)
  for(i in 1:textLength){
    wordStem<-SnowballStemmer(flatText[i])
    flagStem=0
    flagWord=0

    for(j in 1:length(stemList)){
      if(regexpr(wordStem,stemList[j][1])==TRUE){

        for(k in 1:length(stemList[j])){
          if(regexpr(flatText[i],stemList[j][k])==TRUE){ 
            flagWord=1
            #break;
            }
         }

        if(flagWord==0){
          stemList[j][length(stemList[j])+1]<-flatText[i]
          #break;
        }

        flagStem=1

      }

      if(flagStem==0){
        stemList[length(stemList)+1][1]<-wordStem
        stemList[length(stemList)+1][2]<-flatText[i]
      }

    }

  }

  return(stemList)
}

How can I identify the mistakes? My test statement was:

stem<-stemMAP("I like being active and playing because when you play it activates your body and this activation leads to a good health")

Solution
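
On the question of identifying the mistakes: a likely cause of both the warnings and the apparent hang is the single-bracket indexing. `stemList[j]` returns a one-element sub-list rather than the element itself, and the two `stemList[length(stemList)+1]` assignments each append an element on every pass of the inner loop where no stem has matched yet. The list can therefore roughly triple for each word. The sketch below reproduces only that append step in isolation; `append_like_question` is a hypothetical helper written for this illustration, and the matching branch of the original is left out.

```r
# `[` returns a sub-list, `[[` returns the element itself.
stems <- list("activ")
class(stems[1])    # "list"
class(stems[[1]])  # "character"

# Hypothetical helper mirroring the append step of the question's inner loop
# (the stem-matching branch is omitted to keep the sketch short).
append_like_question <- function(lst, stem, word) {
  for (j in 1:length(lst)) {          # loop bounds are fixed when the loop starts
    lst[length(lst) + 1][1] <- stem   # appends one element
    lst[length(lst) + 1][2] <- word   # appends another and triggers the warning
  }
  lst
}

lst <- list(NULL)
sizes <- integer(0)
for (w in c("I", "like", "being", "active")) {
  lst <- suppressWarnings(append_like_question(lst, w, w))
  sizes <- c(sizes, length(lst))
}
sizes  # 3 9 27 81
```

With about twenty words in the test sentence, growth of this kind would not finish in any reasonable time, which is consistent with the function never returning.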

Here I rewrite your code using the vectorized version of SnowballStemmer, so there is no need for a for loop.

library(plyr)
## SnowballStemmer() is assumed to be loaded already, as in the question
stemMAP <- function(text){
  flatText <- unlist(strsplit(text, " "))
  ## SnowballStemmer is vectorized: stem all the words in one call
  wordStem <- as.character(SnowballStemmer(flatText))
  hh <- data.frame(ff = flatText, sn = wordStem)
  ## dlply (data.frame to list apply) splits hh by the stem column sn
  ## and applies as.character(x$ff) to each group, where x is the
  ## subset of hh for one stem
  stemList <- dlply(hh, .(sn), function(x) as.character(x$ff))
  stemList
}

stemList
$I
[1] "I"

$a
[1] "a"

$activ
[1] "active"     "activates"  "activation"

$and
[1] "and" "and"

$be
[1] "being"
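
The same grouping can also be sketched in base R with `split()`, without plyr. In the sketch below, `toy_stem` is a hypothetical stand-in that only strips a few suffixes so the example runs without a stemming package; it is not the Porter algorithm, and any vectorized stemmer could be passed in its place. `stem_map` is likewise a name chosen for this illustration. Unlike the output above, it applies `unique()` first so that repeated words such as "and" appear once.

```r
# Hypothetical stand-in for a real stemmer: strips a few common suffixes.
# It is NOT the Porter algorithm; pass a real vectorized stemmer instead.
toy_stem <- function(words) {
  sub("(ation|ates|ing|e|s)$", "", words)
}

# Map each stem to the distinct words that reduce to it.
stem_map <- function(text, stemmer = toy_stem) {
  words <- unlist(strsplit(text, " +"))
  words <- unique(words[nzchar(words)])  # drop empty strings and duplicates
  split(words, stemmer(words))           # group the words by their stem
}

m <- stem_map("active activates activation and play playing and")
m$activ  # "active" "activates" "activation"
m$play   # "play" "playing"
```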
Licensed under: CC-BY-SA with attribution