Question

I'm trying to get a count of the keywords in my corpus using the R "tm" package. This is my code so far:

# load the tm package
library(tm)

# get the data strings
f<-as.vector(forum[[1]])

# replace +
f<-gsub("+", " ", f ,fixed=TRUE)

# lower case
f<-tolower(f)

# show all strings that contain mobile
mobile <- f[grep("mobile", f)]
text.corp.mobile <- Corpus(VectorSource(mobile))
text.corp.mobile <- tm_map(text.corp.mobile , removePunctuation) 
text.corp.mobile <- tm_map(text.corp.mobile , removeWords, c(stopwords("english"),"mobile")) 
dtm.mobile <- DocumentTermMatrix(text.corp.mobile)
dtm.mobile 
dtm.mat.mobile <- as.matrix(dtm.mobile)
dtm.mat.mobile

This returns a table with binary results showing whether a keyword appeared in each of the corpus texts or not. Instead of a binary result, I would like a total count for each keyword. For example: 'car' appeared 5 times, 'button' appeared 9 times.
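To pin down the desired output: a named vector of per-word totals across all the matching strings. The sketch below produces that shape in plain base R, without tm, assuming a small hypothetical sample in place of the `mobile` vector and a tiny hand-written stopword list (tm's `stopwords("english")` is much longer). The helper `count_words` is hypothetical.

```r
# Hypothetical sample strings standing in for the `mobile` vector
mobile <- c("my mobile car button is stuck",
            "the car button on mobile, again!",
            "mobile car button car")

# Hypothetical helper: total occurrences of every word across all strings
count_words <- function(texts, drop = character(0)) {
  texts <- tolower(texts)
  texts <- gsub("[[:punct:]]+", " ", texts)          # strip punctuation
  words <- unlist(strsplit(texts, "[[:space:]]+"))   # split into words
  words <- words[words != "" & !(words %in% drop)]   # drop empties and stopwords
  sort(table(words), decreasing = TRUE)
}

# Tiny hypothetical stopword list, plus "mobile" as in the question
freqs <- count_words(mobile, drop = c("my", "is", "the", "on", "again", "mobile"))
print(freqs)
```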


Solution

Without seeing your actual data it's a bit hard to tell, but since you just called DocumentTermMatrix, I would try something like this:

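dtm.mat.mobile <- as.matrix(dtm.mobile)
# DocumentTermMatrix puts documents in rows and terms in columns,
# so sum down each column to get one total per term
word.freqs <- sort(colSums(dtm.mat.mobile), decreasing=TRUE)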
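Because DocumentTermMatrix stores documents as rows and terms as columns, per-term totals come from `colSums`; `rowSums` gives one total per document instead (it would be the right call on a TermDocumentMatrix, where terms are the rows). The sketch below shows the difference on a small hypothetical matrix shaped like `as.matrix()` output from a DocumentTermMatrix; the counts are made up.

```r
# Toy document-term matrix: rows are documents, columns are terms
# (same orientation as DocumentTermMatrix; hypothetical counts)
dtm.mat <- matrix(c(1, 2, 0,
                    3, 0, 1),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Docs = c("1", "2"),
                                  Terms = c("button", "car", "screen")))

# One total per document (per row): not the per-keyword count
doc.totals <- rowSums(dtm.mat)

# One total per term (per column) across the whole corpus
word.freqs <- sort(colSums(dtm.mat), decreasing = TRUE)
print(word.freqs)
```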
License: CC-BY-SA with attribution
Not affiliated with StackOverflow