Ben's answer gives what you asked for, but I am not sure that what you asked for is wise: it does not account for ties. Here is one approach that does, and a second one using the qdap package. Both give you lists with the words (in qdap's case, a list of data frames with words and frequencies). You can use unlist to get the rest of the way with the first option, and lapply, indexing and unlist with qdap. The qdap approach works on the raw Corpus:
Option #1:
apply(dtm, 1, function(x) unlist(dtm[["dimnames"]][2],
use.names = FALSE)[x == max(x)])
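To see why ties matter, here is a minimal base R sketch of the same row-wise idea on a toy count matrix. The matrix m and its document and term names are made up for illustration; they are not taken from the crude corpus, and a plain matrix stands in for the DocumentTermMatrix.

```r
# Toy document-term counts (hypothetical data, not the crude corpus)
m <- matrix(c(2, 2, 0,
              1, 0, 3),
            nrow = 2, byrow = TRUE,
            dimnames = list(Docs  = c("doc1", "doc2"),
                            Terms = c("oil", "price", "opec")))

# For each row, keep every term whose count equals the row maximum,
# so tied terms are all returned instead of only the first one
top_terms <- apply(m, 1, function(x) colnames(m)[x == max(x)])
top_terms
```

Because doc1 has a tie, the rows return vectors of different lengths and apply gives back a list. If every row happened to return the same number of terms, apply would simplify to a vector or matrix instead, so it may be worth checking the result type before calling unlist on it.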
Option #2 with qdap:
library(qdap)
dat <- tm_corpus2df(crude)
tapply(stemmer(dat$text), dat$docs, freq_terms, top = 1,
stopwords = tm::stopwords("English"))
Wrapping the tapply with lapply(WRAP_HERE, "[", 1) makes the two answers identical in content and nearly identical in format.
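As a rough illustration of what that wrapping does, the sketch below uses a hand-built list of data frames as a stand-in for the tapply output. The list res and its contents are hypothetical, and the column names WORD and FREQ are an assumption about what freq_terms returns.

```r
# Hypothetical stand-in for the per-document output of the tapply call
res <- list(
  doc1 = data.frame(WORD = c("oil", "price"), FREQ = c(5, 5),
                    stringsAsFactors = FALSE),
  doc2 = data.frame(WORD = "opec", FREQ = 9,
                    stringsAsFactors = FALSE)
)

# "[" with index 1 keeps the first column, still as a data frame
words_df <- lapply(res, "[", 1)

# "[[" with index 1 extracts the first column as a plain character vector
words_chr <- lapply(res, "[[", 1)
```

Using "[[" instead of "[" drops the data frame wrapper, which gets closer to the format of the first option.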
EDIT: Added an example that is a leaner use of qdap:
FUN <- function(x) freq_terms(x, top = 1, stopwords = stopwords("English"))[, 1]
lapply(stemmer(crude), FUN)
## [[1]]
## [1] "oil" "price"
##
## [[2]]
## [1] "opec"
##
## [[3]]
## [1] "canada" "canadian" "crude" "oil" "post" "price" "texaco"
##
## [[4]]
## [1] "crude"
##
## [[5]]
## [1] "estim" "reserv" "said" "trust"
##
## [[6]]
## [1] "kuwait" "said"
##
## [[7]]
## [1] "report" "say"
##
## [[8]]
## [1] "yesterday"
##
## [[9]]
## [1] "billion"
##
## [[10]]
## [1] "market" "price"
##
## [[11]]
## [1] "mln"
##
## [[12]]
## [1] "oil"
##
## [[13]]
## [1] "oil" "price"
##
## [[14]]
## [1] "oil" "opec"
##
## [[15]]
## [1] "power"
##
## [[16]]
## [1] "oil"
##
## [[17]]
## [1] "oil"
##
## [[18]]
## [1] "dlrs"
##
## [[19]]
## [1] "futur"
##
## [[20]]
## [1] "januari"