Question

I've been using R's tm package with much success on classification problems. I know how to find the most frequent terms across the entire corpus (with findFreqTerms()), but I don't see anything in the documentation that would find the most frequent term (after I've stemmed and removed stopwords, but before I remove sparse terms) in each individual document in the corpus. I've tried apply() with max(), but that gives me the number of times the most frequent term occurs in each document, not the name of the term itself.

library(tm)

data("crude")
corpus <- tm_map(crude, removePunctuation)
corpus <- tm_map(corpus, stripWhitespace)
corpus <- tm_map(corpus, content_transformer(tolower))  # plain tolower breaks in newer tm versions
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stemDocument)                  # requires the SnowballC package
dtm <- DocumentTermMatrix(corpus)
maxterms <- apply(dtm, 1, max)
maxterms
127 144 191 194 211 236 237 242 246 248 273 349 352 
 5  13   2   3   3  10   8   3   7   9   9   4   5 
353 368 489 502 543 704 708 
 4   4   4   5   5   9   4 

Thoughts?


The solution

Ben's answer gives what you asked for, but I am not sure that what you asked for is wise: it does not account for ties. Here are two approaches, one with base R on the document-term matrix and a second using the qdap package. They give you lists of the top words (in qdap's case, a list of data frames with words and frequencies). With the first option you can use unlist to get the rest of the way (as shown after the code below); with qdap, use lapply, indexing, and unlist. The qdap approach works on the raw Corpus:

Option #1:

apply(dtm, 1, function(x) unlist(dtm[["dimnames"]][2], 
    use.names = FALSE)[x == max(x)])
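To collapse Option #1's result into a single named character vector, unlist does the rest (a minimal sketch; tied terms simply add repeated entries for that document):

maxterms <- apply(dtm, 1, function(x) unlist(dtm[["dimnames"]][2],
    use.names = FALSE)[x == max(x)])
unlist(maxterms)  # names are document IDs; ties repeat the ID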

Option #2 with qdap:

library(qdap)
dat <- tm_corpus2df(crude)
tapply(stemmer(dat$text), dat$docs, freq_terms, top = 1, 
    stopwords = tm::stopwords("English"))

Wrapping the tapply call in lapply(WRAP_HERE, "[", 1) makes the two answers identical in content and nearly identical in format.
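Spelled out, that wrapping looks like the following sketch, which keeps only the first column (the words) of each data frame that freq_terms returns:

lapply(tapply(stemmer(dat$text), dat$docs, freq_terms, top = 1,
    stopwords = tm::stopwords("english")), "[", 1)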

EDIT: Added a leaner qdap example:

FUN <- function(x) freq_terms(x, top = 1, stopwords = stopwords("english"))[, 1]
lapply(stemmer(crude), FUN)

## [[1]]
## [1] "oil"   "price"
## 
## [[2]]
## [1] "opec"
## 
## [[3]]
## [1] "canada"   "canadian" "crude"    "oil"      "post"     "price"    "texaco"  
## 
## [[4]]
## [1] "crude"
## 
## [[5]]
## [1] "estim"  "reserv" "said"   "trust" 
## 
## [[6]]
## [1] "kuwait" "said"  
## 
## [[7]]
## [1] "report" "say"   
## 
## [[8]]
## [1] "yesterday"
## 
## [[9]]
## [1] "billion"
## 
## [[10]]
## [1] "market" "price" 
## 
## [[11]]
## [1] "mln"
## 
## [[12]]
## [1] "oil"
## 
## [[13]]
## [1] "oil"   "price"
## 
## [[14]]
## [1] "oil"  "opec"
## 
## [[15]]
## [1] "power"
## 
## [[16]]
## [1] "oil"
## 
## [[17]]
## [1] "oil"
## 
## [[18]]
## [1] "dlrs"
## 
## [[19]]
## [1] "futur"
## 
## [[20]]
## [1] "januari"

Other tips

You're almost there: replace max with which.max to get the column index of the term with the highest frequency in each document (i.e., each row). Then use that vector of column indices to subset the Terms (roughly, the column names) of the document-term matrix. That returns the actual term with the maximum frequency in each document, rather than just the frequency value, as max does. Following from your example:

maxterms <- apply(dtm, 1, which.max)
dtm$dimnames$Terms[maxterms]
[1] "oil"     "opec"    "canada"  "crude"   "said"    "said"    "report"  "oil"    
 [9] "billion" "oil"     "mln"     "oil"     "oil"     "oil"     "power"   "oil"    
[17] "oil"     "dlrs"    "futures" "january"