R text mining: How to perform typical text operations with the tm package on vectors

https://stackoverflow.com/questions/21963888

15-10-2022

Question

How can I perform the following standard operations on a character vector?

(I need a dictionary for a DTM (classification). To match the text entries, on which these operations have already been performed, I have to change my dictionary terms accordingly.)

library(tm)
dicBin <- c("rosa", "rosig", "grün ", "Blau", "gelb", "lila",  "orange", "pink", "%", "mm", "mp", "*", "monat")

# Initially I hoped that tm_map would work on a vector. Since it doesn't, I tried to transform the vector into a Corpus.
dicBin.corp <- tm_map(dicBin.corp, stemDocument, language = "german")
dicBin.corp <- tm_map(dicBin.corp, stripWhitespace)
dicBin.corp <- tm_map(dicBin.corp, tolower)

At this point, dicBin.corp contains only "%". (Edited)

## transform back to a vector
dicBin <- dicBin.corp # How do I do this properly as well?

Solution

Try applying the functions directly to your character vector:

stemDocument(dicBin, language="german")
stripWhitespace(dicBin)
tolower(dicBin)
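
Each of these calls returns a new vector and leaves dicBin unchanged, so the results have to be assigned and chained if all three operations should apply. A minimal sketch follows. It assumes the SnowballC package is installed (stemDocument relies on it). The name dicBin.clean, the order of the steps (lower-casing before stemming), and the extra trimws() call are illustrative choices, not part of the original answer. trimws() is added because stripWhitespace only collapses runs of whitespace into a single blank and is not documented to remove a trailing one, as in "grün ".

```r
library(tm)  # stemDocument() additionally requires the SnowballC package

dicBin <- c("rosa", "rosig", "grün ", "Blau", "gelb", "lila", "orange",
            "pink", "%", "mm", "mp", "*", "monat")

# Assign each result; none of these functions modifies its input in place.
dicBin.clean <- tolower(dicBin)                  # lower-case first (assumed order)
dicBin.clean <- stripWhitespace(dicBin.clean)    # collapse repeated whitespace
dicBin.clean <- trimws(dicBin.clean)             # base R: drop leading/trailing blanks
dicBin.clean <- stemDocument(dicBin.clean, language = "german")

print(dicBin.clean)
```

The cleaned vector can then be used as the dictionary, provided the same steps were applied to the documents it is matched against.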

To transform a corpus back to a character vector, try:

as.character(dicBin.corp)
# [1] "rosa"  "rosig" "grun"  "blau"  "gelb"  "lila"  "orang"
# [8] "pink"  "%"     "mm"    "mp"    "*"     "monat"
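
If the corpus route is preferred, the round trip from vector to corpus and back might look like the sketch below. It rests on a few assumptions: a tm version of 0.6 or later, where base functions such as tolower need to be wrapped in content_transformer() so that the documents keep their class, and the SnowballC package for stemming. The name dicBin.vec is hypothetical, and extracting the text with sapply(..., as.character) is one common idiom rather than the only option. The behaviour of as.character() on a whole corpus can differ between tm versions.

```r
library(tm)  # stemDocument() additionally requires the SnowballC package

dicBin <- c("rosa", "rosig", "grün ", "Blau", "gelb", "lila", "orange",
            "pink", "%", "mm", "mp", "*", "monat")

# Build a corpus with one document per dictionary term.
dicBin.corp <- VCorpus(VectorSource(dicBin))

# Wrap base R functions so tm_map keeps the documents as text documents.
dicBin.corp <- tm_map(dicBin.corp, content_transformer(tolower))
dicBin.corp <- tm_map(dicBin.corp, stripWhitespace)
dicBin.corp <- tm_map(dicBin.corp, stemDocument, language = "german")

# Pull the text of every document back into a plain character vector.
dicBin.vec <- unname(sapply(dicBin.corp, as.character))

print(dicBin.vec)
```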

Licensed under: CC-BY-SA with attribution