Building a term-document matrix

https://stackoverflow.com/questions/20556992

01-09-2022

Question

I have a CSV file whose header is: DATE, Alert, and OriginatingAddress.
How should I build a term-document matrix based on two columns, DATE and Alert?
Each row should be an alert and each column a day, with each entry giving the number of times that alert occurred on that day.

I've tried:

library(tm)
myCorpus <- read.csv("alert-sample-data-4-mining.csv")
corpus <- Corpus(VectorSource(myCorpus$DATE, myCorpus$Alert))
TermDocumentMatrix(corpus)

But the result is not what I want.

This is what I currently get:

++++++++++++++++++++++++++++
A term-document matrix (31 terms, 69124 documents)

Non-/sparse entries: 69124/2073720
Sparsity : 97%
Maximal term length: 9
Weighting : term frequency (tf)
++++++++++++++++++++++++++++++++++++++++++++

str(myCorpus)
'data.frame': 69124 obs. of 3 variables:
$ DATEFORMAT : Factor w/ 31 levels "3/01/2013","3/02/2013",..: 21 21 21 21 21 21 21 21 21 21 ...
$ Alert : Factor w/ 88 levels "%BGP-5-ADJCHANGE",..: 49 49 49 49 49 49 49 49 49 49 ...
$ OriginatingAddress: Factor w/ 98 levels "10.112.36.12",..: 67 67 67 67 67 67 67 67 67 67 ...

Solution

I think you're actually asking for a 2-way frequency table, not a term-document matrix:

myCorpus <- read.csv("alert-sample-data-4-mining.csv")
table(myCorpus$Alert, myCorpus$DATEFORMAT)
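To see what that call produces, here is a minimal sketch on a toy data frame. The rows are made up for illustration (only the column names and the "%BGP-5-ADJCHANGE" level come from the question's str() output; "%LINK-3-UPDOWN" and the counts are assumptions), so treat it as a shape check rather than real results:

```r
# Toy stand-in for the CSV; these rows are invented for illustration only.
alerts <- data.frame(
  DATEFORMAT = c("3/01/2013", "3/01/2013", "3/02/2013", "3/02/2013", "3/02/2013"),
  Alert = c("%BGP-5-ADJCHANGE", "%BGP-5-ADJCHANGE", "%BGP-5-ADJCHANGE",
            "%LINK-3-UPDOWN", "%LINK-3-UPDOWN"),
  stringsAsFactors = TRUE
)

# Rows are alerts, columns are days, cells are occurrence counts.
counts <- table(alerts$Alert, alerts$DATEFORMAT)
print(counts)

# Drop the "table" class to get a plain integer matrix,
# in case downstream code expects one.
count_matrix <- unclass(counts)
```

Here counts["%BGP-5-ADJCHANGE", "3/01/2013"] is 2, and alert/day combinations that never occur show up as 0.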

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow