Question
I have a data frame that tracks error codes generated by a script. Every time the script is executed, it adds records to a massive CSV file. The event_id field is unique to each run of the script. Each run may add multiple combinations of CRITICAL, ERROR, WARNING, DIAGNOSTIC, or INFORMATION messages with accompanying values and additional information (not represented here for simplicity).
I need to summarize the number of each class of error in the CSV file, but multiple errors from the same event id should only count as one error. Here's an example of how the data is structured:
event_id | class | value
1 | ERROR | 5409
1 | ERROR | 5410
2 | WARNING | 212
3 | ERROR | 5409
3 | WARNING | 400
3 | DIAGNOSTIC | 64
And this is what I'm looking to get as output. Even though there were three ERROR class events, two of them were associated with the same event, so it only counts as one.
class | count
ERROR | 2
WARNING | 2
DIAGNOSTIC | 1
I did try searching for this, but don't even know what keywords to search for. So even if you aren't able to answer the question, I'd appreciate any help with search queries.
Solution
You could split event_id by class, count the unique ids in each group, then create a data frame.
> s <- sapply(split(dat$event_id, dat$class), function(x) length(unique(x)))
> data.frame(count = s)
## count
## DIAGNOSTIC 1
## ERROR 2
## WARNING 2
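For anyone who wants to run this end to end, here is a minimal sketch. It assumes the question's sample rows are loaded into a data frame called dat (the name used above). The data.frame() call, the tapply variant, and the result name are illustrative additions, not part of the original answer.

```r
# Rebuild the sample data from the question (assumed to be what `dat` holds)
dat <- data.frame(
  event_id = c(1, 1, 2, 3, 3, 3),
  class    = c("ERROR", "ERROR", "WARNING", "ERROR", "WARNING", "DIAGNOSTIC"),
  value    = c(5409, 5410, 212, 5409, 400, 64)
)

# Group the event ids by class, then count the distinct ids in each group
s <- sapply(split(dat$event_id, dat$class), function(x) length(unique(x)))

# Same idea with tapply, which does the grouping and the counting in one call
s2 <- tapply(dat$event_id, dat$class, function(x) length(unique(x)))

# Keep class as a regular column instead of as row names
result <- data.frame(class = names(s), count = as.vector(s))
print(result)
```

Both versions count an event once per class no matter how many rows it contributed, which is the rule asked for in the question.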
Other suggestions
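library(data.table)
# strip.white = TRUE trims the spaces around each "|"-separated field
df = read.table(header = TRUE, sep = "|", strip.white = TRUE, text = "
event_id | class | value
1 | ERROR | 5409
1 | ERROR | 5410
2 | WARNING | 212
3 | ERROR | 5409
3 | WARNING | 400
3 | DIAGNOSTIC | 64")
df = as.data.table(df)
setkey(df, event_id, class)
# Drop repeated (event_id, class) pairs, then count the rows per class.
# by = key(df) is explicit because unique() compares all columns by default
# in data.table 1.9.8 and later.
unique(df, by = key(df))[, .N, by = class]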
# class N
#1: ERROR 2
#2: WARNING 2
#3: DIAGNOSTIC 1
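The same deduplicate-then-count idea can also be sketched in base R, in case data.table is not available. This is an illustrative variant, not part of the answer above. The names dat, pairs, counts, and out are assumptions made for the example.

```r
# Sample data from the question (assumed name: dat)
dat <- data.frame(
  event_id = c(1, 1, 2, 3, 3, 3),
  class    = c("ERROR", "ERROR", "WARNING", "ERROR", "WARNING", "DIAGNOSTIC"),
  value    = c(5409, 5410, 212, 5409, 400, 64)
)

# Keep one row per (event_id, class) pair, ignoring the value column
pairs <- unique(dat[, c("event_id", "class")])

# Each remaining row is one event for its class, so a plain tabulation is enough
counts <- table(class = pairs$class)

# Convert to a data frame with a count column
out <- as.data.frame(counts, responseName = "count")
print(out)
```

Selecting only event_id and class before calling unique() is what makes the two ERROR rows of event 1 collapse into one.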
You could build a 2-d table using the class and event_id variables, use pmin to limit values to 1 in that table, and then use rowSums to get it back to a 1-d table:
rowSums(pmin(table(dat$class, dat$event_id), 1))
# DIAGNOSTIC ERROR WARNING
# 1 2 2
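To see why this works, the one-liner can be broken into its intermediate steps. This is a sketch on the question's sample data. The names dat, tab, capped, counts, and counts_alt are assumptions for illustration, and the rowSums(tab > 0) variant is an alternative not given in the original answer.

```r
# Sample data from the question (assumed name: dat)
dat <- data.frame(
  event_id = c(1, 1, 2, 3, 3, 3),
  class    = c("ERROR", "ERROR", "WARNING", "ERROR", "WARNING", "DIAGNOSTIC"),
  value    = c(5409, 5410, 212, 5409, 400, 64)
)

# Step 1: cross-tabulate; rows are classes, columns are event ids,
# and each cell holds how many times that class appeared in that event
tab <- table(dat$class, dat$event_id)

# Step 2: cap every cell at 1 so repeated messages in one event count once
capped <- pmin(tab, 1)

# Step 3: sum across events to get the number of events per class
counts <- rowSums(capped)

# Alternative for steps 2 and 3: count the non-zero cells in each row
counts_alt <- rowSums(tab > 0)
print(counts)
```

Note that the 2-d table has one column per event_id, so for a very large file one of the grouping approaches above may use less memory.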