I have a data frame that tracks error codes generated by a script. Every time the script is executed, it appends records to a massive CSV file. The event_id field is unique to each run of the script. Each run may add any combination of CRITICAL, ERROR, WARNING, DIAGNOSTIC, or INFORMATION messages, with accompanying values and additional information (not shown here for simplicity).

I need to summarize the number of each class of error in the CSV file, but multiple errors from the same event id should only count as one error. Here's an example of how the data is structured:

event_id | class       | value
1        | ERROR       | 5409
1        | ERROR       | 5410
2        | WARNING     | 212
3        | ERROR       | 5409
3        | WARNING     | 400
3        | DIAGNOSTIC  | 64

And this is what I'm looking to get as output. Even though there are three ERROR records, two of them share the same event_id, so together they count only once and ERROR ends up with a count of 2.

class      | count
ERROR      | 2
WARNING    | 2
DIAGNOSTIC | 1

I did try searching for this, but don't even know what keywords to search for. So even if you aren't able to answer the question, I'd appreciate any help with search queries.


Solution

You could split event_id by class, count the unique event ids in each group, and then create a data frame from the result.

> s <- sapply(split(dat$event_id, dat$class), function(x) length(unique(x)))
> data.frame(count = s)
##            count
## DIAGNOSTIC     1
## ERROR          2
## WARNING        2
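
To try this on the example from the question, here is a minimal self-contained sketch. It assumes the data frame is called dat (as in the snippet above) and builds it by hand instead of reading the real CSV; the counts name is only illustrative.

```r
# Example data from the question, built by hand (assumption: the real
# data would be read from the script's CSV file instead).
dat <- data.frame(
  event_id = c(1, 1, 2, 3, 3, 3),
  class    = c("ERROR", "ERROR", "WARNING", "ERROR", "WARNING", "DIAGNOSTIC"),
  value    = c(5409, 5410, 212, 5409, 400, 64)
)

# Group the event ids by class, then count the distinct ids in each group.
s <- sapply(split(dat$event_id, dat$class), function(x) length(unique(x)))

# Turn the named vector into a two-column data frame (class, count).
counts <- data.frame(class = names(s), count = unname(s))
counts
```

Because split() sorts the groups by class name, the rows come out in alphabetical order rather than in order of first appearance.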

Other tips

df = read.table(header = TRUE, sep = "|", text = "
           event_id | class       | value
1        | ERROR       | 5409
1        | ERROR       | 5410
2        | WARNING     | 212
3        | ERROR       | 5409
3        | WARNING     | 400
3        | DIAGNOSTIC  | 64")

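library(data.table)

df = as.data.table(df)
setkey(df, event_id, class)

# Keep one row per (event_id, class) pair, then count rows per class.
# by = is passed explicitly: since data.table 1.9.8, unique() considers
# all columns by default rather than only the key columns.
unique(df, by = c("event_id", "class"))[, .N, by = class]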
#           class N
#1:  ERROR        2
#2:  WARNING      2
#3:  DIAGNOSTIC   1

You could build a 2-d table using the class and event_id variables, use pmin to limit values to 1 in that table, and then use rowSums to get it back to a 1-d table:

rowSums(pmin(table(dat$class, dat$event_id), 1))
# DIAGNOSTIC      ERROR    WARNING 
#          1          2          2 
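
The one-liner above can be unpacked step by step. This is only a sketch on the question's example data; the names dat, tab and capped are assumptions chosen for illustration.

```r
# Example data from the question, built by hand.
dat <- data.frame(
  event_id = c(1, 1, 2, 3, 3, 3),
  class    = c("ERROR", "ERROR", "WARNING", "ERROR", "WARNING", "DIAGNOSTIC"),
  value    = c(5409, 5410, 212, 5409, 400, 64)
)

# Rows are classes, columns are event ids; each cell holds the number of
# records for that (class, event_id) pair, so ERROR / event 1 holds 2.
tab <- table(dat$class, dat$event_id)

# Cap every cell at 1 so repeated records within one event count once.
capped <- pmin(tab, 1)

# Sum across events to get the number of distinct events per class.
res <- rowSums(capped)
res
```

The intermediate tab is also a quick way to see which events produced which classes before the counts are collapsed.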
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow