Question

I have a data frame that tracks error codes generated by a script. Every time the script is executed, it adds records to a massive CSV file. The event_id field is unique to each run of the script. Each run may add multiple combinations of CRITICAL, ERROR, WARNING, DIAGNOSTIC, or INFORMATION messages with accompanying values and additional information (not represented here for simplicity).

I need to summarize the number of each class of error in the CSV file, but multiple errors from the same event id should only count as one error. Here's an example of how the data is structured:

event_id | class       | value
1        | ERROR       | 5409
1        | ERROR       | 5410
2        | WARNING     | 212
3        | ERROR       | 5409
3        | WARNING     | 400
3        | DIAGNOSTIC  | 64

And this is what I'm looking to get as output. Even though there were three ERROR class events, two of them were associated with the same event, so it only counts as one.

class      | count
ERROR      | 2
WARNING    | 2
DIAGNOSTIC | 1

I did try searching for this, but don't even know what keywords to search for. So even if you aren't able to answer the question, I'd appreciate any help with search queries.


Solution

You could split event_id by class, count the unique event ids in each group, then create a data frame.

> s <- sapply(split(dat$event_id, dat$class), function(x) length(unique(x)))
> data.frame(count = s)
##            count
## DIAGNOSTIC     1
## ERROR          2
## WARNING        2
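
The snippet above assumes the data already sits in a data frame called dat. Below is a minimal reproducible sketch, with dat rebuilt by hand from the example in the question. The by-hand construction and the names pairs and counts are assumptions for illustration. It also shows a base R variant of the same idea: drop duplicate (event_id, class) pairs first, then tabulate.

```r
# Rebuild the question's sample data by hand (assumed layout).
dat <- data.frame(
  event_id = c(1, 1, 2, 3, 3, 3),
  class    = c("ERROR", "ERROR", "WARNING", "ERROR", "WARNING", "DIAGNOSTIC"),
  value    = c(5409, 5410, 212, 5409, 400, 64),
  stringsAsFactors = FALSE
)

# Approach from the answer: number of unique event ids per class.
s <- sapply(split(dat$event_id, dat$class), function(x) length(unique(x)))

# Variant: keep one row per (event_id, class) pair, then count rows per class.
pairs  <- unique(dat[, c("event_id", "class")])
counts <- table(pairs$class)

print(data.frame(count = s))
print(counts)
```

Both routes ignore the value column, which is what makes the two ERROR rows of event 1 collapse into a single count.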

Other tips

df = read.table(header = TRUE, sep = "|", text = "
event_id | class       | value
1        | ERROR       | 5409
1        | ERROR       | 5410
2        | WARNING     | 212
3        | ERROR       | 5409
3        | WARNING     | 400
3        | DIAGNOSTIC  | 64")

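library(data.table)

df = as.data.table(df)
setkey(df, event_id, class)

# Count one row per (event_id, class) pair. `by` is explicit because
# unique() compares all columns by default since data.table 1.9.8.
unique(df, by = c("event_id", "class"))[, .N, by = class]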
#           class N
#1:  ERROR        2
#2:  WARNING      2
#3:  DIAGNOSTIC   1

You could build a 2-d table using the class and event_id variables, use pmin to limit values to 1 in that table, and then use rowSums to get it back to a 1-d table:

rowSums(pmin(table(dat$class, dat$event_id), 1))
# DIAGNOSTIC      ERROR    WARNING 
#          1          2          2 
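
To make the intermediate steps of the table approach visible, here is a small sketch that rebuilds the sample data by hand and keeps each stage in its own variable. The variable names tab, capped, per_class and per_class_alt are assumptions for illustration. It also shows that counting the non-zero cells in each row gives the same result as capping with pmin.

```r
# Rebuild the question's sample data by hand (assumed layout).
dat <- data.frame(
  event_id = c(1, 1, 2, 3, 3, 3),
  class    = c("ERROR", "ERROR", "WARNING", "ERROR", "WARNING", "DIAGNOSTIC"),
  value    = c(5409, 5410, 212, 5409, 400, 64),
  stringsAsFactors = FALSE
)

# Rows are classes, columns are event ids; each cell is a row count.
# ERROR occurs twice under event 1, so that cell holds 2.
tab <- table(dat$class, dat$event_id)

# Cap every cell at 1 so a cell only records whether the class
# occurred in that event at all.
capped <- pmin(tab, 1)

# Summing across events gives the number of distinct events per class.
per_class <- rowSums(capped)

# Equivalent: count the non-zero cells in each row.
per_class_alt <- rowSums(tab > 0)

print(per_class)
```

Note that the 2-d table has one column per event id, so on a very large file with many runs this table can get wide; the split or unique based approaches avoid building it.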
License: CC-BY-SA with attribution
Not affiliated with StackOverflow