Question

How do I count the number of distinct transaction dates for each id?

Sample data:

f <- data.frame(
  id = c("A","A","A","A","C","C","D","D","E"),
  start_date = c("6/3/2012","7/3/2012","7/3/2012","8/3/2012","5/3/2012","6/3/2012","6/3/2012","6/3/2012","5/3/2012")
)

Expected output:

id | count
 A |  3     
 C |  2
 D |  1
 E |  1

Logic:

A has transactions on 6 March, 7 March and 8 March, so its count is 3.

C has transactions on 5 March and 6 March, so its count is 2.

And so on.

I tried the following code, but I think it only counts the number of times each id occurs in the data.

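library(lubridate)
# Dates are day/month/year (e.g. "6/3/2012" is 6 March), so parse with dmy()
f$date <- dmy(f$start_date)
# Sort by id, then by parsed date
f1 <- f[order(f$id, f$date), ]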

How can I change this code to get the desired output?

[Note: The actual data set is very large, so performance needs to be considered.]

Thanks in advance.

Solution 2

You can try the following. f[!duplicated(f), ] removes duplicate rows from f, and aggregate then applies length to each group, which gives the number of distinct start_date values for each id:

aggregate(start_date ~ id, f[!duplicated(f), ], length)
##   id start_date
## 1  A          3
## 2  C          2
## 3  D          1
## 4  E          1
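The same approach can be split into its two steps, which may make it easier to see what is counted. This is only a sketch on the sample data from the question; unique(f) is used in place of f[!duplicated(f), ], and the column name "count" is taken from the expected output rather than produced by aggregate itself.

```r
# Sample data from the question
f <- data.frame(
  id = c("A","A","A","A","C","C","D","D","E"),
  start_date = c("6/3/2012","7/3/2012","7/3/2012","8/3/2012","5/3/2012",
                 "6/3/2012","6/3/2012","6/3/2012","5/3/2012"),
  stringsAsFactors = FALSE
)

# Step 1: keep one row per distinct (id, start_date) pair
u <- unique(f)

# Step 2: count the remaining rows for each id
res <- aggregate(start_date ~ id, data = u, FUN = length)

# Rename the count column to match the expected output
names(res)[2] <- "count"
print(res)
```

Note that the dates are compared as plain strings here, so two spellings of the same day (for example "6/3/2012" and "06/03/2012") would be counted as different dates unless they are parsed first.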

OTHER TIPS

I'm getting a different answer:

with(f, tapply(start_date, id, length))
A C D E 
4 2 2 1 

Not sure what format you want the results in, but

rowSums(with(f, table(id, start_date)>0))

will return a named vector with the count of distinct days for each ID.
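The two results above differ because tapply with length counts every row, including repeated dates, while the table-based version counts each date once per id. A minimal sketch of the difference on the sample data from the question, using length(unique(x)) as one possible way to count distinct dates with tapply:

```r
# Sample data from the question
f <- data.frame(
  id = c("A","A","A","A","C","C","D","D","E"),
  start_date = c("6/3/2012","7/3/2012","7/3/2012","8/3/2012","5/3/2012",
                 "6/3/2012","6/3/2012","6/3/2012","5/3/2012"),
  stringsAsFactors = FALSE
)

# Counts all rows per id, so repeated dates are counted more than once
rows_per_id <- tapply(f$start_date, f$id, length)

# Counts only distinct dates per id
days_per_id <- tapply(f$start_date, f$id, function(x) length(unique(x)))

print(rows_per_id)
print(days_per_id)
```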

Licensed under: CC-BY-SA with attribution