Question

For example, I have the data as follows:

month  city  sale  company
1      a     23    sony
1      a     12    lenovo
1      b     45    AAA
1      b     34    BBB
1      c     67    CCC
1      c     35    sony
1      d     65    DDD
2      a     87    sony
2      a     65    lenovo
2      b     67    AAA
2      b     45    BBB
2      c     87    CCC
2      c     54    sony
2      d     43    DDD

I sorted the data with:

library(doBy)
# named `sorted` rather than `sort` to avoid masking base::sort
sorted <- orderBy(~month+city+sale, data=data)

The result should look like the data above.

Then I want to extract the row with the largest sale value for each city in each month. That is, I should extract rows 1, 3, 5, 7, 8, 10, 12, and 14 into a new matrix and export it as an Excel file.

How could I do this? The real data is more complicated with thousands of lines.


Solution

You can use split-apply-combine: split the data by month/city pair, find the row you want in each group, and merge the results back together.

Here's how you could do it in base R:

do.call(rbind, lapply(split(dat, paste(dat$month, dat$city)),
                      function(x) x[which.max(x$sale),]))
#     month city sale company
# 1 a     1    a   23    sony
# 1 b     1    b   45     AAA
# 1 c     1    c   67     CCC
# 1 d     1    d   65     DDD
# 2 a     2    a   87    sony
# 2 b     2    b   67     AAA
# 2 c     2    c   87     CCC
# 2 d     2    d   43     DDD

The call to split breaks up the data frame by month/city pairs, lapply extracts the row with the maximum sales, and do.call with rbind puts them all together into a final data frame.
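
To make the three steps easier to follow, here is a self-contained sketch that rebuilds the sample data from the question, runs each step separately, and then writes the result to a CSV file, which Excel can open. The intermediate names (groups, top_rows, result, out_file) and the file name are my own placeholders, and CSV export is only one assumed way to cover the "export to Excel" part of the question.

```r
# Rebuild the sample data from the question (named `dat` to match the code above).
dat <- data.frame(
  month   = rep(1:2, each = 7),
  city    = rep(c("a", "a", "b", "b", "c", "c", "d"), times = 2),
  sale    = c(23, 12, 45, 34, 67, 35, 65, 87, 65, 67, 45, 87, 54, 43),
  company = rep(c("sony", "lenovo", "AAA", "BBB", "CCC", "sony", "DDD"), times = 2),
  stringsAsFactors = FALSE
)

# Split: one small data frame per month/city pair.
groups <- split(dat, paste(dat$month, dat$city))

# Apply: keep the row with the largest sale in each group.
# which.max() returns the first maximum, so ties resolve to the earliest row.
top_rows <- lapply(groups, function(x) x[which.max(x$sale), ])

# Combine: stack the one-row data frames back into a single data frame.
result <- do.call(rbind, top_rows)
rownames(result) <- NULL  # drop the "1 a", "1 b", ... group labels

# Export as CSV. The path is a placeholder; point it wherever you need the file.
out_file <- file.path(tempdir(), "top_sales.csv")
write.csv(result, out_file, row.names = FALSE)
```

Note that which.max keeps only one row per group, so if two companies tie for the largest sale in a city and month, only the first of them is returned.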

Licensed under: CC-BY-SA with attribution