So I have a pandas DataFrame with panel data containing interactions between buyers and sellers on a monthly basis:

       Buyer       Seller       Month            Amount      Amounttotal 
0      Buyer1      Seller1 2009-07-31 00:00:00   10             255
1      Buyer1      Seller2 2009-07-31 00:00:00   15             255
2      Buyer1      Seller3 2009-07-31 00:00:00   120            255
3      Buyer1      Seller4 2009-07-31 00:00:00   110            255 
4      Buyer1      Seller1 2009-08-31 00:00:00   5              427
5      Buyer1      Seller2 2009-08-31 00:00:00   12             427
6      Buyer1      Seller3 2009-08-31 00:00:00   20             427
7      Buyer1      Seller4 2009-08-31 00:00:00   180            427
8      Buyer1      Seller5 2009-08-31 00:00:00   210            427

I have data for multiple buyers, e.g. Buyer1, Buyer2, Buyer3, etc. Amounttotal is the total amount the buyer has bought during the month. I am looking to calculate, for each buyer in each month, its 3-firm HHI, meaning the sum of the squared shares of total monthly volume accounted for by the buyer's three largest interactions. In the example above, the 3-firm HHI would be 0.41 for 2009-07 and 0.42 for 2009-08. It seems to me that the calculation will have to involve groupby, but I am having trouble figuring out how to find the largest, second-largest and third-largest values in each group. Help is much appreciated!
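To make the target numbers concrete, the 2009-07 figure can be reproduced by hand. This is only a plain-Python sketch of the arithmetic, with the four July amounts copied from the table above; the variable names are chosen for illustration.

```python
# Amounts for Buyer1 in 2009-07, copied from the sample rows above.
amounts = [10, 15, 120, 110]

total = sum(amounts)                       # 255, which matches Amounttotal
top3 = sorted(amounts, reverse=True)[:3]   # the three largest: [120, 110, 15]

# Sum of squared shares: (120/255)^2 + (110/255)^2 + (15/255)^2
hhi = sum((a / total) ** 2 for a in top3)

print(round(hhi, 2))  # 0.41
```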


Solution

Just take the Amount column, sort it in descending order, and take the first 3 elements. You don't even need the Amounttotal column, since you can sum the Amount column yourself.

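def compute_hhi(buyer_month):
    # Total volume for this buyer in this month
    total = float(buyer_month['Amount'].sum())
    # Series.order() no longer exists in current pandas; sort_values() replaces it
    top_3_amts = buyer_month['Amount'].sort_values(ascending=False).iloc[:3]
    # Squared share of the monthly total for each of the three largest amounts
    hhi_elements = [(value / total) ** 2 for value in top_3_amts]
    hhi = sum(hhi_elements)
    return hhi

# One HHI per (Buyer, Month) pair; df is the DataFrame from the question
grouped = df.groupby(['Buyer', 'Month'])
hhis = grouped.apply(compute_hhi)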
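
The same idea can be checked end to end on the sample rows from the question. What follows is a sketch under a few assumptions rather than the answer's exact code: the DataFrame is rebuilt by hand from the table, Series.nlargest(3) is swapped in for sort-then-slice, and the helper name top3_hhi is made up for illustration.

```python
import pandas as pd

# Sample rows from the question. Amounttotal is left out on purpose,
# since the monthly total is recomputed from Amount.
df = pd.DataFrame({
    "Buyer": ["Buyer1"] * 9,
    "Seller": ["Seller1", "Seller2", "Seller3", "Seller4",
               "Seller1", "Seller2", "Seller3", "Seller4", "Seller5"],
    "Month": pd.to_datetime(["2009-07-31"] * 4 + ["2009-08-31"] * 5),
    "Amount": [10, 15, 120, 110, 5, 12, 20, 180, 210],
})

def top3_hhi(amounts):
    """Sum of squared shares of the three largest amounts in one group."""
    shares = amounts.nlargest(3) / amounts.sum()
    return (shares ** 2).sum()

# One value per (Buyer, Month) pair, indexed by a MultiIndex.
hhis = df.groupby(["Buyer", "Month"])["Amount"].apply(top3_hhi)
print(hhis.round(2))
```

On this data the two values round to 0.41 and 0.42, matching the figures stated in the question. Groups with fewer than three rows need no special handling, because nlargest(3) simply returns whatever rows are there.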
Licensed under: CC-BY-SA with attribution