Question

The following sample data has variables describing bets by a number of players.

How can I calculate, for each username, the first bettype, the first betprice, the number of soccer bets, the number of baseball bets, the number of unique bet prices and the number of unique bet types?

clear
input str16 username str40 betdate stake str16 bettype betprice str16 sport
player1 "12NOV2008 12:04:33" 90 SGL 5 SOCCER
player1 "04NOV2008:09:03:44" 30 SGL 4  SOCCER
player2 "07NOV2008:14:03:33" 120 SGL 5 SOCCER
player1 "05NOV2008:09:00:00" 50 SGL 4 SOCCER
player1 "05NOV2008:09:05:00" 30 DBL 3 BASEBALL 
player1 "05NOV2008:09:00:05" 20 DBL 4 BASEBALL 
player2 "09NOV2008:10:05:10" 10 DBL 5 BASEBALL 
player2 "15NOV2008:15:05:33" 35 DBL 5 BASEBALL 
player1 "15NOV2008:15:05:33" 35 TBL 5 BASEBALL
player1 "15NOV2008:15:05:33" 35 SGL 4 BASEBALL
end

generate double timestamp=clock(betdate,"DMY hms") 
format timestamp %tc

generate double dateonly=date(betdate,"DMY hms") 
format dateonly %td

* Variables wanted, one value per username (placeholders, not runnable code):
* firsttype
* firstprice
* soccercount
* baseballcount
* uniquebettypecount
* uniquebetpricecount

Solution

This is a bit close to the margin as a "please give me the code" question, with no attempt at a solution of your own.

After sorting by timestamp within username, the first type and price are

bysort username (timestamp) : gen firsttype = bettype[1] 
bysort username (timestamp) : gen firstprice = betprice[1] 
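
One caveat, offered as a sketch rather than as part of the solution above: if a player's two earliest bets share the same timestamp, the order between them after sorting is arbitrary, so bettype[1] is not well defined. In the sample data player1 has two bets with identical timestamps on 15 November, although not among the earliest. A check along these lines could flag the problem. It assumes the question's data and timestamp are in memory, and firsttied is a hypothetical helper name.

```stata
* Sketch: flag players whose earliest timestamp is shared by two or more bets.
* firsttied is a hypothetical helper variable, not part of the original answer.
bysort username (timestamp) : gen byte firsttied = (_N > 1) & (timestamp[1] == timestamp[2])

* Show any affected bets; nothing is listed when every first bet is unambiguous.
list username betdate bettype betprice if firsttied
```

If any observations are flagged, a further sort key inside the parentheses would be needed to make the choice of first bet reproducible.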

The number of soccer and baseball bets is

egen soccercount = total(sport == "SOCCER"), by(username) 
egen baseballcount = total(sport == "BASEBALL"), by(username) 
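
These work because an expression such as sport == "SOCCER" evaluates to 1 when true and 0 when false, so its total is a count. As a rough illustration of the same idea without egen, a running sum can be carried to the last observation of each group. This assumes the question's data are in memory, and soccercount2 is a hypothetical name chosen so as not to clash with soccercount.

```stata
* Sketch: count soccer bets per username with a running sum instead of egen.
* sum() accumulates the 1/0 values of the logical expression within each username.
bysort username : gen soccercount2 = sum(sport == "SOCCER")

* The last running sum in each group is the group total; copy it to every observation.
by username : replace soccercount2 = soccercount2[_N]
```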

The number of distinct [not unique!] bet types is

bysort username bettype : gen work = _n == 1 
egen uniquebettypecount = total(work), by(username) 

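The number of distinct bet prices is found in just the same way, with betprice in place of bettype; either drop work first or use a different variable name, because generate will not overwrite an existing variable. Another way to count distinct bet types, used instead of the two lines above, is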

egen work = tag(username bettype) 
egen uniquebettypecount = total(work), by(username) 
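
Applied to bet prices, the tag() approach might look like the following sketch. It assumes the question's data are in memory, and pricetag is a hypothetical name chosen to avoid a clash with work.

```stata
* Sketch: number of distinct bet prices per username.
* pricetag is 1 for exactly one observation of each username-betprice combination, else 0.
egen pricetag = tag(username betprice)

* Summing the tags within username counts the distinct prices.
egen uniquebetpricecount = total(pricetag), by(username)
drop pricetag
```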

What is characteristic of all these variables is that the same value is repeated for every observation within each group. For example, firsttype has the same value in every observation for a given username. Often you will want to use each value just once. A key to that is the egen function tag() just used, for example

egen usertag = tag(username) 

followed by uses of if usertag when needed. (if usertag is a useful idiom for if usertag == 1.)
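
As a sketch of what such uses might look like, assuming usertag and the variables above have already been created:

```stata
* Sketch: show one line per player rather than one line per bet.
list username firsttype firstprice soccercount baseballcount if usertag, noobs

* Summarize over players, so that heavy bettors are not counted many times over.
summarize soccercount baseballcount if usertag
```

Without if usertag, the summary would weight each player by the number of bets placed.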

Some reading suggestions:

On by: http://www.stata-journal.com/sjpdf.html?articlenum=pr0004

On egen: http://www.stata.com/help.cgi?egen

On distinct observations (and why the word "unique" is misleading): http://www.stata-journal.com/sjpdf.html?articlenum=dm0042

Licensed under: CC-BY-SA with attribution