This is a bit close to the margin, as a "please give me the code" question, with no attempt at your own solutions.
The first type and price are
bysort username (timestamp) : gen firsttype = bettype[1]
bysort username (timestamp) : gen firstprice = betprice[1]
The number of soccer and baseball bets is
egen soccercount = total(sport == "SOCCER"), by(username)
egen baseballcount = total(sport == "BASEBALL"), by(username)
The number of distinct [not unique!] bet types is
bysort username bettype : gen work = _n == 1
egen uniquebettypecount = total(work), by(username)
and the other problem is just the same (but use replace on work, since that variable already exists). Another way to do that is
egen work = tag(username bettype)
egen uniquebettypecount = total(work), by(username)
What is characteristic of all these variables is that the same value is repeated for all values within each group. For example, firsttype has the same value for each occurrence of each distinct username. Often you will want to use each value just once. A key to that is the egen function tag() just used, for example
egen usertag = tag(username)
followed by uses of if usertag when needed. (if usertag is a useful idiom for if usertag == 1.)
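To see how these pieces fit together, here is a minimal sketch on a made-up toy dataset. The five observations and their values are invented purely for illustration; only the variable names follow the question. Note that clear drops whatever data are in memory, so run this in a fresh session.

```stata
* Toy data, invented for illustration only
clear
input str8 username str8 sport str8 bettype double betprice timestamp
"ann" "SOCCER"   "single" 2.5 1
"ann" "BASEBALL" "double" 1.8 2
"ann" "SOCCER"   "single" 3.0 3
"bob" "BASEBALL" "single" 2.0 1
"bob" "BASEBALL" "single" 1.5 2
end

* First bet type per user, taking observations in timestamp order
bysort username (timestamp) : gen firsttype = bettype[1]

* Number of soccer bets per user, repeated on every observation of that user
egen soccercount = total(sport == "SOCCER"), by(username)

* Tag exactly one observation per user
egen usertag = tag(username)

* Show each user's summary values just once
list username firsttype soccercount if usertag
```

The same if usertag qualifier can be attached to commands such as summarize or tabulate, so that each user contributes one observation rather than one per bet.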
Some reading suggestions:
On by: http://www.stata-journal.com/sjpdf.html?articlenum=pr0004
On egen: http://www.stata.com/help.cgi?egen
On distinct observations (and why the word "unique" is misleading): http://www.stata-journal.com/sjpdf.html?articlenum=dm0042