Edit: I should have generated better data. It isn't necessarily the case that the string variable can be destringed. I'm just being lazy here (I don't know how to generate random letters).
I have a data set with a lot of strings that I want to collapse, but it seems that in general collapse doesn't play nicely with strings, particularly with (firstnm) and (count). Here are some similar data.
clear
set obs 9
generate mark = .
replace mark = 1 in 1
replace mark = 2 in 6
generate name = ""
generate random = ""
local i = 0
foreach first in Tom Dick Harry {
    foreach last in Smith Jones Jackson {
        local ++i
        replace name = "`first' `last'" in `i'
        replace random = string(runiform())
    }
}
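For anyone who wants a random column that cannot be destringed, the following is a minimal stand-alone sketch rather than part of my actual data. It assumes that char() maps the ASCII codes 65 to 90 to the uppercase letters A to Z, and that floor(26 * runiform()) yields an integer from 0 to 25.

```stata
* Sketch only: build a string variable of three random uppercase letters.
clear
set obs 9
* Each char() call draws one letter; ASCII codes 65 to 90 are A to Z.
generate random = char(65 + floor(26 * runiform())) + char(65 + floor(26 * runiform())) + char(65 + floor(26 * runiform()))
list random
```

The same expression could stand in for string(runiform()) in the loop above.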
I want to collapse on "mark". Filling in its missing values first is simple enough with replace and subscripts.
replace mark = mark[_n - 1] if missing(mark)
But my collapses fail with type mismatch errors.
collapse (firstnm) name (count) random, by(mark)
If I use (first), then the first error clears, but (count) still fails. Is there a solution that avoids an additional by operation?
It seems that the following works, but would also be a lot more time-consuming for my data.
generate nonmissing_random = !missing(random)
egen nonmissing_random_count = total(nonmissing_random), by(mark)
collapse (first) name nonmissing_random_count, by(mark)
Or will any solution that lets me use collapse on strings cost about the same?
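One variant that might avoid the separate egen pass is to collapse a numeric 0/1 indicator with (sum) in the same call. This is only a sketch under assumptions: has_random and n_random are hypothetical names, the blank value in row 3 is added purely so that the count is informative, and I have not timed it on large data.

```stata
* Sketch only: rebuild the example data, then count nonmissing strings inside collapse.
clear
set obs 9
generate mark = .
replace mark = 1 in 1
replace mark = 2 in 6
replace mark = mark[_n - 1] if missing(mark)
generate name = ""
local i = 0
foreach first in Tom Dick Harry {
    foreach last in Smith Jones Jackson {
        local ++i
        replace name = "`first' `last'" in `i'
    }
}
generate random = string(runiform())
* Hypothetical missing value so that the nonmissing count differs from the group size.
replace random = "" in 3

* has_random is a hypothetical helper: 1 if random is nonmissing, 0 otherwise.
generate byte has_random = !missing(random)

* (first) accepts the string variable; (sum) of the indicator counts nonmissing strings.
collapse (first) name (sum) n_random = has_random, by(mark)
list
```

This still needs one generate for the indicator, but no by-group egen before the collapse.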