Stata: using egen group() to create unique identifiers

https://stackoverflow.com/questions/22674280

22-06-2023

Question

I have a dataset where each row is a firm-year pair, and the firmid is a string.

If I do

duplicates drop firmid year, force

it doesn't delete anything since there are no duplicates (I originally created the dataset after running duplicates drop firmid year, force).

So far so good. I want to create a panel which requires a firmid that is numeric. So I run

egen newid = group(firmid)
xtset newid year

But the 'repeated time values within panel' error pops up. Moreover,

duplicates list newid year

lists a whole bunch of duplicates.

It seems as though egen's group() function isn't generating unique groups. My question is: why, and how do I create unique groups in a robust way?

Solution

This is an old thread, but I recently ran into the same symptoms, so I wanted to share my solution. Of course, unless the questioner gives further details, we cannot know whether the cause in my case is the same as in theirs.

The problem turned out to be an issue of precision. As explained here in section 4.4, integers stored as floats are exact only up to 16,777,216 (2^24). So, if you have more than 16,777,216 firms in your sample, the group numbers above that limit are rounded when stored, and the same ID ends up assigned to multiple firms. This is straightforwardly dealt with by storing the ID variable as a long, which holds integers of this size exactly:

egen long newid = group(firmid)
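
A minimal sketch of the precision limit and of one way to check the fix, assuming the variables firmid and year from the question. The checks are an illustration, not part of the original answer. float() rounds its argument to float precision, so the two display lines show where consecutive integers stop being distinguishable.

```stata
* Integers above 2^24 = 16,777,216 are not all representable as floats:
* the second line displays 16777216, the same value as the first
display %12.0f float(16777216)
display %12.0f float(16777217)

* Store the group identifier as long so every group keeps a distinct integer
egen long newid = group(firmid)

* Check that each newid corresponds to exactly one firmid
bysort newid (firmid): assert firmid[1] == firmid[_N]

* Check that newid and year uniquely identify observations,
* then declare the panel
isid newid year
xtset newid year
```

If newid already exists from an earlier attempt, drop it before running the egen line. If either check fails, Stata stops with an error at that line, which points to where the identifiers are still colliding.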

Licensed under: CC-BY-SA with attribution
