Formatting in Stata

Question 1

The format only says how the data are to be displayed, not how they are stored. In this case the formats are simply the defaults for the different storage types: FEDRFNDX is stored as an int, while FEDTAXX is stored as a long. You can find out more about the differences by typing help data_types in Stata.
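A minimal sketch of how the default display format follows from the storage type. The variable names b, i, l, f and d are made up for illustration; they are not from the dataset in the question.

```stata
* One toy variable per numeric storage type (names are hypothetical)
clear
set obs 1
generate byte   b = 1
generate int    i = 1
generate long   l = 1
generate float  f = 1
generate double d = 1

* describe lists storage type and display format side by side:
* byte and int default to %8.0g, long to %12.0g,
* float to %9.0g and double to %10.0g
describe
```

Running describe on FEDRFNDX and FEDTAXX in your own data should show the same int/long pairing with %8.0g and %12.0g.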

My guess would be that

either both can safely be stored as int without loss of information
or FEDRFNDX only has integer values no larger than 32,740, which means it does not use the full 8 digits that the codebook reserved for it, while FEDTAXX has integer values larger than 32,740. 32,740 is the largest number that can be stored in a (2 byte) int, while 2,147,483,620 is the limit for a (4 byte) long.
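The limits can be read from Stata's c-class values rather than memorized. The sketch below uses a made-up variable called taxes as a stand-in; it is not the FEDTAXX variable from the question.

```stata
* Upper limits of the integer storage types
display c(maxbyte)              // 100
display c(maxint)               // 32740
display %12.0f c(maxlong)       // 2147483620

* Toy stand-in for a variable that needs a long (hypothetical data)
clear
set obs 2
generate long taxes = 30000 * _n     // values 30000 and 60000
summarize taxes
display r(max) <= c(maxint)          // 0, so the maximum does not fit in an int
```

The same summarize and comparison, applied to FEDTAXX, would tell you which of the two guesses holds.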

A safe way to check which of these is true is to type compress after loading your dataset. This changes the storage type of each variable to the smallest type that can hold its values without loss of information. So, if my first guess is true, it will change the storage type of FEDTAXX to int, while if my second guess is true it will leave the storage type unchanged.
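Here is a small sketch of what compress does in each of the two cases, using made-up variables (fits_int and needs_long are illustrative names only).

```stata
clear
set obs 2
* Both start out as long, mimicking how FEDTAXX is stored
generate long fits_int   = 1000  * _n    // 1000 and 2000: small enough for an int
generate long needs_long = 40000 * _n    // 40000 and 80000: too large for an int

compress

* The extended macro function `: type' returns the current storage type
assert "`: type fits_int'"   == "int"
assert "`: type needs_long'" == "long"
describe
```

If FEDTAXX behaves like fits_int here, the first guess was right; if it behaves like needs_long, the second was.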

After that it is always a good idea to just type tab FEDTAXX and look at the values. I like the user-written command fre for that, as it displays both the values and the value labels. You can install it by typing ssc install fre in Stata.

Question 2

@Maarten Buis gave an excellent specific answer. The following more general remarks are too long for a comment.

What "format" is and is not in Stata is the subject of several misunderstandings. The main reason for that might be the loose, shifting meaning of "format" across computing. Whatever the reason, in Stata "format" in the sense used here means display format and nothing else. The main way to change the format associated with a variable is the format command, and the help for that command is a good place to start.

Stata evidently surprises many users by having storage types at all, by making them fairly visible, and by giving the user considerable responsibility over the choice of storage type. But the connection between storage type and format is at best loose, namely that different storage types have different default formats.

It's crucial to grasp that changing the format in Stata does not change what is being stored.
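A minimal sketch of that point, using a made-up variable x: the display format is changed, but the stored value and the storage type stay as they were.

```stata
clear
set obs 1
generate double x = 3.14159265

* Ask for one decimal place on display only
format x %4.1f
list x                        // shows 3.1

* The stored value is untouched, and so is the storage type
assert x == 3.14159265
display "`: type x'"          // double
```

Reverting with something like format x %10.0g would show the full value again, since nothing was ever rounded in storage.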

A test of understanding for intermediate and/or long-term users is to be able to explain what is happening here:

. set obs 1
obs was 0, now 1

. gen foo = 2000000001

. di %12.0f foo[1]
2000000000

Why did Stata (appear to) round that large integer? (Clue: This is not a bug, but just Stata following your tacit instructions on storage type.)
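One way to explore the clue is to repeat the experiment with the storage type spelled out. This is only a sketch: it assumes the default type for generate has not been changed with set type, and the names foo_long and foo_double are invented for illustration.

```stata
clear
set obs 1
generate        foo        = 2000000001    // no type given, so the default (float) is used
generate long   foo_long   = 2000000001
generate double foo_double = 2000000001

describe

display %12.0f foo[1]           // 2000000000
display %12.0f foo_long[1]      // 2000000001
display %12.0f foo_double[1]    // 2000000001
```

The same display format is used all three times, so the difference in output can only come from what was stored.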