R - read.dta() not recognizing missing values

https://stackoverflow.com/questions/22277811

r
stata

11-06-2023

Question

Here's a tiny Stata file: location_missing.dta

There is only one field ("location_state"). Note that I only extracted the missing values from the original data for this question. All of the records are recognized as missing in Stata:

gen test = missing(location_state)
tab test

       test |      Freq.     Percent
------------+------------------------
          1 |      6,098      100.00
------------+------------------------
      Total |      6,098      100.00

However, when I use read.dta() from library(foreign) or Stata.file() from library(memisc) to import the data into R, all of the records show up as empty strings ("") instead of NA, so functions such as na.omit() and complete.cases() don't treat them as missing. For example:

> library(foreign)
> test <- read.dta("location_missing.dta")
> all(complete.cases(test))
[1] TRUE
# Had to explicitly remove the missing values (blanks):
> test1 <- subset(test, location_state != "")

Saving the file with saveold in Stata made no difference. Am I missing something, or could this be a bug of some kind?

Solution

For string variables, Stata doesn't distinguish between empty strings and missing values, as stated in the documentation (help missing):

Stata has one string missing value, which is denoted by "" (blank)

It's easy enough to convert the empty strings to NA after importing into R, e.g.,

test[test == ""] <- NA
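
If the data frame also has non-string columns, the conversion can be limited to the character columns. The following is only a minimal sketch: the small data frame stands in for what read.dta() would return, and the id column and the non-empty state codes are made up for illustration. It does not handle factor columns.

```r
# Hypothetical stand-in for the data frame returned by read.dta();
# the "id" column and the non-empty values are invented for this example.
test <- data.frame(
  location_state = c("", "NY", "", "CA"),
  id = 1:4,
  stringsAsFactors = FALSE
)

# Find the character columns, then turn "" into NA in those columns only.
# %in% is used instead of == so that values that are already NA are left alone.
is_chr <- vapply(test, is.character, logical(1))
test[is_chr] <- lapply(test[is_chr], function(x) {
  x[x %in% ""] <- NA
  x
})

# The empty strings are now recognized as missing.
all(complete.cases(test))
clean <- na.omit(test)
```

After this step, na.omit() and complete.cases() behave as they would for any other NA values.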

License: CC-BY-SA with attribution

Not affiliated with StackOverflow