Question

As mentioned in other questions, the best way to import an SPSS dataset into R is to first export the SPSS file to "portable SPSS" format and then use memisc as follows:

library(memisc) 
mydata <- as.data.set(spss.portable.file("myspss.por"))

My problem is that NAs come through as the text "NA" rather than as real missing values (even though I specified the NA values in SPSS).

My solution is to do this for each variable:

mydata$v1[mydata$v1 == "NA"] <- NA

But I have more than 50 variables... Do you know a better approach? Or do you know what I'm doing wrong in the import?


Solution

I found a solution that works for me:

library(memisc)
mydata <- as.data.set(spss.portable.file("myspssdata.por"))
mydata <- as.data.frame(mydata)

When the data.set is converted to a data.frame, all NA values come out as real NAs.

I also tried to obtain a data.frame directly:

mydata <- as.data.frame(spss.portable.file("myspssdata.por"))

But that way I get a data.frame with 0 observations, so it seems that going through the data.set first is mandatory.
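If you import several files, the two-step conversion above can be wrapped in a small helper. This is only a sketch: read_por is a hypothetical name of mine, not part of memisc, and it assumes memisc is installed and that the file is in portable SPSS format.

```r
# Hypothetical helper (not part of memisc): import a portable SPSS file
# through an intermediate data.set, as in the two-step code above.
read_por <- function(path) {
  # Fail early with a clear message if the file is missing.
  stopifnot(file.exists(path))
  ds <- memisc::as.data.set(memisc::spss.portable.file(path))
  # Converting the data.set to a data.frame is the step that gave me real NAs.
  as.data.frame(ds)
}
```

You would then call mydata <- read_por("myspssdata.por") and check the result with something like colSums(is.na(mydata)).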

Thanks for your answers.

OTHER TIPS

Just do

is.na(mydata) <- mydata == "NA"

and all "NA"s in all columns are replaced by actual NAs.
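To see what that one-liner does, here is a minimal sketch on a toy data frame. The name df and its values are made up for illustration and merely stand in for the imported data.

```r
# Toy data frame standing in for the imported data (made-up values).
df <- data.frame(
  v1 = c("a", "NA", "c"),
  v2 = c("NA", "y", "NA"),
  stringsAsFactors = FALSE
)

# Logical matrix that is TRUE wherever a cell holds the string "NA".
is_text_na <- df == "NA"

# Set every marked cell to a real NA, across all columns at once.
is.na(df) <- is_text_na

# Count the real NAs per column.
colSums(is.na(df))
```

Note that this assumes the columns do not already contain real NAs mixed in with the "NA" strings; I have not checked how the assignment behaves in that case.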

I use Hmisc::spss.get to read SPSS and NA values are imported properly:

library(Hmisc)
r <- spss.get(survey_results_file, use.value.labels = TRUE)
str(r[273, "Q5A8"])
#  Factor w/ 4 levels "1 Not Important",..: NA
is.na(r[273, "Q5A8"])
# [1] TRUE
Licensed under: CC-BY-SA with attribution