Question

I have an SPSS file but no SPSS installation, so I want to open it in R.

If I open it using:

library(foreign)
dat <- read.spss("file.sav", to.data.frame=TRUE)

I get the warning

re-encoding from CP1252
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated

If I understand correctly, the encoding notification is not a problem (I'm in a UTF-8 locale), but what does the warning about levels mean?

If I open the file using:

dat <- read.spss("file.sav", to.data.frame=TRUE, use.value.labels = FALSE)

the warning disappears, but I'm not sure whether what I am doing is correct.

Also, calling str(dat) gives me output like:

pt_art  : atomic  1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "value.labels")= Named chr  "2" "1"
  .. ..- attr(*, "names")= chr  "IPT" "VT"

What does attr(*, "value.labels") mean? I know that "pt_art" means "type of psychotherapy", that "IPT" and "VT" are the two therapy types, and that "2" and "1" are the numeric codes representing those types. So these correspond to what R calls levels and labels, but how do I correctly transfer that into R?


Solution

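The warning occurs when you try to define a factor whose levels or labels contain duplicate values. For example, passing a vector with repeats as the levels (in recent versions of R this call stops with an error instead of a warning):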

(x <- sample(letters[1:4], 10, replace = TRUE))
##  [1] "b" "c" "d" "d" "b" "c" "d" "c" "c" "c"
factor(x, levels = x)
##  [1] b c d d b c d c c c
## Levels: b c d d b c d c c c
## Warning message:
## In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
##   duplicated levels will not be allowed in factors anymore

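To find out which variables in your own file trigger the warning, one option is to read the data with use.value.labels = FALSE and look for repeated label names in each column's "value.labels" attribute. The sketch below uses a small made-up data frame (the columns a and b are hypothetical) in place of the result of read.spss():

```r
# Hypothetical stand-in for read.spss(..., use.value.labels = FALSE)
dat <- data.frame(a = c(1, 2, 3), b = c(1, 2, 1))
attr(dat$a, "value.labels") <- c(low = "1", low = "2", high = "3")  # "low" is used twice
attr(dat$b, "value.labels") <- c(yes = "1", no = "2")               # labels are unique

# TRUE for every column whose value labels contain a repeated label name
has_dup_labels <- vapply(dat, function(v) {
  vl <- attr(v, "value.labels")
  !is.null(vl) && anyDuplicated(names(vl)) > 0
}, logical(1))

flagged <- names(dat)[has_dup_labels]
print(flagged)
```

Columns flagged this way have two or more codes sharing the same label text in SPSS, so you can decide whether to merge those codes or relabel them before building factors.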
SPSS usually uses value labels to denote categorical variables (which should become factors in R). However, note this section from the ?read.spss help page:

Occasionally in SPSS, value labels will be added to some values of a continuous variable (e.g. to distinguish different types of missing data), and you will not want these variables converted to factors. By setting 'max.value.labels' you can specify that variables with a large number of distinct values are not converted to factors even if they have value labels. In addition, variables will not be converted to factors if there are non-missing values that have no value label. The value labels are then returned in the '"value.labels"' attribute of the variable.
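For a variable like pt_art from the question, the "value.labels" attribute holds the numeric codes as its values and the label text as its names, so a factor can be built by hand. This is a minimal sketch, assuming the codes are stored as character strings as in the str() output shown in the question; the data values are made up:

```r
# Hypothetical stand-in for dat$pt_art read with use.value.labels = FALSE
pt_art <- c(1, 1, 2, 1, 2)
attr(pt_art, "value.labels") <- c(IPT = "2", VT = "1")

vl <- attr(pt_art, "value.labels")

# Codes become the factor levels, label names become the factor labels
pt_art_f <- factor(as.vector(pt_art),
                   levels = as.numeric(vl),
                   labels = names(vl))
print(pt_art_f)
```

Doing the conversion per variable like this lets you leave continuous variables that merely carry a few labeled codes (for example, missing-value codes) as plain numbers.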

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow