Question

I use Hmisc to assign factor labels and variable names, and it is very handy. But I ran into a problem. Here is the code:

a <- c(1,0,1,0,1,0,1,0,1,0)
b <- c("a","b","a","b","a","b","a","b","a","b")
df.new <- data.frame(a,b)
library(Hmisc)
df.new.1 <- upData(df.new,lowernames=TRUE,a=factor(a,labels=c("No","Yes")),b=factor(b,labels=c("No","Yes")))

For the character vector, str gives the following codes and labels:

str(df.new.1$b)

 Factor w/ 2 levels "No","Yes": 1 2 1 2 1 2 1 2 1 2

which is fine.

But for the numeric vector, str gives:

str(df.new.1$a)

 Factor w/ 2 levels "No","Yes": 2 1 2 1 2 1 2 1 2 1

which is weird! The original 0/1 coding is gone. How can I fix this? I would like to keep my original 0/1 variable for later regression purposes. Thanks.


Solution 2

As juba's answer explains, this is the expected way for factors to work. However, if you really want both descriptive factor labels and the original numeric values you can add the values as an attribute of the factor, e.g.,

> a <- c(1,0,1,0,1,0,1,0,1,0)
> tmp <- a
> a <- factor(a, labels=c("No","Yes"))
> attr(a, "values") <- tmp
> a
 [1] Yes No  Yes No  Yes No  Yes No  Yes No 
attr(,"values")
 [1] 1 0 1 0 1 0 1 0 1 0
Levels: No Yes
> str(a)
 Factor w/ 2 levels "No","Yes": 2 1 2 1 2 1 2 1 2 1
 - attr(*, "values")= num [1:10] 1 0 1 0 1 0 1 0 1 0
> attributes(a)$values
 [1] 1 0 1 0 1 0 1 0 1 0
> 
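One thing to check before relying on this: as far as I know, subsetting a factor with [ restores only the levels, contrasts and class, so a custom attribute such as "values" does not follow the subset. A small sketch (the names f and sub are just for illustration):

```r
a <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0)
f <- factor(a, labels = c("No", "Yes"))
attr(f, "values") <- a

# The full factor carries the attribute
print(attr(f, "values"))

# A subset made with [ is expected to lose the custom attribute
sub <- f[1:3]
print(is.null(attr(sub, "values")))
```

If the numeric values have to survive subsetting, keeping them in their own column of the data frame is probably the safer choice.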

OTHER TIPS

This has nothing to do with Hmisc. It is how factors are created in base R:

R> a <- c(1,0,1,0,1,0,1,0,1,0)
R> factor(a,labels=c("No","Yes"))
 [1] Yes No  Yes No  Yes No  Yes No  Yes No 
Levels: No Yes
R> str(factor(a,labels=c("No","Yes")))
 Factor w/ 2 levels "No","Yes": 2 1 2 1 2 1 2 1 2 1

As explained in the ?factor help page:

‘factor’ returns an object of class ‘"factor"’ which has a set of integer codes the length of ‘x’ with a ‘"levels"’ attribute of mode ‘character’ and unique (‘!anyDuplicated(.)’) entries. If argument ‘ordered’ is true (or ‘ordered()’ is used) the result has class ‘c("ordered", "factor")’.

So when you use factor on your variable a, the 0 and 1 values are replaced by the "No" and "Yes" labels you give, in that order. Internally, R doesn't work with the labels when computing things, but with the underlying integer codes it has assigned to them. The codes follow the sorted unique values of the input: 0 sorts first and gets code 1, and 1 gets code 2. That's why you see the series of 1 and 2 values in the output of str. These integer codes are for internal use by R, and you normally don't need to worry about them.
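To see the two pieces separately, here is a short sketch (the names f, codes, labs and shown are just for illustration): as.integer() exposes the internal codes, and levels() returns the labels those codes index.

```r
a <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0)
f <- factor(a, labels = c("No", "Yes"))

# Internal integer codes: 0 sorts first, so it gets code 1; 1 gets code 2
codes <- as.integer(f)

# The labels, stored once, in code order
labs <- levels(f)

# Indexing the labels by the codes reproduces what printing f shows
shown <- labs[codes]

print(codes)
print(labs)
print(shown)
```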

If you want to keep track of your 0 and 1 values, you can either keep the variable as a number, or, if you really need a factor, define one with "0" and "1" levels:

R> factor(a,labels=c("0","1"))
 [1] 1 0 1 0 1 0 1 0 1 0
Levels: 0 1

Note that even in this case, you will still see the underlying 1/2 codes when using str:

R> str(factor(a,labels=c("0","1")))
 Factor w/ 2 levels "0","1": 2 1 2 1 2 1 2 1 2 1
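The first option, keeping the numeric variable, does not rule out having labels as well. A minimal sketch, assuming you are happy to store a labelled copy next to the original (the column name a_f is hypothetical):

```r
a <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0)
df <- data.frame(a = a)

# a_f is a made-up name for the labelled copy; a itself stays numeric
df$a_f <- factor(df$a, levels = c(0, 1), labels = c("No", "Yes"))

str(df)

# Cross-check that each number maps to the intended label
print(table(df$a, df$a_f))
```

Spelling out levels = c(0, 1) makes the 0 to "No" and 1 to "Yes" mapping explicit instead of relying on sort order.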

Another way is to change your levels from "No", "Yes" to "0", "1" directly. You can do it with the levels() function, for example:

R> v <- factor(a,labels=c("No","Yes"))
R> v
 [1] Yes No  Yes No  Yes No  Yes No  Yes No 
Levels: No Yes
R> levels(v) <- c("0","1")
R> v
 [1] 1 0 1 0 1 0 1 0 1 0
Levels: 0 1
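If you later need the numbers back from such a factor, for regression say, here is a sketch of two common conversions (the names back, yn and back2 are just for illustration). Calling as.numeric() on the factor directly would return the internal codes, not the labels.

```r
a <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0)
v <- factor(a, labels = c("0", "1"))

# Go through the labels: "1" "0" ... becomes 1 0 ...
back <- as.numeric(as.character(v))

# For a two-level factor built from 0/1 data, the codes minus one also work
yn <- factor(a, labels = c("No", "Yes"))
back2 <- as.integer(yn) - 1L

print(back)
print(back2)
```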
Licensed under: CC-BY-SA with attribution