This has nothing to do with Hmisc. It is the way factors are created in base R:
R> a <- c(1,0,1,0,1,0,1,0,1,0)
R> factor(a,labels=c("No","Yes"))
[1] Yes No Yes No Yes No Yes No Yes No
Levels: No Yes
R> str(factor(a,labels=c("No","Yes")))
Factor w/ 2 levels "No","Yes": 2 1 2 1 2 1 2 1 2 1
As explained in the ?factor help page:
‘factor’ returns an object of class ‘"factor"’ which has a set of
integer codes the length of ‘x’ with a ‘"levels"’ attribute of mode
‘character’ and unique (‘!anyDuplicated(.)’) entries. If argument
‘ordered’ is true (or ‘ordered()’ is used) the result has class
‘c("ordered", "factor")’.
So when you use factor on your variable a, the 0 and 1 values are replaced by the "No" and "Yes" labels you give, in that order. Internally, R doesn't work with the level labels when computing things, but with the underlying integer codes it has assigned to them (1 for the first level, 2 for the second). That's why you see the series of 1 and 2 values in the output of str.
These integer values are for internal use by R, and you usually don't need to worry about them.
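If you want to look at the two parts separately, here is a minimal sketch using the same a as above. It only relies on base R's levels() and as.integer(); the variable name v is just for illustration.

```r
a <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0)
v <- factor(a, labels = c("No", "Yes"))

# The character labels, stored in the "levels" attribute
levels(v)

# The internal integer codes: 1 for the first level, 2 for the second
as.integer(v)

# The codes index into the levels, which reproduces the labels you see printed
levels(v)[as.integer(v)]
```

The last line gives the same result as as.character(v), which is how the codes and the labels relate.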
If you want to keep track of your 0 and 1 values, you can either keep them as they are, by leaving your variable as a numeric vector for example, or, if you really need a factor, you can define one with "0" and "1" levels:
R> factor(a,labels=c("0","1"))
[1] 1 0 1 0 1 0 1 0 1 0
Levels: 0 1
Note that even in this case, you will still get your underlying 1/2 values when using str:
R> str(factor(a,labels=c("0","1")))
Factor w/ 2 levels "0","1": 2 1 2 1 2 1 2 1 2 1
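Because of this, converting such a factor back to numbers needs a little care. The sketch below, with f as an illustrative name, shows that as.numeric() on the factor returns the internal codes, while going through the labels returns the original values:

```r
a <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0)
f <- factor(a, labels = c("0", "1"))

# Gives the internal codes (1 and 2), not the original 0/1 values
as.numeric(f)

# Converts the labels, so this gives back the original 0/1 values
as.numeric(as.character(f))

# Equivalent: convert the levels once, then index them by the factor's codes
as.numeric(levels(f))[f]
```

The last two forms give the same result; the second one is the variant suggested in the ?factor help page.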
Another way is to change your levels from "No", "Yes" to "0", "1" directly. You can do it with the levels() function for example:
R> v <- factor(a,labels=c("No","Yes"))
R> v
[1] Yes No Yes No Yes No Yes No Yes No
Levels: No Yes
R> levels(v) <- c("0","1")
R> v
[1] 1 0 1 0 1 0 1 0 1 0
Levels: 0 1
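Renaming the levels this way only changes the labels; the underlying integer codes stay the same. As a small sketch (the name codes_before is illustrative, and the explicit levels argument is added here only to make the 0 to "No", 1 to "Yes" mapping visible):

```r
a <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0)

# Spell out which value gets which label instead of relying on sort order
v <- factor(a, levels = c(0, 1), labels = c("No", "Yes"))
codes_before <- as.integer(v)

# Rename the levels in place
levels(v) <- c("0", "1")

# The labels changed, the integer codes did not
levels(v)
identical(codes_before, as.integer(v))
```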