Question

I have a big data set from a questionnaire. Importing it from SPSS into R (using SPSS's Stata output) gave me the answer to each question as a factor.

Each question has answers from 1 to 10. However, there are a lot of missing values, and R recognizes them as well.

However, now I'd like to do some calculations - for example I want to calculate the mean of an answer (not very good statistics, I know, never mind).

So I have to recode the factors as numerics. I did this with as.numeric().

However, now I have missing values encoded as 11 to 14. Of course I can't calculate any mean like this.

What would be the proper way to recode factors as numerics and tell R to set any value bigger than 10 to NA?

Example: Do you like fish?

    not at all                   very much | don't know  no answer  don't tell
R:  1   2   3   4   5   6   7   8   9   10 |     11          12         13

Solution

If you really don't need the missing values, I'd do something like:

a[a>10] <- NA

Then, you can use:

mean(a, na.rm=TRUE)

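Alternatively, if you want to keep the missing-value codes in a and only exclude them from the calculation, you can just use:

mean(a[a<=10])

Note that this still returns NA if a already contains NA values; add na.rm=TRUE in that case.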
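To make the steps concrete, here is a minimal sketch from factor to mean. The variable names (fish, lv) and the sample answers are invented for illustration, and it assumes the factor levels are ordered so that the real answers 1 to 10 come first and the missing-value categories come after them. That ordering matters because as.numeric() on a factor returns the position of each level, not its label.

```r
# Hypothetical answers to one question, as they might look after import:
# levels 1-10 are real answers, the remaining levels are missing-value codes.
lv <- c(as.character(1:10), "don't know", "no answer", "don't tell")
fish <- factor(c("3", "10", "don't know", "7", "no answer", "1"), levels = lv)

a <- as.numeric(fish)   # level positions: 3 10 11 7 12 1
a[a > 10] <- NA         # positions above 10 are the missing-value categories

fish_mean <- mean(a, na.rm = TRUE)  # mean over the four real answers
```

If the levels in your data are not in this order, check levels() on the factor first, since the cutoff of 10 would then not separate answers from missing-value codes.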

OTHER TIPS

Let's call your data frame data (you might want to take a copy first). The following would set all values greater than 10 in all columns to NA:

data[data > 10] <- NA

The above assumes you've already applied as.numeric to every column.
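As a rough sketch of the whole-data-frame version, including the conversion step: the column names and values below are made up, and it assumes every column is a factor with the same level order (answers 1 to 10 first, missing-value categories last).

```r
# Hypothetical data frame with two factor-coded questions (names are made up).
lv <- c(as.character(1:10), "don't know", "no answer", "don't tell")
data <- data.frame(
  fish = factor(c("3", "don't know", "7"), levels = lv),
  meat = factor(c("no answer", "10", "2"), levels = lv)
)

# Convert every column from factor to its numeric level positions,
# keeping the data frame structure.
data[] <- lapply(data, as.numeric)

# Set all values above 10 to NA, across all columns at once.
data[data > 10] <- NA

# Per-question means, ignoring the missing values.
col_means <- colMeans(data, na.rm = TRUE)
```

If only some columns are questionnaire items, restrict both steps to those columns rather than the whole data frame.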

Licensed under: CC-BY-SA with attribution