How do I replace <NA> values with zeros in R?

https://stackoverflow.com/questions/23689647

import
r

23-07-2023

Question

I have a data.frame in which some columns contain NA values. I want to replace the <NA>s with zeros. How do I do this?

Note that mydata, shown here, isn't my original data. The original is too large to show here.

library(foreign)  # provides read.spss
mydata = read.spss('mydata.sav', use.value.labels = TRUE, to.data.frame = TRUE, max.value.labels = Inf, trim.factor.names = FALSE, trim_values = FALSE, reencode = "UTF-8")


> mydata
   Q_16_O3 Q_16_O4 Q_16_O5 Q_16_O6 Q_16_O7 Q_16_O8 Q_16_O9
10    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
11    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
12    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
13    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
14    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
15    Trem    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
16    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
17    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
18    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
19    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
20    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>


> str(mydata)
'data.frame':   11 obs. of  7 variables:
 $ Q_16_O3: Factor w/ 10 levels "Ônibus","Vans",..: NA NA NA NA NA 4 NA NA NA NA ...
 $ Q_16_O4: Factor w/ 10 levels "Ônibus","Vans",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Q_16_O5: Factor w/ 10 levels "Ônibus","Vans",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Q_16_O6: Factor w/ 10 levels "Ônibus","Vans",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Q_16_O7: Factor w/ 10 levels "Ônibus","Vans",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Q_16_O8: Factor w/ 10 levels "Ônibus","Vans",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Q_16_O9: Factor w/ 10 levels "Ônibus","Vans",..: NA NA NA NA NA NA NA NA NA NA ...

I would like to use the freq function, so I must not change the structure of mydata.

P.S. My problem is <NA>, not NA. For the NA case, I already have a solution.

Solution

For practically any data structure X containing numerics, use

X[is.na(X)] <- 0
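As a minimal sketch of this idiom, assuming a small made-up numeric data frame (the name df and its values are illustrative, not taken from the question):

```r
# Hypothetical numeric data frame with some missing values
df <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))

# is.na(df) returns a logical matrix; assigning through it replaces every NA
df[is.na(df)] <- 0

print(df)
```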

Your question seems slightly discombobulated though - you have indicated that you mean <NA> not NA, without explaining what type <NA> is.

If it is the string "<NA>" you mean, then

X[X=="<NA>"] <- "0"
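It may be worth checking which of the two cases you have. When a data frame is printed, a real missing value in a factor or character column is displayed as <NA>, which looks identical to the literal string. A small sketch on a made-up data frame df (the column name and values are assumptions for illustration):

```r
# Hypothetical data: one real NA and one literal string "<NA>"
df <- data.frame(x = c("Trem", NA, "<NA>"), stringsAsFactors = FALSE)

print(df)                        # rows 2 and 3 both display as <NA>

real_missing <- is.na(df$x)      # TRUE only for the real missing value
literal_text <- df$x %in% "<NA>" # TRUE only for the literal string

print(real_missing)
print(literal_text)
```

If is.na() returns TRUE for the cells in question, they are ordinary NA values and the == "<NA>" comparison will not find them.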

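If your data frame has mixed column types, note that is.character(X) and is.numeric(X) test the data frame as a whole and return a single FALSE, so apply the check column by column instead:

X[] <- lapply(X, function(col) if (is.character(col)) replace(col, which(col == "<NA>"), "0") else col)

The same guard is more useful in the numeric case:

X[] <- lapply(X, function(col) if (is.numeric(col)) replace(col, is.na(col), 0) else col)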

This is a very common idiom for dealing with missing data in R, although you should also look at the na.rm = TRUE argument, which many functions such as mean and sum accept.
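To illustrate the na.rm alternative, here is a short sketch on a made-up vector v (the name and values are illustrative only):

```r
# Hypothetical vector with one missing value
v <- c(2, NA, 4)

mean(v)                 # NA, because the missing value propagates
mean(v, na.rm = TRUE)   # 3, computed over the non-missing values only
sum(v, na.rm = TRUE)    # 6
```

Note that this is not equivalent to replacing NA with 0: after replacement the mean of the same vector would be 2, because the zero counts as an observation.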

This strategy will fail for a factor, because you cannot introduce a new level by assigning a value to a factor element; R issues a warning and stores NA instead. I haven't used read.spss, but looking at the documentation, I suggest you set use.value.labels = FALSE in your call (you currently pass TRUE), to avoid creating factors in the first place.
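If you do need to keep the columns as factors, one possible workaround is to add "0" as a level before assigning. This is a different technique from the suggestion above, sketched here on a made-up factor f rather than on the real data:

```r
# Hypothetical factor with a missing value
f <- factor(c("Trem", NA, "Vans"))

# Assigning a value that is not an existing level leaves NA in place.
# R would warn "invalid factor level, NA generated"; the warning is
# suppressed here only to keep the output clean.
g <- f
suppressWarnings(g[is.na(g)] <- "0")
print(g)

# Adding the level first lets the assignment succeed and keeps f a factor
levels(f) <- c(levels(f), "0")
f[is.na(f)] <- "0"
print(f)
```

To apply the same steps to every factor column of a data frame, they could be wrapped in a function and passed to lapply.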

In your specific case, every column of your data frame has the same type (factor), so it is safe to convert it to a character matrix:

> class(mydata[[1]])
[1] "factor"
> mydataM <- as.matrix(mydata)
> mode(mydataM)
[1] "character"

Now you can replace the NA values. After the conversion they are true missing values, not the string "<NA>", so test for them with is.na:

mydataM[is.na(mydataM)] <- "0"

In the more general case, where unwanted factor columns are mixed in with other types, convert only the factor columns to character and leave the rest alone:

myDataM <- as.data.frame(lapply(mydata,
  function(x) if (is.factor(x)) as.character(x) else x),
  stringsAsFactors = FALSE)  # keep the converted columns as character
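A self-contained sketch of this general case, using a made-up data frame df with one factor column and one numeric column (all names and values are assumptions). It uses df[] <- lapply(...) instead of as.data.frame, a substitution that keeps the result a data frame without turning strings back into factors:

```r
# Hypothetical mixed data frame: one factor column, one numeric column
df <- data.frame(
  mode  = factor(c("Trem", NA, "Vans")),
  count = c(1, NA, 3)
)

# Convert factor columns to character, leave other columns untouched
df[] <- lapply(df, function(col) if (is.factor(col)) as.character(col) else col)

# Replace missing values with a zero of the matching type
df$mode[is.na(df$mode)]   <- "0"
df$count[is.na(df$count)] <- 0

str(df)
```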

Licensed under: CC-BY-SA with attribution

Not affiliated with Stack Overflow