Question

I have a data frame where "." is used both as the decimal mark and, on its own, as the marker for NA.

A    B    C    D
1    .  1.2    6
1   12    .    3
2   14  1.6    4

To work on this data frame I need to obtain:

A    B    C    D
1   NA  1.2    6
1   12   NA    3
2   14  1.6    4

How can I keep the decimal values but convert every standalone "." (as in columns B and C) to NA?

Here is the data in a reproducible format:

data <- structure(list(A = c(1L, 1L, 2L), B = c(".", "12", "14"), C = c("1.2", 
    ".", "1.6"), D = c(6L, 3L, 4L)), .Names = c("A", "B", "C", "D"), 
    class = "data.frame", row.names = c(NA, -3L))

Solution 2

You can use type.convert and specify "." as your na.string:

df <- data ## Create a copy in case you need the original form
df
#   A  B   C D
# 1 1  . 1.2 6
# 2 1 12   . 3
# 3 2 14 1.6 4

df[] <- lapply(df, function(x) type.convert(as.character(x), na.strings = ".", as.is = TRUE))
df
#   A  B   C D
# 1 1 NA 1.2 6
# 2 1 12  NA 3
# 3 2 14 1.6 4

Note that the argument is na.strings (plural), so you can pass a vector of several strings to be treated as NA if your data uses more than one missing-value marker.
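
As a minimal sketch of passing more than one marker, the vector x below is a hypothetical column (not part of the question's data) that mixes ".", "n/a" and empty strings. as.is = TRUE is included because recent versions of R expect it to be specified explicitly:

```r
# Hypothetical column mixing several missing-value markers
x <- c("1.2", ".", "n/a", "", "3")

# Every string listed in na.strings becomes NA; the rest is parsed as numbers.
# as.is = TRUE keeps any non-numeric result as character rather than factor.
converted <- type.convert(x, na.strings = c(".", "n/a", ""), as.is = TRUE)

converted
is.numeric(converted)
```

The same vector of markers can be used inside the lapply() call above.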

Also, the actual answer to this question might be to simply specify the na.strings argument when you are first reading your data into R, perhaps with read.table or read.csv.

Let's replicate the process of reading a csv from within R:

x <- tempfile()
write.csv(data, x, row.names = FALSE)

read.csv(x)
#   A  B   C D
# 1 1  . 1.2 6
# 2 1 12   . 3
# 3 2 14 1.6 4

read.csv(x, na.strings = ".")
#   A  B   C D
# 1 1 NA 1.2 6
# 2 1 12  NA 3
# 3 2 14 1.6 4
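
The printed output alone does not show whether the columns were actually parsed as numbers. A quick check, sketched here with the same values supplied inline through the text argument of read.csv (instead of a temporary file), is to look at the column classes:

```r
# Same values as in the question, supplied inline rather than from a file
csv_text <- "A,B,C,D\n1,.,1.2,6\n1,12,.,3\n2,14,1.6,4"

parsed <- read.csv(text = csv_text, na.strings = ".")

# With "." declared as NA, every column is read as a number
col_classes <- sapply(parsed, class)
col_classes
```

With na.strings = "." all four columns should come back as integer or numeric. Without it, B and C are read as text, because "." cannot be parsed as a number.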

OTHER TIPS

Assuming your data frame is data:

data[data == "."] <- NA

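should replace every "." with NA, although columns B and C stay character afterwards. To get numeric columns as well, convert each column while keeping the data frame structure. (sapply(data, as.numeric) would return a matrix rather than a data frame, and would return the internal integer codes if the columns were factors. Applied directly to the original data, the conversion also turns "." into NA, with a coercion warning.)

data[] <- lapply(data, function(x) as.numeric(as.character(x)))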
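
A sketch of how the two tips can be combined: the dots are replaced first, so the numeric conversion afterwards has nothing left to coerce and the result is still a data frame. The data frame is rebuilt here with data.frame() only to keep the example self-contained:

```r
# Rebuild the question's data so the example runs on its own
data <- data.frame(A = c(1L, 1L, 2L),
                   B = c(".", "12", "14"),
                   C = c("1.2", ".", "1.6"),
                   D = c(6L, 3L, 4L),
                   stringsAsFactors = FALSE)

# Step 1: turn every standalone "." into NA (B and C are still character)
data[data == "."] <- NA

# Step 2: convert each column to numeric; data[] keeps the data frame shape,
# and as.character() guards against factor columns
data[] <- lapply(data, function(x) as.numeric(as.character(x)))

str(data)
```

Note that this converts every column to double, including the integer columns A and D, whereas type.convert keeps integers as integers.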
Licensed under: CC-BY-SA with attribution