Question

I have a data frame with missing values:

X   Y   Z
54  57  57
100 58  58
NA  NA  NA
NA  NA  NA
NA  NA  NA
60  62  56
NA  NA  NA
NA  NA  NA
69  62  62

I want to fill in the NA values by linear interpolation between the known values, so that the data frame looks like this:

X   Y   Z
54  57  57
100 58  58
90  59  57.5
80  60  57
70  61  56.5
60  62  56
63  62  58
66  62  60
69  62  62

thanks


Solution

Base R's approxfun() returns a function that will linearly interpolate the data it is handed.

## Make easily reproducible data
df <- read.table(text="X   Y   Z
54  57  57
100 58  58
NA  NA  NA
NA  NA  NA
NA  NA  NA
60  62  56
NA  NA  NA
NA  NA  NA
69  62  62", header = TRUE)

## See how this works on a single vector
approxfun(1:9, df$X)(1:9)
# [1]  54 100  90  80  70  60  63  66  69

## Apply interpolation to each of the data.frame's columns
data.frame(lapply(df, function(X) approxfun(seq_along(X), X)(seq_along(X))))
#     X  Y    Z
# 1  54 57 57.0
# 2 100 58 58.0
# 3  90 59 57.5
# 4  80 60 57.0
# 5  70 61 56.5
# 6  60 62 56.0
# 7  63 62 58.0
# 8  66 62 60.0
# 9  69 62 62.0
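
One case the example data does not exercise is an NA at the very start or end of a column. By default approxfun() uses rule = 1, which returns NA for positions outside the range of known values; rule = 2 fills them with the nearest known value instead. A minimal sketch on a made-up vector (x and idx are hypothetical names, not from the question):

```r
## Hypothetical vector with missing values at both ends
x <- c(NA, 10, NA, 30, NA)
idx <- seq_along(x)

## Default (rule = 1): positions outside the known range stay NA
inner_only <- approxfun(idx, x)(idx)
inner_only
# [1] NA 10 20 30 NA

## rule = 2: edge positions are filled with the nearest known value
with_edges <- approxfun(idx, x, rule = 2)(idx)
with_edges
# [1] 10 10 20 30 30
```

Whether carrying the nearest value outward is appropriate depends on the data; it is constant extension, not linear extrapolation.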

Other tips

I can recommend the imputeTS package, which I maintain. It is aimed at time series imputation, but it works for this case too.

For this case it would work like this:

library(imputeTS)
df$X <- na_interpolation(df$X, option = "linear")
df$Y <- na_interpolation(df$Y, option = "linear")
df$Z <- na_interpolation(df$Z, option = "linear")

As mentioned, the package expects time series or vector input, which is why the function is called on each column separately.
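
The three per-column calls can probably be collapsed with lapply(), the same pattern the approxfun() answer uses. A minimal sketch, assuming imputeTS is installed; the data frame from the question is rebuilt here only so the snippet runs on its own:

```r
library(imputeTS)

## Same data as in the question
df <- data.frame(
  X = c(54, 100, NA, NA, NA, 60, NA, NA, 69),
  Y = c(57, 58, NA, NA, NA, 62, NA, NA, 62),
  Z = c(57, 58, NA, NA, NA, 56, NA, NA, 62)
)

## Assigning into df[] replaces every column but keeps df a data.frame
df[] <- lapply(df, na_interpolation, option = "linear")
df
```

Each column is still passed to na_interpolation() as a plain numeric vector, so this does not rely on any multi-column handling inside the package.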

The package also offers many other imputation functions, such as spline interpolation.

Licensed under: CC-BY-SA with attribution