How to average outliers in an R dataset with the previous and following data points?

Question

If you have a vector of indices giving the positions of the outliers, e.g. obtained with:

quan0.99 = quantile(vec, 0.99)   # vec is your data, e.g. vec = df$value
out_idx = which(vec > quan0.99)
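A minimal, self-contained sketch of this index step, assuming the data is a plain numeric vector. The toy values below are made up for illustration and contain a single spike at position 6.

```r
# Toy data (illustrative only): a smooth series with one spike at position 6
vec = c(1, 2, 3, 4, 5, 100, 7, 8, 9, 10)

# Threshold at the 99th percentile of the data (default quantile type 7)
quan0.99 = quantile(vec, 0.99)

# Positions of the values above the threshold
out_idx = which(vec > quan0.99)
print(out_idx)
```

With only ten values the 0.99 quantile is interpolated between the two largest values (about 91.9 here), so only the spike is flagged.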

You can do something like:

for(idx in out_idx) {
  # window of one point on each side, clipped so it stays inside the vector
  win = max(1, idx - 1):min(length(vec), idx + 1)
  vec[win] = mean(vec[win])
}
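A small worked run of the loop on toy data, assuming a numeric vector without NAs. The outlier at position 1 shows why the window is clipped with `max()`/`min()`: without it, `idx - 1` would be 0 at the start, and at the end `idx + 1` would index past the vector, giving an `NA` mean and silently growing the vector.

```r
# Toy data (illustrative only): outliers at the first position and at position 6
vec = c(50, 2, 3, 4, 5, 100, 7, 8, 9, 10)
out_idx = c(1, 6)

for (idx in out_idx) {
  # neighbourhood of one point on each side, clipped to 1..length(vec)
  win = max(1, idx - 1):min(length(vec), idx + 1)
  # overwrite the whole window with its mean
  vec[win] = mean(vec[win])
}
print(vec)
```

Note that the whole window (the outlier and its neighbours) is overwritten, and outliers are processed in order, so a later window sees values already smoothed by an earlier one.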

You can wrap this in a function, making the bandwidth and the summary function optional parameters:

average_outliers = function(vec, outlier_idx, bandwith = 1, func = "mean") {
  # iterate over the outlier positions passed in (not a global out_idx)
  for(idx in outlier_idx) {
    # window of +/- bandwith points around the outlier, clipped to the vector bounds
    win = max(1, idx - bandwith):min(length(vec), idx + bandwith)
    # slicing can extract values or, as here, assign values to that slice.
    # do.call calls e.g. mean with the window's values; its arguments must be a list.
    vec[win] = do.call(func, list(vec[win]))
  }
  return(vec)
}
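A self-contained check of the function on toy data, assuming a numeric vector without NAs. The definition is repeated so the snippet runs on its own. It also shows that `do.call` accepts either a function name as a string (`"mean"`) or a function object (`median`).

```r
average_outliers = function(vec, outlier_idx, bandwith = 1, func = "mean") {
  for (idx in outlier_idx) {
    # window of +/- bandwith points, clipped to the vector's bounds
    win = max(1, idx - bandwith):min(length(vec), idx + bandwith)
    # do.call expects the arguments as a list
    vec[win] = do.call(func, list(vec[win]))
  }
  vec
}

# Toy data (illustrative only) with a spike at position 6
x = c(1, 2, 3, 4, 5, 100, 7, 8, 9, 10)

# Defaults: bandwith 1, mean of positions 5:7
res_mean = average_outliers(x, 6)
print(res_mean)

# Function object instead of a string, bandwith 2: median of positions 4:8
res_median = average_outliers(x, 6, bandwith = 2, func = median)
print(res_median)
```

The median of the window (7) is unaffected by the size of the spike, whereas the mean (about 37.3) is still pulled strongly towards it.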

This also lets you use, for example, the median with a bandwidth of 2. Using this function:

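# Call average_outliers twice, passing the result of the first call to the second:
# first for values above the 0.99 quantile, then for values below the 0.01 quantile.
quan0.99 = quantile(vec, 0.99)
quan0.01 = quantile(vec, 0.01)
vec = average_outliers(vec, which(vec > quan0.99))
vec = average_outliers(vec, which(vec < quan0.01))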

or:

vec = average_outliers(vec, which(vec > quan0.99), bandwith = 2, func = "median")
vec = average_outliers(vec, which(vec < quan0.01), bandwith = 2, func = "median")

to use a bandwidth of 2 and replace each window with its median.
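An end-to-end sketch of the two median calls above on made-up data with one spike up and one spike down, assuming the quantile thresholds are computed once on the original data before either call. The function is repeated so the snippet runs standalone.

```r
average_outliers = function(vec, outlier_idx, bandwith = 1, func = "mean") {
  for (idx in outlier_idx) {
    # window of +/- bandwith points, clipped to the vector's bounds
    win = max(1, idx - bandwith):min(length(vec), idx + bandwith)
    vec[win] = do.call(func, list(vec[win]))
  }
  vec
}

# Toy series (illustrative only): spike down at position 5, spike up at position 8
vec = c(10, 11, 9, 10, -80, 10, 11, 120, 10, 9, 10, 11)

# Thresholds computed once, on the original data
quan0.99 = quantile(vec, 0.99)
quan0.01 = quantile(vec, 0.01)

# High outliers first, then low outliers, each window replaced by its median
vec = average_outliers(vec, which(vec > quan0.99), bandwith = 2, func = "median")
vec = average_outliers(vec, which(vec < quan0.01), bandwith = 2, func = "median")
print(vec)
```

Because the second call receives the already-smoothed vector, its window around position 5 no longer contains the upward spike.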