Question

Could someone please suggest how to remove local outliers from a dataframe? I have code to detect the local outliers, but I need help removing them (setting these values to zero) in the dataframe. Any advice would be highly appreciated.

The code to detect the local outliers is below:

import numpy as np
from sklearn.metrics import mean_absolute_error

def printOutliers(series, window, scale=1.96, print_outliers=False):
    # Rolling mean over a trailing window of `window` observations
    rolling_mean = series.rolling(window=window).mean()

    # Print the outliers (values together with their indices)
    if print_outliers:
        # Mean absolute error is a measure of difference between two continuous variables.
        mae = mean_absolute_error(series[window:], rolling_mean[window:])
        deviation = 3 * np.std(series[window:] - rolling_mean[window:])
        lower_bound = rolling_mean - (mae + scale * deviation)
        upper_bound = rolling_mean + (mae + scale * deviation)
        outliers_lower = series[series < lower_bound]
        outliers_upper = series[series > upper_bound]
        print("values beyond lower bound are: " + "\n" + str(outliers_lower))
        print("values beyond upper bound are: " + "\n" + str(outliers_upper))

printOutliers(df['Column1'].dropna(how='any'), 10, print_outliers=True)
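
One possible way to go from printing the outliers to zeroing them is to build a boolean mask from the same bounds and apply it with `Series.mask`. The following is only a sketch under assumptions: `zero_local_outliers` is a hypothetical helper name, the sample data is made up for illustration, and the MAE is computed directly with pandas (it should give the same number as `mean_absolute_error`) so the snippet does not need scikit-learn.

```python
import numpy as np
import pandas as pd


def zero_local_outliers(series, window, scale=1.96):
    """Return a copy of `series` with local outliers set to zero.

    Hypothetical helper; it reuses the bounds from printOutliers.
    """
    rolling_mean = series.rolling(window=window).mean()
    residual = series[window:] - rolling_mean[window:]
    mae = residual.abs().mean()  # mean absolute error vs. the rolling mean
    deviation = 3 * np.std(residual)
    lower_bound = rolling_mean - (mae + scale * deviation)
    upper_bound = rolling_mean + (mae + scale * deviation)
    # Comparisons against the NaN bounds at the start of the series are False,
    # so those rows are never flagged.
    is_outlier = (series < lower_bound) | (series > upper_bound)
    return series.mask(is_outlier, 0)


# Illustrative data only: a flat series with a single spike at index 50.
df = pd.DataFrame({"Column1": [1.0] * 110})
df.loc[50, "Column1"] = 101.0

clean = zero_local_outliers(df["Column1"].dropna(), 10)

# Write the cleaned values back by index, so rows removed by dropna()
# (if there are any) are left untouched in the dataframe.
df.loc[clean.index, "Column1"] = clean
```

Note that the bounds are quite wide (3 times the standard deviation, multiplied again by `scale`), so in short series a single spike can inflate the deviation enough that it is not flagged at all. If the goal is to drop the outliers rather than zero them, replacing the value `0` with `np.nan` in the `mask` call and then calling `dropna()` may be a better fit.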

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange