Question

I have a list of double values and I want to find the outliers in it. Does WEKA provide any algorithm for this?


Solution

In this working paper (first link), you can find the full text of an outlier detection algorithm implemented with WEKA.

The algorithm it uses was proposed in the paper “A Unified Approach to Detecting Spatial Outliers” by S. Shekhar et al. The paper presents several spatial outlier detection tests. For example, a variogram cloud displays data points related by neighborhood relationships: for each pair of locations, the square root of the absolute difference between the attribute values at the two locations is plotted against the Euclidean distance between them. In data sets exhibiting strong spatial dependence, the variance in the attribute differences increases with the distance between locations. Locations that are near one another but have large attribute differences may therefore indicate a spatial outlier, even though the values at both locations may appear reasonable when the data set is examined non-spatially. A major drawback of other outlier detection algorithms is that they can ignore some true spatial outliers while identifying some false ones.
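To make the variogram-cloud idea concrete, here is a minimal sketch, assuming 2-D coordinates and a single numeric attribute per location. The class name `VariogramCloud` and the thresholds `maxDistance` and `minRootDiff` are hypothetical; the paper uses the cloud as a visual diagnostic, so turning it into a fixed threshold rule is a simplification for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Variogram cloud sketch: one point per pair of locations, plus a hypothetical threshold rule. */
final class VariogramCloud {

    /** A cloud point: locations i and j, their Euclidean distance and sqrt(|attr_i - attr_j|). */
    static final class CloudPoint {
        final int i;
        final int j;
        final double distance;
        final double rootDiff;

        CloudPoint(int i, int j, double distance, double rootDiff) {
            this.i = i;
            this.j = j;
            this.distance = distance;
            this.rootDiff = rootDiff;
        }

        @Override
        public String toString() {
            return String.format("(%d, %d) distance=%.2f sqrt|diff|=%.2f", i, j, distance, rootDiff);
        }
    }

    /** Computes one cloud point for every unordered pair of locations (x[k], y[k]) with value attr[k]. */
    static List<CloudPoint> build(double[] x, double[] y, double[] attr) {
        List<CloudPoint> cloud = new ArrayList<>();
        for (int i = 0; i < attr.length; i++) {
            for (int j = i + 1; j < attr.length; j++) {
                double distance = Math.hypot(x[i] - x[j], y[i] - y[j]);
                double rootDiff = Math.sqrt(Math.abs(attr[i] - attr[j]));
                cloud.add(new CloudPoint(i, j, distance, rootDiff));
            }
        }
        return cloud;
    }

    /** Hypothetical rule: keep pairs that are close (distance <= maxDistance) yet differ a lot (rootDiff >= minRootDiff). */
    static List<CloudPoint> suspiciousPairs(List<CloudPoint> cloud, double maxDistance, double minRootDiff) {
        List<CloudPoint> result = new ArrayList<>();
        for (CloudPoint p : cloud) {
            if (p.distance <= maxDistance && p.rootDiff >= minRootDiff) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Four locations on a line; location 2 has an unusual value compared to its neighbours.
        double[] x = { 0, 1, 2, 3 };
        double[] y = { 0, 0, 0, 0 };
        double[] attr = { 10, 11, 30, 12 };
        for (CloudPoint p : suspiciousPairs(build(x, y, attr), 1.0, 3.0)) {
            System.out.println(p);
        }
    }
}
```

With this sample data the flagged pairs are (1, 2) and (2, 3); location 2 appears in both, which points to it as the likely spatial outlier.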

For an overview of outlier detection algorithms, you may also have a look at this SIAM tutorial.

Other suggestions

What you probably need is to compute the mean and the standard deviation of the numbers in the list, and then treat values that lie far from the mean, measured in standard deviations, as outliers. Both quantities are simple to code by hand; see http://www.mathsisfun.com/data/standard-deviation-formulas.html for the formulas.
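As a minimal sketch of the mean/standard-deviation approach, the hypothetical class `ZScoreOutliers` below flags values whose distance from the mean exceeds `k` standard deviations, where `k` is a threshold you choose. It assumes a non-empty array and uses the population standard deviation (dividing by n rather than n - 1).

```java
import java.util.ArrayList;
import java.util.List;

/** Flags values whose distance from the mean exceeds k standard deviations (z-score rule). */
final class ZScoreOutliers {

    /** Arithmetic mean of the values (assumes a non-empty array). */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Population standard deviation: divides by n, not n - 1. */
    static double stdDev(double[] values) {
        double m = mean(values);
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - m) * (v - m);
        }
        return Math.sqrt(sumSq / values.length);
    }

    /** Returns the values with |x - mean| > k * stdDev, in their original order. */
    static List<Double> findOutliers(double[] values, double k) {
        double m = mean(values);
        double sd = stdDev(values);
        List<Double> outliers = new ArrayList<>();
        for (double v : values) {
            if (Math.abs(v - m) > k * sd) {
                outliers.add(v);
            }
        }
        return outliers;
    }

    public static void main(String[] args) {
        double[] data = { 20, 65, 72, 75, 77, 78, 80, 81, 82, 83 };
        System.out.println(findOutliers(data, 2.0)); // prints [20.0]
    }
}
```

On this sample data, `k = 2` flags 20.0, but `k = 3` flags nothing: the outlier itself inflates the standard deviation (to about 17.8 here). This masking effect is one reason a quartile-based rule is often preferred.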

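You can also use the Apache Commons Math library to do the computation. The code below takes a different, more robust route: it computes the first and third quartiles (Q1 and Q3) and the interquartile range IQR = Q3 - Q1, then reports the fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR. Values outside these fences are usually treated as outliers.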

package test;

import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        double[] data = { 20, 65, 72, 75, 77, 78, 80, 81, 82, 83 };
        // Quartiles are only meaningful on sorted data, so sort first.
        Arrays.sort(data);
        double[] data1; // lower half
        double[] data2; // upper half
        if (data.length % 2 == 0) {
            data1 = Arrays.copyOfRange(data, 0, data.length / 2);
            data2 = Arrays.copyOfRange(data, data.length / 2, data.length);
        } else {
            // Odd length: leave the middle element (the median) out of both halves.
            data1 = Arrays.copyOfRange(data, 0, data.length / 2);
            data2 = Arrays.copyOfRange(data, data.length / 2 + 1, data.length);
        }
        double q1 = getMedian(data1); // first quartile
        double q3 = getMedian(data2); // third quartile
        double iqr = q3 - q1;         // interquartile range
        // Tukey's fences: values outside [lowerFence, upperFence] are considered outliers.
        double lowerFence = q1 - 1.5 * iqr;
        double upperFence = q3 + 1.5 * iqr;
        System.out.println("Lower Fence: " + lowerFence);
        System.out.println("Upper Fence: " + upperFence);
    }

    // Median of a sorted, non-empty array.
    public static double getMedian(double[] data) {
        if (data.length % 2 == 0)
            return (data[data.length / 2] + data[data.length / 2 - 1]) / 2;
        else
            return data[data.length / 2];
    }
}
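If you want the outliers themselves rather than just the fences, the same rule can be packaged as a reusable method. This is a sketch using the same quartile convention as the code above (the median is excluded from both halves when the length is odd); the class `TukeyOutliers` and the factor parameter `k` are hypothetical names. It sorts a copy, so the caller's array keeps its original order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Returns the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]. */
final class TukeyOutliers {

    /** Median of the already sorted segment sorted[from, to); the segment must be non-empty. */
    private static double median(double[] sorted, int from, int to) {
        int n = to - from;
        int mid = from + n / 2;
        return n % 2 == 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
    }

    /** Q1 and Q3 are the medians of the lower and upper halves (median excluded when n is odd). */
    static List<Double> findOutliers(double[] values, double k) {
        List<Double> outliers = new ArrayList<>();
        if (values.length < 2) {
            return outliers; // with fewer than 2 values the halves would be empty
        }
        double[] sorted = values.clone(); // sort a copy so the caller's array is untouched
        Arrays.sort(sorted);
        int half = sorted.length / 2;
        double q1 = median(sorted, 0, half);
        double q3 = median(sorted, sorted.length % 2 == 0 ? half : half + 1, sorted.length);
        double iqr = q3 - q1;
        double lower = q1 - k * iqr;
        double upper = q3 + k * iqr;
        for (double v : values) { // report outliers in the caller's original order
            if (v < lower || v > upper) {
                outliers.add(v);
            }
        }
        return outliers;
    }

    public static void main(String[] args) {
        // Same numbers as above, deliberately unsorted.
        double[] data = { 83, 20, 75, 65, 81, 72, 78, 77, 82, 80 };
        System.out.println(findOutliers(data, 1.5)); // prints [20.0]
    }
}
```

With `k = 1.5` this reproduces the fences above (58.5 and 94.5 for this data); Tukey's outer fences use `k = 3.0` for "far out" values.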
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow