While implementing improvements to quicksort partitioning,I tried to use Tukey's ninther to find the pivot (borrowing almost everything from sedgewick's implementation in QuickX.java)
My code below gives different results each time the array of integers is shuffled.
import java.util.Random;
public class TukeysNintherDemo{
public static int tukeysNinther(Comparable[] a,int lo,int hi){
int N = hi - lo + 1;
int mid = lo + N/2;
int delta = N/8;
int m1 = median3a(a,lo,lo+delta,lo+2*delta);
int m2 = median3a(a,mid-delta,mid,mid+delta);
int m3 = median3a(a,hi-2*delta,hi-delta,hi);
int tn = median3a(a,m1,m2,m3);
return tn;
}
// return the index of the median element among a[i], a[j], and a[k]
private static int median3a(Comparable[] a, int i, int j, int k) {
return (less(a[i], a[j]) ?
(less(a[j], a[k]) ? j : less(a[i], a[k]) ? k : i) :
(less(a[k], a[j]) ? j : less(a[k], a[i]) ? k : i));
}
private static boolean less(Comparable x,Comparable y){
return x.compareTo(y) < 0;
}
public static void shuffle(Object[] a) {
Random random = new Random(System.currentTimeMillis());
int N = a.length;
for (int i = 0; i < N; i++) {
int r = i + random.nextInt(N-i); // between i and N-1
Object temp = a[i];
a[i] = a[r];
a[r] = temp;
}
}
public static void show(Comparable[] a){
int N = a.length;
if(N > 20){
System.out.format("a[0]= %d\n", a[0]);
System.out.format("a[%d]= %d\n",N-1, a[N-1]);
}else{
for(int i=0;i<N;i++){
System.out.print(a[i]+",");
}
}
System.out.println();
}
public static void main(String[] args) {
Integer[] a = new Integer[]{17,15,14,13,19,12,11,16,18};
System.out.print("data= ");
show(a);
int tn = tukeysNinther(a,0,a.length-1);
System.out.println("ninther="+a[tn]);
}
}
Running this a cuople of times gives
data= 11,14,12,16,18,19,17,15,13,
ninther=15
data= 14,13,17,16,18,19,11,15,12,
ninther=14
data= 16,17,12,19,18,13,14,11,15,
ninther=16
Will tuckey's ninther give different values for different shufflings of the same dataset? when I tried to find the median of medians by hand ,I found that the above calculations in the code are correct.. which means that the same dataset yield different results unlike a median of the dataset.Is this the proper behaviour? Can someone with more knowledge in statistics comment?