I'm trying to come up with a fast algorithm to compute
the quantity b[i]= med |y_i+y_j|, 1<=j!=i<=n
when
the y_1,...,y_n
are sorted already (so b[]
is a vector
of same length as y[]
).
I will assume that all elements of y[]
are unique
and that n is even.
So, the code below computes the b[i]
's the naive (O(n**2)
) way:
(I wrote this in R for convenience, but I'm language agnostic)
n<-30
a_fast<-b_slow<-rep(NA,n)
y<-sort(rnorm(n,100,1))
z<-y
for(i in 1:n){
b_slow[i]<-median(abs(y[-i]+y[i]))
}
I have a tentative proposal --below-- for doing it in O(n)
.
But it only works if y[]
contains positive numbers.
My question is: how should I change the fast algorithm
to also work when y[]
contains both positive and
negative numbers? Is this even possible?
EDIT:
And the code below the (tentative) O(n)
way
(I wrote this in R for convenience, but I'm language agnostic)
tryA<-floor(1+(n-1)/2+1)
tryB<-floor(1+(n-1)/2)
medA<-y[tryA]
medB<-y[tryB]
for(i in 1:(tryA-1)){
a_fast[i]<-medA+y[i]
}
for(i in tryA:n){
a_fast[i]<-medB+y[i]
}
Simple example:
Simple, illustrative example. If we have a vector of length 4
-3, -1, 2, 4
Then, for example for i=1, the 3 absolute pairwise sums are
4 1 1
and their median is 1.
Then, for example for i=2, the 3 absolute pairwise sums are
4 1 3
and their median is 3.
Here is a longer example with both positive and negative y[]
:
-1.27 -0.69 -0.56 -0.45 -0.23 0.07 0.13 0.46 1.56 1.72
and here are my new b_slow[]
(this is the ground thruth, computed the naive way):
1.20 0.92 1.00 1.01 0.79 0.53 0.56 0.53 1.33 1.49
but now, my new a_fast[]
don't match no more:
-1.20 -0.62 -0.49 -0.38 -0.16 -0.16 -0.10 0.23 1.33 1.49
EDIT:
Here is my implementation of Francis's solution (up to the point where we have two sorted array, the median of which is easy to compute). I did it in R to stay in the spirit of the question.
Nonetheless, I seem to be missing a correction factor for the index (the ww in the code below) so the code below is sometimes off by a little bit. This is because in the definition above we compute the medians over n-1 observations (i!=j).
n<-100
y<-rnorm(n)
y<-sort(y)
b<-rep(NA,n)
#Naive --O(n**2)-- approch:
for(i in 1:n){
b[i]<-median(abs(y[-i]+y[i]))
}
k<-rep(NA,n)
i<-1
k[i]<-min(na.omit(c(which(y+y[i]>0)[1],n))) #binary search: O(log(n)) --
for(i in 2:n){ #O(n)
k_prov<-k[i-1]
while(y[k_prov]+y[i]>0 && k_prov>0) k_prov<-k_prov-1
k[i]<-max(k_prov+1,1)
#for(i in 1:n){ should give the same result.
# k[i]<-which(y+y[i]>0)[1]
#}
}
i<-sample(1:n,1)
x1<--y[1:(k[i]-1)]-y[i]
x2<-y[i]+y[n:k[i]]
x3<-c(x1,x2)
plot(x3)
ww<-ifelse(i<k[i] & i>n/2,n/2+1,n/2)
sort(x3)[ww] #this can be computed efficiently: O(log(n))
b[i] #this is the O(n**2) result.