Implementing Bin Fu's approximate sum algorithm

https://stackoverflow.com/questions/22127706

18-10-2022
|

Question

I'm trying to implement Bin Fu's approximate sum algorithm in a real language to have a better feel of how it works.

In a nutshell, this is an algorithm to compute efficiently $(1+\epsilon)$-bounds on the value of $s(x)=\sum_{i=1}^n x_i$ where $x$ is a vector of sorted floats.

However, I must be doing something wrong because running the algorithm results in a bug (I'm also not very versed in pseudo algorithm language and some things like array bound checking seem to be implicit in this code).

Here is the non-working code I have so far and any hints/help with the problem would be welcome --I'm language agnostic, I just used R because it is a 1-index (the algo is 1-index) open source interpreted language:

ApproxRegion<-function(x,n,b,delta){
    if(x[n]<b)  return(NULL)
    if(x[n-1]<b)    return(c(n,n))
    if(x[1]>=b) reurn(c(1,n))
    m1<-2
    while(x[n-m1**2+1]>=b)  m1<-m1**2
    i<-1
    m1<-m1
    r1<-m1
    while(m1>(1+delta)){
        m1<-sqrt(m1)
        if(x[n-floor(m1*r1)+1]>=b){
            r1<-m1*r1
        } else {
            r1=r1
        }
        i=i+1
    }
    return(c(n-floor(r1*m1)+1,n))
}       
ApproxSum<-function(x,n,epsilon){
    if(x[n]==0) return(0)
    delta<-3*epsilon/4
    r1p<-n
    s<-0
    i<-1
    b1<-x[n]/(1+delta)
    while(b1>=((delta*x[n])/(3*n))){
        Ri<-ApproxRegion(x=x,n=r1p,b=b1,delta=delta)
        r1p<-Ri[1]-1
        b1<-x[r1p]/(1+delta)
        s1<-(Ri[2]-Ri[1]+1)*b1
        s<-s+s1
        i<-i+1
    }
    return(s)
}
n<-100;
x<-sort(runif(n));
ApproxSum(x=x,n=length(x),epsilon=1/10);
sum(x)

The author mentions a c++ version but I couldn't find it online (any help on front would also be good).

Modo: I put the question here (rather than at the theoretical CS stackexchange site) because it's about an implementation problem. Feel free to move.

EDIT

The original code had an 'hairy' exit condition (x[i]=$-\infty$ for $i\leq 0$). Following Martin Morgan's suggestion, I replaced the occurrences of this by a proper break, yielding the following code:

ApproxRegion<-function(x,b,delta,n){
    if(n<=1)            return(NULL)
    if(x[n]<b)          return(NULL)
    if(x[n-1]<b)            return(c(n,n))
    if(x[1]>=b)         return(c(1,n))
    m<-2
    xit<-0
    while(!xit){
        if(n-m**2+1<1)      break
        if(x[n-m**2+1]<b)   break
        m<-m**2
    }
    i<-1
    r<-m
    while(m>=(1+delta)){
        m<-sqrt(m)  
        if(n-floor(m*r)+1>0){
            if(x[n-floor(m*r)+1]>=b)    r=m*r   
        }   
        i<-i+1      
    }
    return(c(n-floor(m*r)+1,n))
}       
ApproxSum<-function(x,n,epsilon){
    if(x[n]==0) return(0)
    delta=3*epsilon/4
    rp<-n
    s<-0
    i<-1
    b<-x[n]/(1+delta)
    while(b>=delta*x[n]/(3*n)){
        R<-ApproxRegion(x,b,delta,rp)
            if(is.null(R))  break   
        if(R[1]<=1) break
        rp<-R[1]-1
        b<-x[rp]/(1+delta)
        si<-(R[2]-R[1]+1)*b
        s<-s+si
        i<-i+1
    }
    return(s)
}

Now, it works:

n<-100;
set.seed(123)
x<-sort(runif(n));
ApproxSum(x=x,n=length(x),epsilon=1/10);
sum(x)

Solution

By way of a partial answer... There are edge conditions that are not handled explicitly by the algorithm. For instance in ApproxRegion one needs to guard againt n = 0 (return value should be NULL?) or 1 (c(n, n)?) otherwise the first or second conditions x[n] < b, x[n - 1] < b will not evaluate as expected (e.g., x[0] returns numeric(0)). Likewise the test in the loop has to guard against m1**2 > n + 1, otherwise you'll subscript by a negative number.

I think there are similar issues in ApproxSum, particularly when ApproxRegion returns, e.g., c(1, 1) (hence r1p == 0, b1 = integer()). It would be interesting to see an updated implementation.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow