Question

I am calculating the overlap of two bivariate normal distributions using the following function:

function [ oa ] = bivariate_overlap_integral(mu_x1,mu_y1,mu_x2,mu_y2)
    %calculating the pdf; elementwise operators (.^, ./) because dblquad
    %passes x in as a vector
    bpdf_vec1=@(x,y,mu_x,mu_y)(exp(-((x-mu_x).^2)./2-((y-mu_y).^2)./2)./(2*pi));

    %calculating the overlap of the two distributions at the point (x,y)
    overlap_point = @(x,y) min(bpdf_vec1(x,y,mu_x1,mu_y1),bpdf_vec1(x,y,mu_x2,mu_y2));

    %calculating the overall overlap area
    oa=dblquad(overlap_point,-100,100,-100,100);
end

You can see that this involves taking a double integral (x: -100 to 100, y: -100 to 100; ideally -inf to inf, but this suffices for the moment) of the function overlap_point, which is the minimum of the two PDFs given by bpdf_vec1 for the two distributions at the point (x,y).

Now, the PDF is never 0, so I would expect that the larger the integration interval, the larger the end result becomes, with the difference obviously becoming negligible after a certain point. However, it appears that sometimes, when I decrease the size of the interval, the result grows. For instance:

>> mu_x1=0;mu_y1=0;mu_x2=5;mu_y2=0;
>> bpdf_vec1=@(x,y,mu_x,mu_y)(exp(-((x-mu_x).^2)./2.-((y-mu_y)^2)/2)./(2*pi));
>>  overlap_point = @(x,y) min(bpdf_vec1(x,y,mu_x1,mu_y1),bpdf_vec1(x,y,mu_x2,mu_y2));
>> dblquad(overlap_point,-10,10,-10,10)
ans =
    0.0124
>> dblquad(overlap_point,-100,100,-100,100)
ans =
    1.4976e-005  -----> strange, as theoretically this cannot be smaller than the first answer
>> dblquad(overlap_point,-3,3,-3,3)
ans =
    0.0110  -----> makes sense that the result is less than the first answer, as the interval is decreased

Here we can check that the overlap is (close to) 0 at the corners of the interval:

>> overlap_point (100,100)
ans =
 0
>> overlap_point (-100,100)
ans =
 0
>> overlap_point (-100,-100)
ans =
 0
>> overlap_point (100,-100)
ans =
 0

Does this perhaps have to do with the implementation of dblquad, or am I making a mistake somewhere? I use MATLAB R2011a.

Thanks


Solution

CONGRATULATIONS! You win the award for being the 12 millionth person to ask essentially this question. :) What I'm trying to say is that this is an issue everyone stumbles over at first. Honestly, it gets asked so often that it should really be marked as a dup.

What happens here is that a bivariate normal is essentially a delta function when viewed from far enough away, and you don't need to spread the region out very far for that to happen, since the normal density drops off fast. The integrand is essentially zero over most of the domain you are trying to integrate over, at least to within the tolerances employed.
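To put a number on "drops off fast": the value of a unit-variance bivariate normal PDF at distance r from its mean is already below 1e-22 at r = 10, so on [-100,100] x [-100,100] almost the entire domain looks like flat zero next to the peak. A quick illustration (written in Python here just for convenience; the question's code is MATLAB):

```python
import numpy as np

# Unit-variance bivariate normal pdf at distance r from its mean.
pdf = lambda r: np.exp(-r**2 / 2) / (2 * np.pi)

for r in (0, 3, 5, 10):
    print(r, pdf(r))
# At r = 10 the density is ~3e-23: indistinguishable from zero next to the
# peak value of ~0.159, which is why the wide domain looks flat to dblquad.
```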

So then if the quadrature happens to hit some sample points near the areas where there is mass, then you may get some realistic estimate of your integral. But if all the tool sees are numbers that are essentially zero over the entire domain, it concludes the integral is zero. Remember, adaptive integration tools are NOT omniscient. They do not know anything about your function. It is a black box to them. These are NOT symbolic tools.
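The effect can be mimicked with a deliberately naive quadrature rule. The sketch below (Python/NumPy for convenience; the midpoint rule with a fixed sample budget is a stand-in for an adaptive integrator's coarse first pass, not what dblquad actually does) shows how widening the limits without adding samples makes the grid miss the peak:

```python
import numpy as np

# The overlap density from the question: min of two unit-variance bivariate
# normal PDFs centred at (0,0) and (5,0).
def overlap_point(x, y):
    f1 = np.exp(-(x**2 + y**2) / 2) / (2 * np.pi)
    f2 = np.exp(-((x - 5)**2 + y**2) / 2) / (2 * np.pi)
    return np.minimum(f1, f2)

# Midpoint rule with a fixed number of samples per axis, however wide the
# domain is -- so widening the limits spreads the same samples thinner.
def midpoint_2d(f, lo, hi, n=60):
    edges = np.linspace(lo, hi, n + 1)
    mids = (edges[:-1] + edges[1:]) / 2
    h = (hi - lo) / n
    X, Y = np.meshgrid(mids, mids)
    return np.sum(f(X, Y)) * h * h

narrow = midpoint_2d(overlap_point, -10, 10)   # samples land on the mass
wide = midpoint_2d(overlap_point, -100, 100)   # most samples see ~0
print(narrow)  # close to the true overlap, ~0.0124
print(wide)    # noticeably smaller: the coarse grid largely misses the peak
```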

BTW, this is NOT something I'd expect to see consistently different for Octave versus MATLAB. It is only an issue of the adaptive integrator, and where it chooses to set its sample points down.

OTHER TIPS

OK, here are my Octave results.

>format long
>z10 = dblquad(overlap_point,-10,10,-10,10)
z10 =  0.0124193303284589
>z100 = dblquad(overlap_point,-100,100,-100,100)
z100 =  0.0124193303245106
>z100 - z10
ans = -3.94835379669001e-012

>z10a = dblquad(overlap_point,-10,10,-10,10,1e-8)
z10a =  0.0124193306514996
>z100a = dblquad(overlap_point,-100,100,-100,100,1e-8)
z100a =  0.0124193306527155
>z100a-z10a
ans = 1.21588676627038e-012

BTW. I've noticed this type of problem before with numerical solutions. Sometimes you make a change that you expect will improve the accuracy of the result (in this case by making your limits closer to the ideal case of the full plane), but instead you get the opposite effect and the result becomes less accurate. What's happening in this case is that by going out "wider", to -100..100, you're shifting the focus away from where the really important action is happening in your function, which is close to the origin. At some point the implementation of dblquad that you're using must start increasing the intersample distance as you increase the limits, and then it starts missing some important stuff close to the origin.
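As a sanity check on those Octave numbers: for two isotropic, unit-variance bivariate normals whose means are a distance d apart (the question's case, with d = 5), the overlap has a closed form. By symmetry, min(f1, f2) switches components at the midplane x = d/2, and each component contributes the tail probability Phi(-d/2), so the total is 2*Phi(-d/2) = erfc(d/(2*sqrt(2))). A quick check (in Python just for convenience):

```python
from math import erfc, sqrt

# Closed-form overlap of two isotropic unit-variance bivariate normals whose
# means are a distance d apart: min(f1, f2) switches at the midplane, each
# side contributing Phi(-d/2), so the total is 2*Phi(-d/2) = erfc(d/(2*sqrt(2))).
def overlap_exact(d):
    return erfc(d / (2 * sqrt(2)))

print(overlap_exact(5.0))  # ~0.0124193307, matching the tighter-tolerance runs
```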

Maybe someone running a later version of MATLAB can check this out and see if it's been improved.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow