I'd try high-pass filtering your 2D data.
Following Fourier, a signal can be transformed into "frequency space", which describes which frequencies the signal contains and how strongly. This also applies to 2D signals such as images.
A high-pass filter removes the low-frequency components, such as constant offsets and slow brightness gradients, and keeps the fine detail. Applied to an image, it works as a simple edge detector. An example might make this easier to understand:
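A minimal NumPy sketch of such a filter, working in frequency space as described above. It multiplies the 2D FFT of a grayscale image by a Gaussian high-pass transfer function, 1 - exp(-f²/(2·cutoff²)), which is exactly zero at the DC bin, so constant offsets vanish. The function name `high_pass` and the `cutoff` default (in cycles per pixel) are assumptions for illustration, not necessarily what the GIMP plugin does internally.

```python
import numpy as np


def high_pass(image: np.ndarray, cutoff: float = 0.05) -> np.ndarray:
    """Gaussian high-pass filter for a 2-D grayscale image, applied via the FFT.

    `cutoff` (cycles per pixel) is an assumed example value: frequencies well
    below it are suppressed, frequencies well above it pass almost unchanged.
    """
    # Spatial frequency of every FFT bin, laid out like np.fft.fft2's output.
    fy = np.fft.fftfreq(image.shape[0])[:, np.newaxis]
    fx = np.fft.fftfreq(image.shape[1])[np.newaxis, :]
    freq_sq = fx**2 + fy**2
    # Transfer function: 0 at DC (removes constant offsets), close to 1 for fine detail.
    transfer = 1.0 - np.exp(-freq_sq / (2.0 * cutoff**2))
    spectrum = np.fft.fft2(image.astype(float))
    # The transfer function is symmetric, so the result is real up to rounding error.
    return np.real(np.fft.ifft2(spectrum * transfer))


# Usage: a slow brightness variation plus a constant offset, and a small bright square.
yy, xx = np.mgrid[0:128, 0:128]
image = 40.0 * np.sin(2 * np.pi * xx / 128) + 20.0
image[60:68, 60:68] += 100.0  # the "object"
edges = high_pass(image)      # the slow variation is almost gone, the square remains
```

Note that the FFT treats the image as periodic, so a gradient that does not wrap around smoothly produces a spurious edge at the image borders; padding or windowing the image before filtering reduces this.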
I took an image of a spider on a wall from somewhere on the web (top-left) and decreased its brightness (bottom-left). I then applied a high-pass filter to both versions using a GIMP plugin. The two outputs look very similar.
My recommendation: first apply a high-pass filter, then look for differences.
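A sketch of that recommendation, assuming grayscale float images of equal size. `high_pass`, `changed_pixels`, the `cutoff` and the `threshold` are hypothetical names and values chosen for illustration. The brightness change is modelled here as a constant offset, which the filter removes exactly; a real brightness change is often closer to a scaling, which a linear high-pass filter scales rather than removes, so the threshold needs some headroom.

```python
import numpy as np


def high_pass(image: np.ndarray, cutoff: float = 0.05) -> np.ndarray:
    """FFT-based Gaussian high-pass filter (cutoff in cycles per pixel, assumed value)."""
    fy = np.fft.fftfreq(image.shape[0])[:, np.newaxis]
    fx = np.fft.fftfreq(image.shape[1])[np.newaxis, :]
    transfer = 1.0 - np.exp(-(fx**2 + fy**2) / (2.0 * cutoff**2))
    return np.real(np.fft.ifft2(np.fft.fft2(image.astype(float)) * transfer))


def changed_pixels(reference: np.ndarray, current: np.ndarray,
                   threshold: float = 10.0) -> np.ndarray:
    """Boolean mask of pixels whose high-pass-filtered values differ by more than `threshold`.

    `threshold` is a hypothetical value; tune it to the noise level of your images.
    """
    return np.abs(high_pass(current) - high_pass(reference)) > threshold


rng = np.random.default_rng(0)
reference = rng.uniform(50.0, 200.0, size=(64, 64))  # textured background

darker = reference - 40.0            # same scene, lower brightness offset
with_object = darker.copy()
with_object[28:36, 28:36] += 100.0   # darker scene plus a new bright object

print(changed_pixels(reference, darker).any())         # the offset alone is filtered out
print(changed_pixels(reference, with_object)[32, 32])  # the new object is flagged
```

Because the filter is linear, filtering both images and subtracting is the same as filtering their difference, which is why a pure offset between the two frames cancels out.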
Possible problems
As requested, here are some problems I can imagine:
No sharp edges: if the object you want to detect doesn't have sharp edges, high-pass filtering will probably remove it. But what objects could that be? They would have to be large, flat (so they don't cast shadows) and without texture.
Only color differs, not brightness: if the object differs from the background only in color while having the same brightness, converting to grayscale may make it disappear. If you run into this, analyse the R, G and B channels separately; at least one channel should then reveal the object. If none does, you couldn't see it anyway.
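Both problems can be reproduced on synthetic data with a small sketch. `high_pass` and its `cutoff` are assumed illustration values, the soft object is modelled as a wide Gaussian blob, and the grayscale conversion is assumed to be a plain channel mean (real conversions usually weight the channels, but the effect is the same for a suitably chosen color).

```python
import numpy as np


def high_pass(image: np.ndarray, cutoff: float = 0.05) -> np.ndarray:
    """FFT-based Gaussian high-pass filter for one 2-D channel (cutoff is an assumed value)."""
    fy = np.fft.fftfreq(image.shape[0])[:, np.newaxis]
    fx = np.fft.fftfreq(image.shape[1])[np.newaxis, :]
    transfer = 1.0 - np.exp(-(fx**2 + fy**2) / (2.0 * cutoff**2))
    return np.real(np.fft.ifft2(np.fft.fft2(image.astype(float)) * transfer))


yy, xx = np.mgrid[0:128, 0:128]
r_sq = (xx - 64.0) ** 2 + (yy - 64.0) ** 2

# Problem 1: a large object with soft edges (Gaussian blob, sigma = 15 px).
soft_blob = 100.0 * np.exp(-r_sq / (2 * 15.0**2))
# For comparison: a small object with sharp edges and the same peak brightness.
sharp_square = np.zeros((128, 128))
sharp_square[60:68, 60:68] = 100.0
print(np.abs(high_pass(soft_blob)).max())     # small: the blob is mostly filtered out
print(np.abs(high_pass(sharp_square)).max())  # large: the sharp edges survive

# Problem 2: a patch that differs only in color, not in (mean) brightness.
background = np.full((128, 128, 3), 120.0)   # uniform gray, RGB
scene = background.copy()
scene[60:68, 60:68] = [150.0, 105.0, 105.0]  # reddish patch, same channel mean (120)

gray_diff = high_pass(scene.mean(axis=2)) - high_pass(background.mean(axis=2))
channel_diffs = [high_pass(scene[..., c]) - high_pass(background[..., c]) for c in range(3)]
print(np.abs(gray_diff).max())                      # 0.0: invisible after grayscale conversion
print(max(np.abs(d).max() for d in channel_diffs))  # clearly above 0 per channel
```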
Edit: In reply to ???, if you also adjust the levels of the high-pass-filtered image (whose values, of course, all sit around mid-gray, about 0.5 × 256 = 128) by normalizing it back to the full range 0 to 255, you get:
That probably isn't worse than your result. Besides, high-pass filters are simple and, when implemented with the FFT, very fast.
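The level adjustment from the edit can be sketched as a plain min-max contrast stretch, assuming the filtered image is a float array that may contain negative values. `stretch_to_8bit` is a hypothetical helper name; an image editor's levels tool may additionally clip outliers or apply a gamma curve.

```python
import numpy as np


def stretch_to_8bit(filtered: np.ndarray) -> np.ndarray:
    """Linearly map the min..max range of a filtered image onto 0..255 as uint8."""
    lo, hi = float(filtered.min()), float(filtered.max())
    if hi == lo:
        # Flat input: there is no contrast to stretch, so return mid-gray.
        return np.full(filtered.shape, 128, dtype=np.uint8)
    scaled = (filtered - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)


# Usage: signed values, as produced by a high-pass filter without a mid-gray offset.
hp = np.array([[-2.0, 0.0], [2.0, 1.0]])
print(stretch_to_8bit(hp))
```

A single outlier pixel compresses everything else into a narrow band with this approach; stretching between, say, the 1st and 99th percentile (and clipping) is a common, more robust variant.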