For speed, I'd quickly segment the image into surface and non-surface (stuff). So this at least gets you from 24 bits (color?) to 8 bits or even one bit, if you are brave.
From there you need to aggregate and filter the regions into blobs of appropriate size. For that, I'd try a morphological (or linear) filter that is keyed to a disk that would just fit inside the smallest object of interest. This would be an opening and a closing. Perhaps starting with smaller radii for better results.
From there, you should have an image of blobs that can be found and discriminated. Each blob or region should designate the objects of interest.
Note that if you can get to a 1-bit image, things can go VERY fast. However, efficient tools that can make use of this data form (8 pixels per character) are often not forthcoming.