Have you tried an existing image segmentation algorithm?
I would start with the maxflow algorithm for image segmentation by Vladimir Kolmogorov here: http://pub.ist.ac.at/~vnk/software.html In the papers they fix areas of an image to belong to a particular segment, which would help for you problem, but it may not be obvious how to do this in the software.
Deep learning algorithms for parsing scenes by Richard Socher might also help: http://www.socher.org/
And Eric Sudderth has at least one interesting method for visual scene understanding here: http://www.cs.brown.edu/~sudderth/software.html
I also haven't actually used any of this software, which is mostly, if not all, for research and not particularly user friendly.