In one of my projects I was gathering face images from a camera, and I needed some way to figure out whether the faces it found were similar. I had no recognition model at the beginning, so I couldn't ask OpenCV to recognize them. No data, no classification.
Since I was gathering the faces for training, some preprocessing to determine labels would have been really helpful. So I used the relative L1 distance, and sometimes the relative L2 distance, to measure the similarity of the faces found. Look up the definition of relative difference if the term is unfamiliar.
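To make the idea concrete, here is a minimal sketch of a relative L1/L2 distance between two equally sized grayscale images. It assumes "relative" means the norm of the difference divided by the norm of the reference image. The names `flatten` and `relative_distance` are made up for illustration, and plain nested lists are used only to keep the example dependency-free.

```python
import math

def flatten(image):
    """Turn a 2-D grayscale image (a list of rows) into a flat list of floats."""
    return [float(p) for row in image for p in row]

def relative_distance(a, b, order=1):
    """Relative L1 (order=1) or L2 (order=2) distance: ||a - b|| / ||b||.

    Assumption: b is treated as the reference image, so the result is not
    symmetric in a and b.
    """
    x, y = flatten(a), flatten(b)
    if len(x) != len(y):
        raise ValueError("images must have the same number of pixels")
    if order == 1:
        diff = sum(abs(p - q) for p, q in zip(x, y))
        ref = sum(abs(q) for q in y)
    elif order == 2:
        diff = math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))
        ref = math.sqrt(sum(q * q for q in y))
    else:
        raise ValueError("order must be 1 or 2")
    if ref == 0:
        raise ValueError("reference image is all zeros")
    return diff / ref
```

Two crops could then be treated as the same face when the distance falls below some threshold. That threshold has to be tuned on your own data; there is no universal value.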
So if you are not talking about full object recognition, like finding every kind of hammer in a scene, which really takes great effort, you can use this norm approach. With it you can check whether a new image is close to a reference image of a hammer, a ball, and so on. Size will be a problem, because the norms only compare images of equal dimensions; this can be overcome with a pyramid approach.
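As a rough sketch of how that might look, the code below builds a simple image pyramid by averaging 2x2 blocks and compares each level against labelled reference images using the relative L1 distance. All names (`halve`, `pyramid`, `relative_l1`, `closest_label`) are hypothetical, and the sketch is deliberately simplified: it compares whole images only, with no sliding window, and a level is compared only when its size exactly matches a reference.

```python
def halve(image):
    """One pyramid step: average each 2x2 block (an odd trailing row/column is dropped)."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * r][2 * c] + image[2 * r][2 * c + 1] +
              image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)]
            for r in range(h)]

def pyramid(image, min_size=2):
    """Return the image followed by successively halved copies."""
    levels = [image]
    while len(levels[-1]) // 2 >= min_size and len(levels[-1][0]) // 2 >= min_size:
        levels.append(halve(levels[-1]))
    return levels

def relative_l1(a, b):
    """Sum of absolute differences divided by the L1 norm of the reference b."""
    diff = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    ref = sum(abs(q) for rb in b for q in rb)
    return diff / ref if ref else float("inf")

def closest_label(image, references):
    """references maps a label to a 2-D reference image.

    Returns (label, distance) for the best match over all pyramid levels,
    or (None, inf) if no level has the same size as any reference.
    """
    best = (None, float("inf"))
    for level in pyramid(image):
        for label, ref in references.items():
            if len(level) == len(ref) and len(level[0]) == len(ref[0]):
                d = relative_l1(level, ref)
                if d < best[1]:
                    best = (label, d)
    return best
```

For example, `closest_label(new_image, {"hammer": hammer_ref, "ball": ball_ref})` returns the label of the nearest reference. In a real pipeline you would more likely resize the images to a common size with your imaging library instead of relying on exact power-of-two size matches.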
Note that this is even worse than a mid-level object recognition approach. But it can be used for simple problems, like my pre-classification of face images from the camera.