If you're detecting faces in a video, you can apply a smoothing filter to the bounding box so that it changes gradually from frame to frame. This reduces those "inconsistencies" in the face bounding box.
CurrentFrameBoundingBox = a*PrevFrameBoundingBox + (1-a)*DetectedBoundingBox
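Here a is a weight between 0 and 1. The larger a is, the more weight the previous frame's bounding box gets and the fewer inconsistencies you see, but the box also reacts more slowly when the face really moves.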
You do this for every coordinate in the bounding box.
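
As a rough illustration, here is a minimal Python sketch of the formula above. The function name smooth_box, the (x, y, w, h) box layout, the default a=0.7, and the sample detections are all made up for the example. It also assumes that the "previous frame bounding box" is the previous smoothed box rather than the previous raw detection, and that the first frame simply uses the detection as-is.

```python
def smooth_box(prev_box, detected_box, a=0.7):
    """Blend the previous box with the new detection, coordinate by coordinate.

    prev_box is assumed to be the previous *smoothed* box, or None on the
    first frame. Boxes are assumed to be (x, y, w, h) tuples.
    """
    if prev_box is None:
        # No history yet: start from the raw detection.
        return tuple(float(c) for c in detected_box)
    # CurrentFrameBoundingBox = a*PrevFrameBoundingBox + (1-a)*DetectedBoundingBox
    return tuple(a * p + (1 - a) * d for p, d in zip(prev_box, detected_box))


# Hypothetical detector output for three consecutive frames.
detections = [(100, 100, 50, 50), (110, 98, 52, 49), (90, 103, 48, 51)]

current = None
smoothed = []
for detected in detections:
    current = smooth_box(current, detected, a=0.7)
    smoothed.append(current)

for raw, box in zip(detections, smoothed):
    print(raw, "->", tuple(round(c, 1) for c in box))
```

If the detector misses the face in a frame, one option is to keep the last smoothed box for that frame and resume blending when the next detection arrives.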