First, there are three types of frames.
I (Intra) frames, or keyframes. These frames do not reference any other frames. They are standalone and can be decoded without any other frame data, like a JPEG.
P (Predictive) frames. These can reference frames from the past.
B (Bi-directional) frames. These can reference frames from the past or the future.
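To make the B frame dependency concrete, here is a minimal Python sketch under a simplified model, which is an assumption and not the actual H.264 rules. Every B frame references the nearest I or P frame before it and the nearest one after it, and the GOP is written in display order as a string like "IBBPBBP". The hypothetical decode_order function returns the order in which the frames have to be decoded, which shows each future anchor being pulled ahead of the B frames that depend on it.

```python
def decode_order(display: str) -> list[int]:
    """Return display-order indices in the order a decoder needs them.

    Simplified, hypothetical model: each B frame references the nearest
    I/P frame before it and the nearest I/P frame after it.
    """
    order: list[int] = []
    pending_b: list[int] = []  # B frames waiting for their future anchor
    for i, kind in enumerate(display):
        if kind == "B":
            pending_b.append(i)
        elif kind in ("I", "P"):
            order.append(i)          # the future anchor is decoded first
            order.extend(pending_b)  # then the held-back B frames
            pending_b.clear()
        else:
            raise ValueError(f"unknown frame type {kind!r}")
    if pending_b:
        raise ValueError("trailing B frames have no future anchor")
    return order


print(decode_order("IBBPBBP"))
print(decode_order("IPPP"))
```

For "IBBPBBP" this gives [0, 3, 1, 2, 6, 4, 5]: frame 3 (a P frame) has to be decoded before frames 1 and 2 can be. For "IPPP" the decode order is simply the display order.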
Option 1: Only use I and P frames, so no frame ever depends on a frame that has not arrived yet. This makes the file about 10-15% larger (or about 10-15% lower quality at the same file size). This is what interactive systems like video conferencing and screen sharing use, because there latency is very noticeable.
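Under the same kind of simplified model (each B frame waits for the next I or P frame in display order), a hypothetical reorder_delay helper can count how many frame intervals the worst-off frame has to wait before it can be coded. With only I and P frames nothing waits, which is why that choice suits latency-sensitive systems. This is a sketch under those assumptions, not how any particular encoder measures its delay.

```python
def reorder_delay(display: str) -> int:
    """Max number of frame intervals any B frame waits for its future anchor.

    Simplified, hypothetical model: a B frame depends on the next I/P frame
    in display order. Trailing B frames with no later anchor are ignored.
    """
    delay = 0
    next_anchor = None  # index of the nearest I/P frame to the right
    for i in range(len(display) - 1, -1, -1):
        if display[i] in ("I", "P"):
            next_anchor = i
        elif display[i] == "B" and next_anchor is not None:
            delay = max(delay, next_anchor - i)
    return delay


print(reorder_delay("IPPP"))     # only I and P frames
print(reorder_delay("IBBPBBP"))  # two B frames between anchors
```

The first call returns 0 and the second returns 2, since the first B frame after each anchor has to wait two frames for the next P frame.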
Option 2: Wait for the future to happen. At 30 frames per second, the future will be here in about 33 milliseconds.
H.264 specifically can only reference up to 16 neighboring frames. However, most people limit this to around 4, so waiting for 4 frames adds only about 133 milliseconds of delay.
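The arithmetic behind these numbers is the frame interval (1000 ms divided by the frame rate) multiplied by the number of frames held back. A small sketch, assuming one full frame interval of delay per frame waited and ignoring encoding time; reorder_latency_ms is a hypothetical helper name:

```python
def reorder_latency_ms(frames_to_wait: int, fps: float) -> float:
    """Extra delay from holding frames until a future reference arrives."""
    frame_interval_ms = 1000.0 / fps  # about 33.3 ms at 30 fps
    return frames_to_wait * frame_interval_ms


for frames in (1, 4):
    print(f"{frames} frame(s) at 30 fps -> {reorder_latency_ms(frames, 30):.0f} ms")
```

This prints 33 ms for one frame and 133 ms for four, matching the figures above.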