To simplify: when a video is encoded, only certain images (keyframes) are encoded as full images, like your JPEG.
The rest are encoded as a difference from a neighbouring image (typically the previous one), which for most scenes is much smaller than the whole image.
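To make the idea concrete, here is a toy sketch in Python. It is not how any real codec works (real ones add motion compensation, transforms, and entropy coding on top); it only shows the keyframe-plus-difference principle. The names `encode`, `decode`, and `keyframe_interval` are made up for this illustration, and a frame is assumed to be a flat list of pixel values.

```python
from typing import List, Tuple, Union

Frame = List[int]                      # assumed: a flattened grayscale image
Delta = List[Tuple[int, int]]          # (pixel index, new value) for changed pixels
Encoded = Tuple[str, Union[Frame, Delta]]


def encode(frames: List[Frame], keyframe_interval: int = 4) -> List[Encoded]:
    """Store every keyframe_interval-th frame in full, the rest as differences."""
    encoded: List[Encoded] = []
    prev = None
    for i, frame in enumerate(frames):
        if prev is None or i % keyframe_interval == 0:
            # Keyframe: the complete image, like a standalone JPEG.
            encoded.append(("key", list(frame)))
        else:
            # Delta frame: only the pixels that differ from the previous frame.
            delta = [(j, new) for j, (old, new) in enumerate(zip(prev, frame))
                     if old != new]
            encoded.append(("delta", delta))
        prev = frame
    return encoded


def decode(encoded: List[Encoded]) -> List[Frame]:
    """Rebuild all frames by applying each delta to the frame before it."""
    frames: List[Frame] = []
    current: Frame = []
    for kind, payload in encoded:
        if kind == "key":
            current = list(payload)
        else:
            current = list(current)    # copy so earlier frames stay untouched
            for j, value in payload:
                current[j] = value
        frames.append(current)
    return frames
```

If only one pixel changes between two frames, the delta frame holds a single (index, value) pair instead of the whole image, which is where the size saving comes from. It also shows why you cannot decode a delta frame on its own: you have to start from the nearest keyframe before it.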