JPEG encodes an image in rows of a minimum height, called the MCU height. It is 8 lines in images without vertical chroma subsampling (4:4:4 mode, and also 4:2:2, which subsamples only horizontally) or 16 lines if chroma is subsampled vertically (4:2:0 mode).
If you feed libjpeg these 8 or 16 lines at a time, it can process the whole MCU row in one go. Otherwise it needs to do extra bookkeeping or buffering.
Writing multiple MCU heights at a time, or the whole image, won't hurt.
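
To make the arithmetic concrete, here is a minimal sketch in plain C of how the chunk size could be worked out. The helper names `mcu_row_height` and `lines_for_next_write` are made up for illustration and are not part of libjpeg. The sketch assumes the standard 8x8 DCT block size and that you know the largest vertical sampling factor; in libjpeg that value should be available as `cinfo.max_v_samp_factor` once compression has started.

```c
#include <assert.h>

/* DCT block size in baseline JPEG (libjpeg calls this DCTSIZE). */
enum { DCT_BLOCK_SIZE = 8 };

/* Hypothetical helper, not a libjpeg function.
 * MCU row height in lines for a given maximum vertical sampling factor:
 * 1 for 4:4:4 and 4:2:2 (gives 8), 2 for 4:2:0 (gives 16). */
static int mcu_row_height(int max_v_samp_factor)
{
    assert(max_v_samp_factor >= 1);
    return DCT_BLOCK_SIZE * max_v_samp_factor;
}

/* Hypothetical helper, not a libjpeg function.
 * Number of lines to hand over in the next write call: a full MCU row,
 * or whatever is left at the bottom of the image. */
static int lines_for_next_write(int image_height, int next_line, int mcu_height)
{
    int remaining = image_height - next_line;

    assert(mcu_height >= 1);
    if (remaining <= 0)
        return 0; /* nothing left to write */
    return remaining < mcu_height ? remaining : mcu_height;
}
```

In a real encoder loop you would pass that many row pointers to `jpeg_write_scanlines` on each iteration. The last chunk can be shorter than an MCU row when the image height is not a multiple of 8 or 16.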