The video frame stream is given as 4 bytes per pixel in BGRA format (blue, green, red, alpha), and the pixels are stored row by row: each horizontal scan line of the image follows the previous one. A full uncompressed 640x480 frame therefore occupies 640x480x4 = 1,228,800 bytes.
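Because the layout is row-major with 4 bytes per pixel, the blue byte of pixel (x, y) sits at offset (y * width + x) * 4, followed by green, red and alpha. The sketch below is a minimal illustration of that indexing and of copying a frame into a java.awt.image.BufferedImage. The class and method names (BgraFrame, offset, argbAt, toImage) are hypothetical, and the raw frame is assumed to already be available as a byte[].

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper for raw BGRA color frames (4 bytes per pixel, row-major). */
public class BgraFrame {
    public static final int WIDTH = 640;
    public static final int HEIGHT = 480;
    public static final int BYTES_PER_PIXEL = 4;

    /** Size in bytes of one uncompressed BGRA frame. */
    public static int frameSize(int width, int height) {
        return width * height * BYTES_PER_PIXEL;
    }

    /** Offset of the first (blue) byte of pixel (x, y). */
    public static int offset(int x, int y, int width) {
        return (y * width + x) * BYTES_PER_PIXEL;
    }

    /** Packs the B, G, R, A bytes of pixel (x, y) into a Java ARGB int. */
    public static int argbAt(byte[] frame, int x, int y, int width) {
        int i = offset(x, y, width);
        int b = frame[i] & 0xFF;     // byte 0: blue
        int g = frame[i + 1] & 0xFF; // byte 1: green
        int r = frame[i + 2] & 0xFF; // byte 2: red
        int a = frame[i + 3] & 0xFF; // byte 3: alpha
        return (a << 24) | (r << 16) | (g << 8) | b;
    }

    /**
     * Copies a BGRA frame into a TYPE_INT_ARGB image.
     * Assumption: the alpha byte is meaningful; if your stream leaves it at 0,
     * use TYPE_INT_RGB instead or the image will appear fully transparent.
     */
    public static BufferedImage toImage(byte[] frame, int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                img.setRGB(x, y, argbAt(frame, x, y, width));
            }
        }
        return img;
    }

    public static void main(String[] args) {
        byte[] frame = new byte[frameSize(WIDTH, HEIGHT)];
        // Make pixel (1, 0) opaque pure red: B=0, G=0, R=255, A=255.
        int i = offset(1, 0, WIDTH);
        frame[i + 2] = (byte) 0xFF;
        frame[i + 3] = (byte) 0xFF;
        System.out.println(frame.length);                        // 1228800
        System.out.printf("%08X%n", argbAt(frame, 1, 0, WIDTH)); // FFFF0000
    }
}
```

Calling setRGB once per pixel keeps the example readable; for full-rate streaming, writing directly into the image's underlying int buffer is usually faster.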
The depth frame stream is given as 2 bytes per depth pixel in unsigned short format. Each unsigned short packs two values: the 4 least significant bits contain the identity of the player at that pixel, and the remaining upper bits (the value shifted right by 4) give the distance from the camera plane in millimeters. A full uncompressed 320x240 frame therefore occupies 320x240x2 = 153,600 bytes.
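Java has no unsigned short, so a raw depth value must be masked with 0xFFFF before shifting; otherwise distances whose top bit is set come out negative. The sketch below splits a raw value into distance and player index using the 4-bit layout described above. Two points are assumptions: the byte order of the raw stream (little-endian is assumed here), and the width of the player-index field, which is a single constant so it is easy to change. The Kinect for Windows SDK 1.x, for example, documents a 3-bit player index (DepthImageFrame.PlayerIndexBitmaskWidth), so check which layout your data actually uses. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for raw depth frames (2 bytes per pixel). */
public class DepthPixel {
    // Width of the player-index field, as described in the text above.
    // Assumption: change this if your SDK packs the player index differently.
    public static final int PLAYER_INDEX_BITS = 4;
    public static final int PLAYER_INDEX_MASK = (1 << PLAYER_INDEX_BITS) - 1;

    /** Size in bytes of one uncompressed depth frame. */
    public static int frameSize(int width, int height) {
        return width * height * 2;
    }

    /** Distance from the camera plane in millimeters (upper bits, read as unsigned). */
    public static int depthMillimeters(short raw) {
        return (raw & 0xFFFF) >>> PLAYER_INDEX_BITS;
    }

    /** Player identity stored in the low-order bits. */
    public static int playerIndex(short raw) {
        return raw & PLAYER_INDEX_MASK;
    }

    /** Reinterprets a raw byte frame as shorts. Assumption: little-endian byte order. */
    public static short[] fromBytes(byte[] frame) {
        short[] out = new short[frame.length / 2];
        ByteBuffer.wrap(frame).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(out);
        return out;
    }

    public static void main(String[] args) {
        short near = (short) ((1500 << 4) | 2); // 1500 mm, player 2
        short far = (short) ((4000 << 4) | 1);  // 4000 mm, player 1 (top bit set)
        System.out.println(depthMillimeters(near) + " mm, player " + playerIndex(near));
        System.out.println(depthMillimeters(far) + " mm, player " + playerIndex(far));
    }
}
```

The second sample in main is the case the masking exists for: 4000 << 4 exceeds Short.MAX_VALUE, so the stored short is negative, and a plain signed shift would produce a wrong distance.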
Once the frames are available in Java through a Java library for the Kinect SDK, you can compress them with standard image compression algorithms.
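One way to do this with only the standard library is javax.imageio.ImageIO, sketched below as one possible approach rather than the library's own method. Color frames are encoded as JPEG (lossy) or PNG (lossless), with the alpha byte dropped. Depth frames are written as 16-bit grayscale PNG so every bit is kept, including the player-index bits; a lossy codec such as JPEG would corrupt those packed values. The class and method names are hypothetical, and the frames are assumed to already be in memory as byte[] (color) and short[] (depth).

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferUShort;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper that compresses raw Kinect frames with ImageIO. */
public class FrameCompression {

    /** Encodes a BGRA color frame as "jpg" or "png"; the alpha byte is discarded. */
    public static byte[] compressColor(byte[] bgra, int width, int height, String format)
            throws IOException {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int i = (y * width + x) * 4;
                int rgb = ((bgra[i + 2] & 0xFF) << 16) // red
                        | ((bgra[i + 1] & 0xFF) << 8)  // green
                        | (bgra[i] & 0xFF);            // blue
                img.setRGB(x, y, rgb);
            }
        }
        return encode(img, format);
    }

    /** Encodes a depth frame losslessly as a 16-bit grayscale PNG. */
    public static byte[] compressDepth(short[] depth, int width, int height) throws IOException {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_USHORT_GRAY);
        short[] pixels = ((DataBufferUShort) img.getRaster().getDataBuffer()).getData();
        System.arraycopy(depth, 0, pixels, 0, depth.length);
        return encode(img, "png");
    }

    /** Decodes a PNG produced by compressDepth back into raw 16-bit values. */
    public static short[] decompressDepth(byte[] png) throws IOException {
        BufferedImage img = ImageIO.read(new ByteArrayInputStream(png));
        int w = img.getWidth();
        int h = img.getHeight();
        short[] out = new short[w * h];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                out[y * w + x] = (short) img.getRaster().getSample(x, y, 0);
            }
        }
        return out;
    }

    private static byte[] encode(BufferedImage img, String format) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(img, format, out)) {
            throw new IOException("No ImageIO writer for format " + format);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        int w = 320, h = 240;
        short[] depth = new short[w * h];
        for (int i = 0; i < depth.length; i++) {
            depth[i] = (short) (((i % 4000) << 4) | (i % 7)); // synthetic depth + player
        }
        byte[] png = compressDepth(depth, w, h);
        System.out.println("depth: " + depth.length * 2 + " -> " + png.length + " bytes");

        byte[] bgra = new byte[640 * 480 * 4]; // an all-black color frame
        byte[] jpg = compressColor(bgra, 640, 480, "jpg");
        System.out.println("color: " + bgra.length + " -> " + jpg.length + " bytes");
    }
}
```

The compression ratio you get depends heavily on scene content; the synthetic frames in main only show that the calls work, not what ratio to expect from real data.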