Let's say you want to save it as AVI. AVI is a container format: it stores the video stream's metadata (frame resolution, codec information, etc.) together with the stream itself. AVI is built on RIFF, so it is worth studying both the AVI layout and RIFF in general; there are many places where you can read up on them.
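To get a feel for RIFF, here is a minimal Python sketch (my own illustration, not taken from any library) that builds and walks RIFF chunks. Each chunk is a four-byte ID, a little-endian 32-bit size, the data, and a pad byte when the size is odd; "RIFF" and "LIST" chunks start their data with a four-byte type such as "AVI " or "hdrl". The helper names are hypothetical, and the "avih" payload in the demo is a dummy rather than a real main header.

```python
import struct

def make_chunk(ckid, data):
    """Serialize one RIFF chunk: 4-byte ID, little-endian uint32 size, data, pad to even length."""
    if len(ckid) != 4:
        raise ValueError("chunk ID must be exactly 4 bytes")
    pad = b"\x00" if len(data) % 2 else b""
    return ckid + struct.pack("<I", len(data)) + data + pad

def iter_chunks(buf):
    """Yield (chunk_id, data) pairs for chunks laid out back to back in buf."""
    offset = 0
    while offset + 8 <= len(buf):
        ckid, size = struct.unpack_from("<4sI", buf, offset)
        start = offset + 8
        yield ckid, buf[start:start + size]
        offset = start + size + (size & 1)  # skip the pad byte after odd-sized data

def outline(buf, depth=0):
    """Return an indented listing of the chunk tree; RIFF/LIST data begins with a 4-byte type."""
    lines = []
    for ckid, data in iter_chunks(buf):
        indent = "  " * depth
        if ckid in (b"RIFF", b"LIST"):
            lines.append(f"{indent}{ckid.decode('ascii')} {data[:4].decode('ascii')!r}")
            lines.extend(outline(data[4:], depth + 1))
        else:
            lines.append(f"{indent}{ckid.decode('ascii')} ({len(data)} bytes)")
    return lines

if __name__ == "__main__":
    # Toy file: a dummy 3-byte 'avih' inside LIST 'hdrl' inside RIFF 'AVI '.
    demo = make_chunk(b"RIFF", b"AVI " + make_chunk(b"LIST", b"hdrl" + make_chunk(b"avih", b"abc")))
    print("\n".join(outline(demo)))
```

Running it prints the nesting (RIFF 'AVI ', then LIST 'hdrl', then "avih (3 bytes)"). A real AVI has more in it ('strl' lists per stream, a 'movi' list with the frames, usually an 'idx1' index), but it is all this same chunk structure.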
The other thing is that if you want to play your video file with third-party software, you have to provide a codec to the system, which the player will find through your OS (how you register the codec is OS dependent). The codec then decodes the video stream contained in the AVI and supplies the frames to the player. (More precisely, the player calls your codec's decode function with chunks of the stream, and your codec has to cooperate with that interface.)
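A rough Python sketch of that division of labour, assuming the frames live in the 'movi' list as '00dc' chunks (stream 0, compressed video). The "player" side only walks the chunks and hands each one to a decode callback; decode_png_header is a hypothetical stand-in for your codec that just reads the PNG header instead of decoding pixels. The real registration and calling convention are OS specific and are not shown here.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def iter_chunks(buf):
    """Yield (chunk_id, data) for back-to-back RIFF chunks (4-byte ID, LE uint32 size, data, pad)."""
    offset = 0
    while offset + 8 <= len(buf):
        ckid, size = struct.unpack_from("<4sI", buf, offset)
        start = offset + 8
        yield ckid, buf[start:start + size]
        offset = start + size + (size & 1)

def frames_from_movi(movi_payload, decode, chunk_id=b"00dc"):
    """Player side: pass every video chunk of one stream to the codec's decode callback.
    movi_payload is the data of LIST 'movi' after its 4-byte 'movi' type tag."""
    for ckid, data in iter_chunks(movi_payload):
        if ckid == chunk_id:  # chunks of other streams (e.g. '01wb' audio) are skipped
            yield decode(data)

def decode_png_header(frame):
    """Hypothetical 'codec': each chunk is a complete PNG, so it can be handled on its own.
    Only the IHDR header is parsed; a real codec would return the decoded pixels."""
    if not frame.startswith(PNG_SIGNATURE) or frame[12:16] != b"IHDR":
        raise ValueError("chunk is not a PNG image")
    width, height = struct.unpack(">II", frame[16:24])
    return width, height
```

Because every chunk is a self-contained PNG, the decode callback needs no state carried over from previous frames.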
Of course, in your video stream each frame is independent and coded independently of the others, but that doesn't matter from the OS/codec point of view. Your codec has an identifier (a FOURCC), which has to be written into the AVI stream header, and the player will look up the codec by that identifier.
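A FOURCC is just four ASCII characters stored as a little-endian 32-bit value, so on disk it reads as the plain characters. As far as I know, for video it typically goes into the fccHandler field of the 'strh' stream header and the biCompression field of the BITMAPINFOHEADER in 'strf'. A minimal Python sketch of the packing (the function names are my own; the layout matches the Windows mmioFOURCC macro):

```python
import struct

def fourcc(code):
    """Pack a four-character code into a 32-bit integer, first character in the lowest byte
    (the same layout as the Windows mmioFOURCC macro)."""
    raw = code.encode("ascii")
    if len(raw) != 4:
        raise ValueError("a FOURCC is exactly four ASCII characters")
    return int.from_bytes(raw, "little")

def fourcc_to_str(value):
    """Inverse of fourcc(): recover the four characters from the packed integer."""
    return value.to_bytes(4, "little").decode("ascii")

if __name__ == "__main__":
    mjpg = fourcc("MJPG")
    print(hex(mjpg))                # 0x47504a4d
    print(struct.pack("<I", mjpg))  # b'MJPG': written little-endian, it reads as the characters
    print(fourcc_to_str(mjpg))      # MJPG
```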
In your place, I would look at Motion JPEG. That's the closest existing thing to what you are doing; the only difference is that the individual frames are JPEGs rather than PNGs.