I think there is a logical and a technical flaw in your code:
The logical one:
You are using a single variable zipFile, although the file you need depends on the x and zoom parameters. So even when zipFile != null, it may hold the wrong content.
If you have loaded the file for, let's say, x=1 and zoom=1, you will keep using that file even after zoom changes to 2. So checking for null alone is not sufficient.
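To illustrate the point, here is a minimal sketch of remembering which x and zoom a cached archive was loaded for, so that a stale one can be detected. The names ArchiveKey and ArchiveCache are hypothetical and not part of your code or of the Maps API; treat this as one possible approach, not the only one.

```java
import java.util.Objects;

/** Hypothetical key: identifies the archive by the x and zoom it was loaded for. */
final class ArchiveKey {
    final int x;
    final int zoom;

    ArchiveKey(int x, int zoom) {
        this.x = x;
        this.zoom = zoom;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ArchiveKey)) return false;
        ArchiveKey other = (ArchiveKey) o;
        return x == other.x && zoom == other.zoom;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, zoom);
    }
}

/** Hypothetical holder: stores one archive together with the key it belongs to. */
final class ArchiveCache<T> {
    private ArchiveKey loadedKey; // null until something has been stored
    private T loaded;

    /** True only if the cached archive was loaded for exactly this x and zoom. */
    synchronized boolean isValidFor(int x, int zoom) {
        return loadedKey != null && loadedKey.equals(new ArchiveKey(x, zoom));
    }

    /** Replaces the cached archive and records which x and zoom it belongs to. */
    synchronized void store(int x, int zoom, T archive) {
        loadedKey = new ArchiveKey(x, zoom);
        loaded = archive;
    }

    /** Returns the cached archive; call isValidFor first. */
    synchronized T get() {
        return loaded;
    }
}
```

With something like this, the check becomes "is the cached file the one for this x and zoom?" instead of "is there any cached file at all?". It only addresses the logical flaw; it does not make a shared stream safe to read from several threads.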
The technical one:
TileProvider must be thread-safe.
This is not the case in your code. You could simply add synchronized to your readTileImage method, but that will probably slow down the whole loading process, and it still would not work unless you reset the stream each time readTileImage is called. So "caching" the stream does not help much.
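If you want to keep the archives, one way to avoid shared state altogether is to open the archive inside each call and close it again. The following is only a sketch under assumptions: ZipTileReader and readEntry are hypothetical names, and I am assuming your tiles are stored as entries of a regular zip file that java.util.zip.ZipFile can read.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: reads a single entry without keeping any stream open between calls. */
final class ZipTileReader {

    /**
     * Opens the archive, reads one entry completely, and closes the archive again.
     * Nothing is shared between calls, so concurrent callers cannot disturb each other.
     * Returns null if the archive has no entry with that name.
     */
    static byte[] readEntry(File archive, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(archive)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return out.toByteArray();
            }
        }
    }
}
```

This trades the cost of reopening the archive on every call for not needing any locking or stream resetting.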
I think the way you organize the tiles, with all y-coordinates for the same x and zoom level in the same file, is what makes things so complicated. The map will try to load tiles for different x and y values in parallel, and I do not see how that can be accomplished with a single shared stream. I would keep a separate file for each (x, y, zoom) triple.
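A rough sketch of what the one-file-per-tile layout could look like follows. The class name TileFiles and the directory layout root/zoom/x/y.png are my assumptions for illustration; use whatever naming fits your data.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: maps an (x, y, zoom) triple to its own file and reads it. */
final class TileFiles {
    private final Path root; // directory that contains all tiles (assumed layout)

    TileFiles(Path root) {
        this.root = root;
    }

    /** Builds the path root/zoom/x/y.png for the given tile. */
    Path pathFor(int x, int y, int zoom) {
        return root.resolve(Integer.toString(zoom))
                   .resolve(Integer.toString(x))
                   .resolve(y + ".png");
    }

    /**
     * Reads the whole tile file, or returns null if it does not exist.
     * The only field is final and every call works on its own file,
     * so this can be called from several threads at once.
     */
    byte[] read(int x, int y, int zoom) throws IOException {
        Path p = pathFor(x, y, zoom);
        return Files.isRegularFile(p) ? Files.readAllBytes(p) : null;
    }
}
```

Inside getTile you would then wrap the returned bytes in a Tile, or return TileProvider.NO_TILE when read gives you null. Since no stream or file handle is kept between calls, there is nothing to synchronize or reset.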