Even though you need interpolation to blend one texture into another, it's also worth trying a GL_TEXTURE_2D_ARRAY and blending the two neighboring layers manually in the fragment shader.
It might seem better to take advantage of the hardware's automatic linear filtering, but bear in mind that with a GL_TEXTURE_3D the implementation is far more likely to cache the neighboring texels in all 6 directions, including along the depth axis. From what I understand, your emphasis is on the individual 2D textures in the set and their neighboring layers, so you're probably better off having the hardware keep 3 cache lines, one per layer, each covering the 4 in-plane directions.
It's a good idea to try both and see which performs better.
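
To make the manual blend concrete, here is a rough CPU-side sketch of the arithmetic the fragment shader would do. The names `LayerBlend`, `splitLayer` and `mixSamples` are made up for illustration, and I'm assuming the layer coordinate is a float in the range [0, layerCount - 1]. In GLSL you would fetch the two samples with `texture()` on a `sampler2DArray` (passing the layer index as the third coordinate) and combine them with `mix()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper type: the two layers to sample and the blend weight.
struct LayerBlend {
    int lower;    // first layer to sample
    int upper;    // second layer to sample (same as lower on the last layer)
    float weight; // 0 = only lower, 1 = only upper
};

// Split a continuous layer coordinate into two integer layers plus a weight.
// Assumption: `layer` is meant to lie in [0, layerCount - 1]; values outside
// that range are clamped.
LayerBlend splitLayer(float layer, int layerCount) {
    assert(layerCount > 0);
    const float last = static_cast<float>(layerCount - 1);
    const float clamped = std::max(0.0f, std::min(layer, last));
    const int lower = static_cast<int>(std::floor(clamped));
    const int upper = std::min(lower + 1, layerCount - 1);
    return {lower, upper, clamped - static_cast<float>(lower)};
}

// Linear blend of two samples that were already filtered in 2D,
// equivalent to GLSL's mix(a, b, t).
float mixSamples(float a, float b, float t) {
    return a * (1.0f - t) + b * t;
}
```

Each of the two lookups still gets normal bilinear filtering within its own layer; only the blend between layers is done by hand, which costs one extra texture fetch per fragment compared to the 3D texture.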