I put a note here because I have just been thinking about something similar.
Your suggested method (doing a nearest-neighbour search during the merge) does seem feasible. The difference in size between the two clouds being merged should not be a problem if you do a radius search based on some desired resolution rather than a search for a single nearest neighbour.
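As a rough illustration of the radius-based merge, here is a minimal sketch in plain C++ without PCL. The names `Point`, `squaredDistance` and `mergeWithRadius` are made up for this example, and the brute-force inner loop is only there to keep it self-contained; in practice you would use a k-d tree radius search instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal point type (stand-in for e.g. pcl::PointXYZ).
struct Point { float x, y, z; };

// Squared Euclidean distance between two points.
inline float squaredDistance(const Point& a, const Point& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Appends to `merged` only those points of `incoming` that have no point of
// `merged` within `radius`. Returns the number of points added.
// Brute force, O(N*M): a k-d tree radius search would replace the inner loop.
std::size_t mergeWithRadius(std::vector<Point>& merged,
                            const std::vector<Point>& incoming,
                            float radius) {
    assert(radius > 0.0f);
    const float r2 = radius * radius;
    std::size_t added = 0;
    for (const Point& p : incoming) {
        bool occupied = false;
        for (const Point& q : merged) {
            if (squaredDistance(p, q) <= r2) { occupied = true; break; }
        }
        if (!occupied) {
            merged.push_back(p);  // later incoming points are tested against this one too
            ++added;
        }
    }
    return added;
}
```

Because accepted points are appended to `merged` straight away, the same radius also limits the spacing between the newly added points, so the density of the incoming cloud does not matter much.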
To handle your case 1, you could try merging all the clouds and then downsampling with a voxel grid (e.g. pcl::VoxelGrid) before the triangulation. This would be the easiest way, but it may not be what you want.
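To show what the voxel grid step does conceptually, here is a small library-free sketch in C++. It is not the pcl::VoxelGrid implementation. `Point`, `voxelDownsample` and the `leaf` parameter are names I made up for illustration, and it assumes each occupied voxel is replaced by the centroid of its points.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical minimal point type (stand-in for e.g. pcl::PointXYZ).
struct Point { float x, y, z; };

// Replaces all points that fall into the same cubic voxel of edge length
// `leaf` by their centroid. Output is ordered by voxel index.
std::vector<Point> voxelDownsample(const std::vector<Point>& cloud, float leaf) {
    assert(leaf > 0.0f);
    using Key = std::tuple<std::int64_t, std::int64_t, std::int64_t>;
    struct Accum { double sx = 0.0, sy = 0.0, sz = 0.0; std::size_t n = 0; };

    std::map<Key, Accum> voxels;
    for (const Point& p : cloud) {
        // Integer voxel index along each axis (floor handles negative coordinates).
        const Key key(static_cast<std::int64_t>(std::floor(p.x / leaf)),
                      static_cast<std::int64_t>(std::floor(p.y / leaf)),
                      static_cast<std::int64_t>(std::floor(p.z / leaf)));
        Accum& a = voxels[key];
        a.sx += p.x; a.sy += p.y; a.sz += p.z;
        ++a.n;
    }

    std::vector<Point> out;
    out.reserve(voxels.size());
    for (const auto& kv : voxels) {
        const Accum& a = kv.second;
        out.push_back({static_cast<float>(a.sx / a.n),
                       static_cast<float>(a.sy / a.n),
                       static_cast<float>(a.sz / a.n)});
    }
    return out;
}
```

You would concatenate all the clouds first and pass the result through this once, with `leaf` set to the resolution you want the final mesh to have.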
The algorithm implemented in pcl::GreedyProjectionTriangulation seems to be mostly described in the paper below [1]. That paper also describes an incremental mesh update procedure, which is a minor change to the algorithm: triangles close to a new point are removed and the greedy triangulation is started again. As far as I know, this has not been implemented in PCL, but it shouldn't be too difficult. This would correspond to your case 2. However, the mesh you get out would depend on the order in which you merged the clouds. Because implementing the incremental update is a time investment, I would suggest trying the point-based merging first.
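For the incremental update, the removal step might look roughly like the sketch below. This is only my reading of the idea, not code from the paper or from PCL. `Point`, `Triangle` and `removeTrianglesNear` are hypothetical names, and treating a triangle as "close" when any of its vertices lies within `radius` of the new point is an assumption on my part. Restarting the greedy triangulation on the freed region is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal types for illustration.
struct Point { float x, y, z; };
struct Triangle { std::size_t a, b, c; };  // indices into the vertex array

// True if vertex `v` lies within `radius` of point `p`.
inline bool isNear(const Point& v, const Point& p, float radius) {
    const float dx = v.x - p.x, dy = v.y - p.y, dz = v.z - p.z;
    return dx * dx + dy * dy + dz * dz <= radius * radius;
}

// Returns the triangles of `mesh` that have no vertex within `radius` of
// `newPoint`. The dropped triangles mark the region to be re-triangulated.
std::vector<Triangle> removeTrianglesNear(const std::vector<Point>& vertices,
                                          const std::vector<Triangle>& mesh,
                                          const Point& newPoint,
                                          float radius) {
    assert(radius > 0.0f);
    std::vector<Triangle> kept;
    kept.reserve(mesh.size());
    for (const Triangle& t : mesh) {
        assert(t.a < vertices.size() && t.b < vertices.size() && t.c < vertices.size());
        const bool close = isNear(vertices[t.a], newPoint, radius) ||
                           isNear(vertices[t.b], newPoint, radius) ||
                           isNear(vertices[t.c], newPoint, radius);
        if (!close) kept.push_back(t);
    }
    return kept;
}
```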
[1] Marton, Z. C., R. B. Rusu, and M. Beetz. 2009. “On Fast Surface Reconstruction Methods for Large and Noisy Point Clouds.” In IEEE International Conference On Robotics and Automation, 3218–3223. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5152628.