After googling for a while, I found a way:
ffmpeg -i input1 -i input2 -filter_complex \
"[0:v]setpts=PTS-STARTPTS, pad=iw*2:ih[bg]; \
[1:v]setpts=PTS-STARTPTS[fg]; [bg][fg]overlay=w; \
amerge,pan=stereo|c0<c0+c2|c1<c1+c3" output
Based on http://ffmpeg.org/pipermail/ffmpeg-user/2013-June/015662.html
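For readers who want the same filter graph with every stream labeled explicitly, here is a minimal sketch. It assumes both inputs have the same frame size and each carries one stereo audio track. The helper name side_by_side and the output labels [v] and [a] are made up for illustration. The pan arguments are separated with |, which is the form the current ffmpeg pan documentation shows.

```shell
#!/bin/sh
# Video chain: reset timestamps on both inputs, pad the first input to
# twice its width, then overlay the second input at x = w (the overlay
# input's own width), which is the right half when both inputs have the
# same frame size (an assumption of this sketch).
video='[0:v]setpts=PTS-STARTPTS,pad=iw*2:ih[bg];[1:v]setpts=PTS-STARTPTS[fg];[bg][fg]overlay=w[v]'

# Audio chain: merge the two tracks into one 4-channel stream (assuming
# both are stereo), then mix back down to stereo. c0/c1 come from the
# first input and c2/c3 from the second; "<" asks pan to renormalize the
# gains so the summed channels do not clip.
audio='[0:a][1:a]amerge=inputs=2,pan=stereo|c0<c0+c2|c1<c1+c3[a]'

filter="$video;$audio"

# side_by_side is a hypothetical helper name, not part of ffmpeg.
# Arguments: left input, right input, output file.
side_by_side() {
    ffmpeg -i "$1" -i "$2" -filter_complex "$filter" \
        -map '[v]' -map '[a]' "$3"
}
```

It could then be called as side_by_side left.mp4 right.mp4 out.mp4 (placeholder file names). The two -map options select the labeled video and audio outputs explicitly rather than relying on ffmpeg's automatic stream selection.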