For MapReduce benchmarks, once they finish running, can I find out what the input, shuffle, and output data sizes were, respectively?

Question

Yes, you can. However, since your question is quite broad, I will give you examples for TestDFSIO only, which is designed to measure HDFS data transfer performance.
TestDFSIO supports the following arguments: -read | -write | -clean [-nrFiles N] [-fileSize MB] [-resFile resultFileName] [-bufferSize Bytes].
Now, before benchmarking the read operation, you have to write some data first, which you do with something like hadoop jar hadoop-test-1.2.1.jar TestDFSIO -write -nrFiles 10 -fileSize 100. Here fileSize is the input size of one file in MB, and multiplying it by nrFiles gives the total: 100 * 10 MB = 1000 MB on HDFS. You can find the exact size of the output files under the /benchmarks/TestDFSIO/io_data directory.
You will see some other directories as well, such as io_control (contains the names of the files that were read or written and their sizes).
About shuffle: it's an intermediate operation. So, to find out about it, just look at the console output from the time the MapReduce job was running (the job counters are printed there), or go to the JobTracker's UI to see it.
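If you save the console output to a file, the sizes can be pulled out with a small script. This is only a sketch: it assumes the counters show up as lines ending in Name=value, which is how I remember the Hadoop 1.x job client printing them. parse_counters is a hypothetical helper, not a Hadoop API.

```python
import re

# Matches "Counter name=12345" at the end of a line, skipping any log prefix
# such as a timestamp or logger name. The line layout is an assumption.
_COUNTER = re.compile(r"([A-Za-z_][A-Za-z0-9_ ()]*?)=(\d+)\s*$")


def parse_counters(console_text: str) -> dict:
    """Return {counter name: integer value} for every Name=value line."""
    counters = {}
    for line in console_text.splitlines():
        match = _COUNTER.search(line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters
```

With the result you can then look up whichever counters your Hadoop version prints. On 1.x I would expect names along the lines of HDFS_BYTES_READ, HDFS_BYTES_WRITTEN and Reduce shuffle bytes, but check your own output, since the names vary between versions.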
The input for the write operation is generated by the TestDFSIO class itself. It's just some bytes calculated with a mod operation, filling a buffer of bufferSize bytes.
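Roughly, the generated data looks like the following Python sketch. As far as I recall, the Java source fills each buffer position with '0' plus the index modulo 50, but treat the modulus as an assumption and check the TestDFSIO source for your version. make_write_buffer is a made-up name.

```python
def make_write_buffer(buffer_size: int, modulus: int = 50) -> bytes:
    """Fill a buffer with a repeating byte pattern derived from the index.

    Position i holds ord('0') + i % modulus. The default modulus of 50 is
    recalled from the TestDFSIO source and is an assumption here.
    """
    return bytes(ord("0") + i % modulus for i in range(buffer_size))
```

The same buffer is then written over and over until the file reaches fileSize, so the content carries no meaning; only the volume matters for the benchmark.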
You also get a generated log file (the one named by -resFile) which contains I/O rate and throughput stats.
Hope this clarifies things and gives you a head start. There are lots of other benchmarks which you can explore further.