Question

A few questions regarding the HDInsight jobs approach.

1) How do I schedule an HDInsight job? Is there any ready-made solution for this? For example, if my system constantly collects a large number of new input files that we need to run a map/reduce job on, what is the recommended way to implement ongoing processing?

2) From a pricing perspective, it is recommended to remove the HDInsight cluster while no job is running. As I understand it, there is no way to automate this process if we decide to run the job daily? Any recommendations here?

3) Is there a way to ensure that the same files are not processed more than once? How do you solve this issue?

4) I might be mistaken, but it looks like every HDInsight job requires a new output storage folder to store reducer results in. What is the best practice for merging those results so that reporting always works on the whole data set?


Solution

OK, there are a lot of questions in there! Here are, I hope, a few quick answers.

  1. There isn't really a built-in way of scheduling job submission in HDInsight, though of course you can schedule a program to run the job submissions for you. Depending on your workflow, it may be worth taking a look at Oozie, which can be a little awkward to get going on HDInsight, but should help. (A simple scheduled-submitter sketch is included after this list.)

  2. On the price front, I would recommend that if you're not using the cluster, you destroy it and bring it back again when you need it (those compute hours can really add up!). Note that this will lose anything you have in HDFS, which should be mainly intermediate results; any output or input data held in asv storage will persist in an Azure Storage account. You can certainly automate this with the CLI tools, or with the REST interface used by the CLI tools (see my answer on Hadoop on Azure Create New Cluster; the first one is out of date). A rough daily create/run/delete sketch is also included after this list.

  3. I would do this by making sure I only submitted the job once for each file, and rely on Hadoop to handle the retry and reliability side, removing the need to manage any retries in your application. One simple way to track which files have already been submitted is shown in the scheduled-submitter sketch after this list.

  4. Once you have the outputs from your initial processes, if you want to reduce them to a single output for reporting, the best bet is probably a secondary MapReduce job with those outputs as its inputs (see the consolidation sketch at the end of this answer).

    If you don't care about the individual intermediate jobs, you can just chain these directly in the one MapReduce job (which can contain as many map and reduce steps as you like) through job chaining; see Chaining multiple MapReduce jobs in Hadoop for a Java-based example. Sadly, the .NET API does not currently support this form of job chaining.

    However, you may be able to just use the ReducerCombinerBase class if your case allows for a Reducer->Combiner approach.
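
To make (1) and (3) above more concrete, here is a minimal sketch of the kind of scheduled submitter you would write yourself, since HDInsight does not provide one: it polls an incoming folder on a fixed interval, hands any new files to whatever job-submission code you already have, and then moves them into a "processed" folder so the same file is never submitted twice. The asv:// paths, the 24-hour interval and the submitJobOver stub are illustrative placeholders, not part of any HDInsight API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Illustrative scheduled submitter: polls an incoming folder, submits a job
 * over a snapshot of the new files, then moves that snapshot into a
 * "processed" folder so no file is ever submitted twice.
 */
public class ScheduledSubmitter {

    // Placeholder locations in the cluster's Azure Storage account.
    private static final Path INCOMING = new Path("asv://container@account/incoming");
    private static final Path PROCESSED = new Path("asv://container@account/processed");

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Run once immediately, then every 24 hours (adjust to taste).
        scheduler.scheduleAtFixedRate(ScheduledSubmitter::runBatch, 0, 24, TimeUnit.HOURS);
    }

    private static void runBatch() {
        try {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(INCOMING.toUri(), conf);
            fs.mkdirs(PROCESSED);

            // Snapshot the files that exist right now; anything that arrives
            // while the job runs is left for the next scheduled run.
            FileStatus[] snapshot = fs.listStatus(INCOMING);
            if (snapshot.length == 0) {
                return;
            }

            // Placeholder: add each file in the snapshot as an input path and
            // submit with whatever job-submission code you already have.
            boolean succeeded = submitJobOver(snapshot, conf);

            if (succeeded) {
                // Record completion by moving the snapshot out of the incoming
                // folder, so these files are never picked up again.
                for (FileStatus file : snapshot) {
                    fs.rename(file.getPath(), new Path(PROCESSED, file.getPath().getName()));
                }
            }
        } catch (Exception e) {
            e.printStackTrace(); // a real scheduler would log and alert here
        }
    }

    // Stub for your existing submission logic (Job API, streaming, etc.).
    private static boolean submitJobOver(FileStatus[] files, Configuration conf) {
        return false; // replace with real submission and return its success flag
    }
}
```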
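
For (2), a rough sketch of the daily create/run/delete cycle, assuming you drive it from a scheduled process on your own machine or a worker role. The azure hdinsight cluster create/delete invocations are placeholders ("..." stands for the parameters your CLI version requires); take the exact syntax from the CLI documentation or the linked answer, or swap in calls to the REST interface instead.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative daily lifecycle: bring the cluster up, run the job, then tear
 * the cluster down so you are not billed for idle compute. Input/output data
 * kept in the Azure Storage account survives the delete.
 */
public class DailyClusterLifecycle {

    public static void main(String[] args) throws Exception {
        // Placeholder command lines; "..." stands for the cluster name, storage
        // account, credentials, node count, etc. that your CLI version expects.
        run(Arrays.asList("azure", "hdinsight", "cluster", "create", "..."));
        try {
            runDailyJob(); // stub: submit and wait for your MapReduce job here
        } finally {
            // Delete the cluster even if the job fails, so the compute hours stop.
            run(Arrays.asList("azure", "hdinsight", "cluster", "delete", "..."));
        }
    }

    private static void run(List<String> command) throws Exception {
        Process process = new ProcessBuilder(command).inheritIO().start();
        if (process.waitFor() != 0) {
            throw new RuntimeException("Command failed: " + command);
        }
    }

    private static void runDailyJob() {
        // Replace with your existing job-submission code.
    }
}
```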

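And for (4), a rough sketch of the consolidation idea, assuming the earlier jobs wrote plain tab-separated key/value text: a second MapReduce job takes the per-run output folders as its inputs and writes a single merged folder for reporting. The class names and the pass-through mapper/reducer are illustrative; replace the reducer body with whatever aggregation your reports actually need.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Illustrative consolidation job: reads the tab-separated key/value lines the
 * earlier jobs produced and writes one merged output folder for reporting.
 */
public class MergeResultsJob {

    // Split each line back into key and value and pass it through.
    public static class ResplitMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] parts = line.toString().split("\t", 2);
            context.write(new Text(parts[0]), new Text(parts.length > 1 ? parts[1] : ""));
        }
    }

    // Re-emit everything per key; swap in real aggregation (sums, latest value, ...)
    // if your reporting needs a single record per key.
    public static class MergeReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                context.write(key, value);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // On very old Hadoop 1.x clusters use "new Job(conf, ...)" instead.
        Job job = Job.getInstance(new Configuration(), "merge-results");
        job.setJarByClass(MergeResultsJob.class);
        job.setMapperClass(ResplitMapper.class);
        job.setReducerClass(MergeReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Every existing per-run output folder becomes an input; the final
        // argument is the single folder your reporting reads from.
        for (int i = 0; i < args.length - 1; i++) {
            FileInputFormat.addInputPath(job, new Path(args[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(args[args.length - 1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
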
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow