We write tests like these in Groovy or Python. Since the Python and Groovy runtimes are largely platform independent, you should be able to execute commands on all three OSes. A few parameters might have to change per platform, but you can do an OS check and set those at the start of the script. There are frameworks that simplify running the tests, like JUnit and Spock for Groovy and Robot for Python, but they just abstract the normal frameworks. I'd start simple. It's Agile to try the simplest thing that could possibly work.
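As a rough illustration of the OS check mentioned above, here is a minimal Python sketch using only the standard library. The per-platform values (`exe_suffix`, `timeout`) and the helper names `settings_for` and `run_command` are made up for the example, and the Python interpreter itself stands in for whatever command you actually want to test.

```python
import platform
import subprocess
import sys


def settings_for(system):
    """Return the parameters that differ per OS (hypothetical values)."""
    if system == "Windows":
        return {"exe_suffix": ".exe", "timeout": 60}
    # "Darwin" (macOS), "Linux", and anything else share one set here.
    return {"exe_suffix": "", "timeout": 30}


def run_command(args, timeout):
    """Run a command and return its exit code and captured stdout."""
    result = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    return result.returncode, result.stdout


# Do the OS check once, at the start of the script.
SETTINGS = settings_for(platform.system())


def test_command_prints_ok():
    # Stand-in for the real command under test: the interpreter running this script.
    code, out = run_command([sys.executable, "-c", "print('ok')"], SETTINGS["timeout"])
    assert code == 0
    assert out.strip() == "ok"


if __name__ == "__main__":
    test_command_prints_ok()
    print("all tests passed")
```

Passing the command as a list rather than a single shell string avoids most quoting differences between Windows and Unix-like shells.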
As to your second question, I might do both. First I would start writing tests for everything. Then, if it became expensive to run all the tests (let's say more than a couple of minutes), I would separate them into Smoke (sanity) and Functional (everything) Tests and run the Functional Tests less often.
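One simple way to get the Smoke/Functional split without a framework is sketched below. The decorators `smoke` and `functional`, the `--smoke` flag, and the two sample tests are all hypothetical names chosen for the example. The idea is that every smoke test is also part of the functional run, so the functional suite stays "everything".

```python
import sys

SMOKE = []       # quick sanity checks, run on every change
FUNCTIONAL = []  # everything, run less often


def smoke(fn):
    """Register a test in both the smoke and the functional suite."""
    SMOKE.append(fn)
    FUNCTIONAL.append(fn)
    return fn


def functional(fn):
    """Register a test in the functional suite only."""
    FUNCTIONAL.append(fn)
    return fn


@smoke
def test_basic_arithmetic():
    assert 1 + 1 == 2


@functional
def test_sorting_edge_case():
    assert sorted([3, 1, 2]) == [1, 2, 3]


def run(tests):
    """Run each test and return the number of failed assertions."""
    failures = 0
    for test in tests:
        try:
            test()
        except AssertionError:
            failures += 1
            print("FAILED:", test.__name__)
    return failures


if __name__ == "__main__":
    suite = SMOKE if "--smoke" in sys.argv else FUNCTIONAL
    failed = run(suite)
    print(f"{len(suite) - failed} passed, {failed} failed")
    if failed:
        sys.exit(1)  # non-zero exit code so a build server notices
```

Running the script with `--smoke` gives the fast subset; running it with no arguments gives the full functional run.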