What they test is nearly, but not quite, the same. In the first case you can test the button's command (or whether the button is disabled) without touching the event system. In the second case you test whether the button behaves as it should when it receives events, which should include invoking the command after you've generated a press and a release.
(ETA: and an Enter event before that as Donal notes in his answer. Sorry I missed that. Also, as I believe he is saying, default bindings don't really need to be tested. The main point of my answer was to discuss "top-down" versus "bottom-up" approaches to selecting test cases.)
If you're being an optimist, you would test by generating events. If you get a successful command invocation after generating the events, all is well. However, if the test fails, you can't be sure which part of the chain failed: the binding, the setting of the command option, the command itself, the state of the button, and so on.
If you're being a pessimist, you would test every link in the chain separately by getting the command option value and invoking it directly, calling invoke on the button, generating the events, and possibly more. If all those tests are successful, all is well, just like above. If any of them fails, you will immediately know what needs to be fixed.
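To make the two approaches concrete, here is a minimal sketch. It assumes the toolkit is Tk and uses Python's tkinter binding; the original question may well have used a different language. The function name `run_button_checks` and the recorded string `"pressed"` are made up for illustration. The first three checks are the "pessimist" links (command option, `invoke`, disabled state). The last is the "optimist" test that generates Enter, press and release events. The sketch relies on the default button bindings, so results could differ if those have been changed.

```python
try:
    import tkinter as tk
except ImportError:  # tkinter is not installed in this Python
    tk = None


def run_button_checks():
    """Return {check name: passed}, or None if Tk or a display is unavailable."""
    if tk is None:
        return None
    try:
        root = tk.Tk()
    except tk.TclError:
        return None  # no display to create widgets on

    calls = []  # records each time the button's command runs
    button = tk.Button(root, text="OK", command=lambda: calls.append("pressed"))
    button.pack()
    root.update()  # let Tk create and map the widget before sending events

    results = {}

    # Bottom-up, link 1: the command option has been set.
    results["command option is set"] = bool(button.cget("command"))

    # Bottom-up, link 2: invoke runs the command, bypassing the event system.
    calls.clear()
    button.invoke()
    results["invoke runs the command"] = calls == ["pressed"]

    # Bottom-up, link 3: a disabled button ignores invoke.
    button.configure(state="disabled")
    calls.clear()
    button.invoke()
    results["disabled button ignores invoke"] = calls == []
    button.configure(state="normal")

    # Top-down: generate the events and check only the end result.
    calls.clear()
    button.event_generate("<Enter>")
    button.event_generate("<ButtonPress-1>")
    button.event_generate("<ButtonRelease-1>")
    root.update()
    results["events trigger the command"] = calls == ["pressed"]

    root.destroy()
    return results


if __name__ == "__main__":
    print(run_button_checks())
```

If only the last check fails while the first three pass, the problem is most likely in the bindings or the event sequence rather than in the command or the button's state. That is the diagnostic value the separate checks buy you.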
The best strategy is probably a compromise: start with "abstract" testing (in this case, generating events), and add detailed tests if you start having trouble somewhere. Having dozens of tests that just confirm that everything is OK seems impressive, but it doesn't add much except inertia, which you will have to deal with if the specifications change.