The problem you describe here is one of music/audio feature extraction and a substantial body of academic work exists that you can draw on. Another useful term of art with which to search is Music Information Retrieval (MIR).
The list of 'features' that researchers have attempted to retrieve from recordings is large and varied, from deterministic things such as pitch and key through emotional characteristics, such as 'energy'.
Most of these turn out to be more difficult than you might imagine, and typically only about 60-70% accurate - although for your requirements, this is probably adequate.
A good entry point might be download Sonic Visualiser, for which a large number of feature extraction plug-ins exist, and are open-source. You'll at least get a feel for what's possible.
Update: Another useful term of art is Onset detection - this is typically used to describe beat detection algorithms.