Beware of premature optimization. Are you really sure that this will be a performance problem? What are your requirements -- how quickly does the script need to run? How fast does it run with the naïve approach? Profile and evaluate before you optimize. I think there's a serious possibility that you're seeing a performance problem where none actually exists.
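For instance, Python's built-in cProfile will tell you where the time actually goes. This is just a sketch: `scan_collection` here is a stand-in for whatever your script's real entry point is, and the path is made up.

```python
import cProfile
import pstats
import os

def scan_collection(root):
    """Stand-in for your real scan: walk the tree and stat every file."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            os.stat(os.path.join(dirpath, name))
            count += 1
    return count

# Profile a full run and dump the stats to a file.
cProfile.run("scan_collection('/path/to/music')", "scan.prof")

# Show the ten biggest time sinks by cumulative time.
pstats.Stats("scan.prof").sort_stats("cumulative").print_stats(10)
```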
You can't avoid visiting each file once if you want a guaranteed correct answer. As you've seen, optimizations that entirely skip files will basically amount to automated guesswork.
Can you keep a record of previous scans, and on subsequent runs use the files' last-modified dates to skip anything that hasn't changed since you last scanned it? The first scan would still take a while, but later scans would only have to touch new or modified files.
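Here's a minimal sketch of that idea, assuming a JSON file as the scan record and a hypothetical `read_tags` function standing in for whatever per-file work you already do:

```python
import json
import os

CACHE_PATH = "scan_cache.json"  # hypothetical location for the scan record

def read_tags(path):
    """Placeholder for the expensive per-file work you already do."""
    ...

def incremental_scan(root):
    # Load the record of the previous scan: a map of path -> mtime.
    try:
        with open(CACHE_PATH) as f:
            cache = json.load(f)
    except FileNotFoundError:
        cache = {}

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            # Only re-scan files that are new or modified since last time.
            if cache.get(path) != mtime:
                read_tags(path)
                cache[path] = mtime

    with open(CACHE_PATH, "w") as f:
        json.dump(cache, f)
```

You'd also want to prune cache entries whose files no longer exist, so deletions don't linger in the record.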
If you need to run a lot of complex queries against a music collection quickly, consider importing the metadata of the entire collection into a database (for instance SQLite or MySQL). The initial import will take time, and keeping the database up to date will take a little more (checking the last-modified dates as above to pick up new or changed files). Once the data is in the database, however, everything should be fairly snappy, assuming the schema and indexes are set up sensibly.
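As a rough sketch of what that could look like with Python's built-in sqlite3 module (`read_tags` is again a hypothetical stand-in for your tag reading, and the schema is just an illustration):

```python
import os
import sqlite3

def read_tags(path):
    """Placeholder: return (artist, album, title) for a file."""
    return ("unknown artist", "unknown album", os.path.basename(path))

def build_index(root, db_path="music.db"):
    con = sqlite3.connect(db_path)
    con.execute("""
        CREATE TABLE IF NOT EXISTS tracks (
            path   TEXT PRIMARY KEY,
            mtime  REAL,
            artist TEXT,
            album  TEXT,
            title  TEXT
        )
    """)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            row = con.execute(
                "SELECT mtime FROM tracks WHERE path = ?", (path,)
            ).fetchone()
            # Skip files whose last-modified date hasn't changed.
            if row is not None and row[0] == mtime:
                continue
            artist, album, title = read_tags(path)
            con.execute(
                "INSERT OR REPLACE INTO tracks VALUES (?, ?, ?, ?, ?)",
                (path, mtime, artist, album, title),
            )
    con.commit()
    con.close()
```

With the path as the primary key, the mtime check is an index lookup, and a query like "all albums by this artist" becomes a single SELECT instead of another walk over the filesystem; add an index on artist (or whatever you query by most) to keep those fast too.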