Optimizing a Mass ID3 Tag Scan [duplicate]

Question 1

Beware of premature optimization. Are you really sure that this will be a performance problem? What are your requirements -- how quickly does the script need to run? How fast does it run with the naïve approach? Profile and evaluate before you optimize. I think there's a serious possibility that you're seeing a performance problem where none actually exists.
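To get a baseline number before optimizing anything, something like the following rough sketch could time the naive approach. `time_naive_scan` and its `read_tags` parameter are hypothetical names; `read_tags` stands in for whatever tag-reading call you already use.

```python
import os
import time


def time_naive_scan(root, read_tags):
    """Call read_tags on every .mp3 under root; return (file count, seconds taken)."""
    start = time.perf_counter()
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".mp3"):
                read_tags(os.path.join(dirpath, name))
                count += 1
    return count, time.perf_counter() - start
```

If the elapsed time already meets your requirements, there is nothing to optimize.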

You can't avoid visiting each file once if you want a guaranteed correct answer. As you've seen, optimizations that entirely skip files will basically amount to automated guesswork.

Can you keep a record of previous scans you've done, and on a subsequent scan use the last-modified dates of the files to avoid re-scanning files you've already scanned once? This could mean that your first scan might take a little bit of time, but subsequent scans would be faster.
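A minimal sketch of that idea, assuming the record is a JSON file mapping each path to the last-modified time seen on the previous scan. The function names and the JSON format are my own choices for illustration, not anything your script already has.

```python
import json
import os


def load_cache(cache_path):
    """Return the saved {path: mtime} mapping, or an empty one on the first run."""
    try:
        with open(cache_path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def files_needing_scan(root, cache):
    """Yield (path, mtime) for .mp3 files that are new or changed since the last scan."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".mp3"):
                continue
            path = os.path.join(dirpath, name)
            mtime = os.stat(path).st_mtime
            if cache.get(path) != mtime:
                yield path, mtime


def save_cache(cache_path, cache):
    """Write the {path: mtime} mapping back to disk."""
    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(cache, fh)
```

After reading the tags of each file this yields, set `cache[path] = mtime` and call `save_cache` at the end. Every file is still visited, but only with a cheap `stat` call rather than a full tag read.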

If you need to do a lot of complex queries on a music collection quickly, consider importing the metadata of the entire collection into a database (for instance SQLite or MySQL). The initial import will take time, and later updates to insert new files will take a little (checking the last-modified dates as above). Once the data is in your database, however, everything should be fairly snappy, assuming that the database is set up sensibly.
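As a rough illustration using Python's built-in `sqlite3` module: the `tracks` table, its columns, and the helper names below are assumptions made for the example, so adapt them to whatever metadata you actually need.

```python
import sqlite3


def open_library(db_path=":memory:"):
    """Open (or create) the metadata database with one row per file."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tracks (
               path   TEXT PRIMARY KEY,
               mtime  REAL NOT NULL,
               artist TEXT,
               title  TEXT
           )"""
    )
    # An index on artist keeps per-artist lookups fast on large collections.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_tracks_artist ON tracks(artist)")
    return conn


def upsert_track(conn, path, mtime, artist, title):
    """Insert a file's metadata, replacing any earlier row for the same path."""
    conn.execute(
        "INSERT OR REPLACE INTO tracks (path, mtime, artist, title) VALUES (?, ?, ?, ?)",
        (path, mtime, artist, title),
    )


def titles_by_artist(conn, artist):
    """Return the titles stored for one artist, sorted alphabetically."""
    rows = conn.execute(
        "SELECT title FROM tracks WHERE artist = ? ORDER BY title", (artist,)
    )
    return [title for (title,) in rows]
```

Pass a file path instead of `":memory:"` to keep the data between runs, and call `conn.commit()` after a batch of upserts.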

Question 2

In general, for this question I would recommend using several ways of detecting the artist and track title:

1st way to check: is the filename in ARTIST-TITLE.mp3 format (or similar)?
(An example filename would be "Artist-Track.mp3".)

import os
import re

for file in os.listdir(PATH_TO_MP3s):
    parts = re.split(r"[_\-.]", file)  # split once on "_", "-" or "."
    if len(parts) >= 3:  # need at least artist, track and extension
        artist, track, filetype = parts[-3:]

Of course, you have to make sure the file is actually in that format first.

2nd step (if the first doesn't fit that file): check whether the directory names fit, as you suggested.

3rd and last one would be to check the ID3 tags.

But make sure the values are right before trusting them.
For example, if someone named a file "Track-Artist.mp3", the code I provided would swap artist and track.
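Putting the three steps together could look roughly like this. It is only a sketch under some assumptions of mine: `known_artists` is a hypothetical set of artist names you already trust, used as the sanity check, and `read_id3` is a placeholder for whatever tag-reading function you use, returning an (artist, title) tuple.

```python
import os
import re

# Matches "Artist-Title.mp3" with exactly one hyphen in the stem.
FILENAME_RE = re.compile(r"^([^-]+)-([^-]+)\.mp3$", re.IGNORECASE)


def guess_artist_title(path, known_artists, read_id3=None):
    """Return (artist, title) from the filename, directory, or ID3 tags, else None."""
    folder, filename = os.path.split(path)

    # 1. Filename in ARTIST-TITLE.mp3 form, accepted only if one half is a known artist.
    match = FILENAME_RE.match(filename)
    if match:
        first, second = (part.strip() for part in match.groups())
        if first in known_artists:
            return first, second
        if second in known_artists:  # file was named "Title-Artist.mp3"
            return second, first

    # 2. Parent directory named after a known artist; the file stem is the title.
    parent = os.path.basename(folder)
    if parent in known_artists:
        return parent, os.path.splitext(filename)[0]

    # 3. Fall back to the tags via the caller-supplied reader (hypothetical hook).
    if read_id3 is not None:
        return read_id3(path)
    return None
```

Because the cheap filename and directory checks run first, the slower tag read only happens for files that neither check can place.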