Is effective usage of hashes of changing audio (mp3) files possible

Question 1

Answering your questions

The file path should not be used when computing the hash; neither should the filename or the extension.
Rehashing after each ID3 tag write would solve your problem, provided that all changes occur in your application.
The hash can safely be used as a key for your purposes (see below).
Probably yes, if I understand you correctly.

Possibility of repeated hash value

Depending on the hashing function you choose, a deliberate search for another file with the same hash might succeed in a year, in a millennium, in a billion years, or not before the end of the world.

It's all a matter of probabilities. Check details of each hashing function to learn how low the probability of finding another file with the same hash is.
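To put a rough number on those probabilities, here is a minimal sketch using the standard birthday-bound approximation. It assumes the hash behaves like a uniformly random function and only covers accidental collisions, not deliberate attacks; the function name `collision_probability` is made up for this illustration.

```python
import math

def collision_probability(num_files: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_files inputs share a hash.

    Assumes hash outputs are uniformly distributed (birthday approximation).
    """
    pairs = num_files * (num_files - 1) / 2
    # -expm1(-x) evaluates 1 - exp(-x) without losing precision for tiny x
    return -math.expm1(-pairs / 2.0 ** hash_bits)

# A library of one million files with a 128-bit hash versus a 32-bit checksum
print(collision_probability(1_000_000, 128))
print(collision_probability(1_000_000, 32))
```

With a 128-bit or longer hash the result is vanishingly small for any realistic music library, while a 32-bit checksum collides almost certainly at that size.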

Problem of changed tags in mp3 files

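While this may be a problem, what you need to do is hash only the part of the file that is not the ID3 tag. ID3v1 tags are located at the end of the file (the last 128 bytes), while ID3v2 tags are usually located at the beginning; in both cases they typically take up a very small percentage of the file size.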

What you can do is apply the hashing function only to the part of the file that will not change: skip the tag bytes (for example, the last N bytes of the file) when hashing.
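A minimal sketch of that idea follows. It assumes the only metadata present is an ID3v2 tag at the start of the file and/or a 128-byte ID3v1 tag at the end; other tag formats (APEv2, for instance) are not handled. The function name `audio_hash` and the choice of SHA-256 are illustrative, not something the question prescribes.

```python
import hashlib
import os

def audio_hash(path: str) -> str:
    """Hash only the bytes between a leading ID3v2 tag and a trailing ID3v1 tag."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        start = 0
        header = f.read(10)
        if len(header) == 10 and header[:3] == b"ID3":
            # ID3v2 stores the tag size as four "syncsafe" bytes (7 bits each)
            b = header[6:10]
            tag_size = (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]
            start = 10 + tag_size
            if header[5] & 0x10:  # footer flag: 10 extra bytes after the tag
                start += 10

        end = size
        if size - start >= 128:
            f.seek(size - 128)
            if f.read(3) == b"TAG":  # ID3v1 occupies the last 128 bytes
                end = size - 128

        digest = hashlib.sha256()
        f.seek(start)
        remaining = max(end - start, 0)
        while remaining > 0:
            chunk = f.read(min(65536, remaining))
            if not chunk:
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()
```

Two copies of the same track that differ only in these tags should then produce the same digest, so editing tags outside the application no longer changes the key.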

Question 2

Yes, if you hash the file contents, then even if the file moves somewhere else, it will still result in the same hash when you do it again. So yes, you can totally identify files based on their content’s hash value (this is what Git does, for example). As for creating a hash of a file, there are several existing questions that explain how to do it.
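For illustration, a minimal sketch of hashing a file's contents in Python. SHA-256 and the name `file_hash` are assumptions made for the example; any good hash function would work the same way.

```python
import hashlib

def file_hash(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of the file's bytes.

    Only the contents are fed to the hash, never the path or the name,
    so moving or renaming the file leaves the result unchanged.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `file_hash` on a file before and after moving it to another directory should return the same string, which is what makes the digest usable as a database key.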

Note though that due to ID3 tags and similar metadata, your files are not immutable, so hashing the file contents might not be the best idea after all. If you change the tags of a file, its hash will change, resulting in a new file (at least as far as your application is concerned). Of course, if you change the tags within your application, then you can easily keep track of those changes and update the old record to use the new hash. The same idea could be applied to identifying the file by its path (if you move it within your application, you could just update its path in the database as well). The problem is that both of these actions are likely to happen outside of your application.

So both identification methods (hash of file contents, or file path) are somewhat flawed, but there is no real alternative for identifying the file.

Question 3

Hashing will work for you. It basically creates a checksum based on all bytes in the file. Using a good hash will give you a signature for each file which is effectively unique (there is more chance of winning the lottery five times in a row than of finding two different files with the same hash).

Problem is you need to read the entire file to calculate the hash. This might hurt performance a bit.

So on rediscovery you might want to first check whether the file size is the same. If it is not, there is no need to read the entire file and calculate the hash. But you need to store both the file size and the hash for that.
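The size-first check could look roughly like this. It is only a sketch: `stored_size` and `stored_hash` stand in for whatever your database holds for the file, and `matches_record` and `file_hash` are hypothetical names. MD5 is used here because the link below covers it; any other hashlib algorithm could be swapped in.

```python
import hashlib
import os

def file_hash(path: str, chunk_size: int = 65536) -> str:
    """MD5 hex digest of the whole file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_record(path: str, stored_size: int, stored_hash: str) -> bool:
    """Compare a candidate file against a stored (size, hash) record."""
    # Cheap test first: a different size means different content,
    # so the expensive full read can be skipped entirely.
    if os.path.getsize(path) != stored_size:
        return False
    return file_hash(path) == stored_hash
```

The file is only read in full when the sizes already agree, which keeps rediscovery fast when most candidates are clearly different files.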

Some info on hashing (using the MD5 method)

http://www.fastsum.com/support/md5-checksum-utility-faq/md5-hash.php