Question

I'd like to read in the first couple of kilobytes of files of unknown type and see whether they match any known file type (e.g. MP3 or JPEG). I was thinking of trying to load metadata from the files with libraries like PIL, sndhdr, py264, etc. and seeing whether any of them picked up a valid format, but I thought this must be a problem someone has solved before.

Is there one library or a gist showing the usage of multiple libraries which would do this?


Solution

Use python-magic to do the fingerprinting.

The library can determine the file type from a bytes buffer alone, without needing the whole file:

import magic
# start_data_from_something: bytes read from the start of the unknown file
magic.from_buffer(start_data_from_something)

The library provides access to the libmagic file type identification library, which also drives the UNIX file command.
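
To tie this back to the question, here is a minimal sketch of reading only the start of a file and passing it to the library. The helper names read_head and identify, the 2048-byte default, and the detector parameter are assumptions made for illustration and are not part of python-magic; the only library call used is magic.from_buffer, and it needs both python-magic and libmagic installed.

```python
def read_head(path, size=2048):
    """Return up to `size` bytes from the start of the file at `path`."""
    with open(path, "rb") as f:
        return f.read(size)


def identify(path, size=2048, detector=None):
    """Describe a file's type from its first `size` bytes.

    `detector` is a hypothetical hook: any callable that takes bytes and
    returns a string. When omitted, python-magic's from_buffer is used.
    """
    if detector is None:
        # Imported lazily so read_head stays usable without python-magic.
        import magic
        detector = magic.from_buffer
    return detector(read_head(path, size))
```

Calling identify("some_unknown_file") should return libmagic's textual description of the file. To get a MIME type instead, a detector such as lambda data: magic.from_buffer(data, mime=True) can be passed in. Very short buffers may not contain enough data for a reliable match, so the size is worth tuning.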

Licensed under: CC-BY-SA with attribution