Question

I was wondering whether anyone knows of a good, accurate PHP library or file I can include in my script to analyse the content of a file and check whether it is a specific type, such as .doc, .docx, .jpg, etc.

I know PHP offers a large number of libraries for this, but they are not very accurate: some only check the file extension or the file header, and cannot tell whether the file is broken.

What I am asking for is something very accurate, simple and fast (I am probably asking for too much), but any link or suggestion will be appreciated. Thank you!

Solution

As far as I know, no such library exists; it also wouldn't make sense to have one.

"Let's say I have a JPEG image I would like to analyse. The headers would probably be okay, but the image itself is broken, and when I want to convert it or crop it for a thumbnail (with the GD library, which is the one I use), the functions (mostly imagecreatefromjpeg) throw errors. To create a good thumbnail I need a valid image."

The best place to catch a malformed JPG file is the point where GD errors out while trying to process it. Just deal with that in a transparent and useful way (that is, let the user know that something went wrong). Why add extra code that would essentially have to do the same thing?

By handling the error when it occurs, you can also catch issues that a simple analysis of the file wouldn't reveal anyway - for example, GD can't deal with CMYK JPGs. Still, CMYK JPGs are perfectly valid files. Another example is files that are too big to be processed on your server.
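As a rough illustration of handling the failure at the point where GD processes the file, here is a minimal sketch. The function name loadJpegOrFail and the error messages are made up for this example; only imagecreatefromjpeg() comes from GD, and it returns false (with a warning) when it cannot read the file.

```php
<?php
// Hypothetical helper (not part of GD): load a JPEG and turn a GD failure
// into an exception that the calling code can report to the user.
function loadJpegOrFail(string $path)
{
    if (!function_exists('imagecreatefromjpeg')) {
        throw new RuntimeException('GD with JPEG support is not available.');
    }

    // imagecreatefromjpeg() emits a warning and returns false on bad input.
    // The @ operator hides the warning so the failure is handled here instead.
    $image = @imagecreatefromjpeg($path);
    if ($image === false) {
        throw new RuntimeException("Could not read '$path' as a JPEG image.");
    }

    return $image;
}
```

The calling code would wrap loadJpegOrFail() in a try/catch block and show the user a friendly message when a RuntimeException is thrown, rather than trying to validate the file separately beforehand.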

Of course, you can do header or size checks beforehand on every uploaded file. But a separate check that goes as deep as you want doesn't make sense.
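A cheap pre-check of that kind might look like the sketch below. It assumes the fileinfo extension is enabled; the function name passesBasicChecks, the size limit and the allow-list are illustrative choices, not anything prescribed above.

```php
<?php
// Hypothetical pre-check: reject files that are empty, too large, or whose
// detected MIME type is not on an allow-list chosen by the caller.
function passesBasicChecks(string $path, int $maxBytes, array $allowedTypes): bool
{
    $size = @filesize($path);
    if ($size === false || $size === 0 || $size > $maxBytes) {
        return false;
    }

    // finfo looks at the file's leading bytes, not at its extension.
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $type = $finfo->file($path);

    return $type !== false && in_array($type, $allowedTypes, true);
}
```

For example, passesBasicChecks($path, 5 * 1024 * 1024, ['image/jpeg', 'image/png']) would only let through non-empty files of at most 5 MB that look like JPEG or PNG. It says nothing about whether the image data further into the file is intact.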

"Apart from that, I would like to have it to prevent virus or code injection."

This isn't a realistic goal. What if the library you open the file with to check it is vulnerable to the injection?

Also, injections like this are very rare; library vulnerabilities tend to be widely publicized, and patches quickly provided. Just keep your machine up to date.

If you really need enterprise-grade virus protection, get a server-side virus detection product.

OTHER TIPS

What I did for this was to open the file, read it, and look for the file header (signature). Most signatures are listed in the format's Wikipedia article.

%PDF for PDF: the first 4 bytes. For PNG, the first 4 bytes are the byte 0x89 followed by PNG.

I haven't yet seen a library that does this.
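The signature check described in this answer could be sketched roughly as follows. The function name detectBySignature and the three-entry table are assumptions for illustration; a real table would need many more formats. Note that a PNG file starts with the byte 0x89 followed by the letters PNG.

```php
<?php
// Hypothetical helper: identify a file by its leading "magic" bytes.
// The signature table is illustrative and far from complete.
function detectBySignature(string $path): ?string
{
    $signatures = [
        'pdf'  => "%PDF",
        'png'  => "\x89PNG",
        'jpeg' => "\xFF\xD8\xFF",
    ];

    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return null;
    }
    // A few leading bytes are enough for the signatures listed above.
    $head = fread($handle, 8);
    fclose($handle);
    if ($head === false) {
        return null;
    }

    foreach ($signatures as $type => $magic) {
        if (strncmp($head, $magic, strlen($magic)) === 0) {
            return $type;
        }
    }

    return null; // Unknown or unreadable file.
}
```

This only confirms that the file begins like the claimed format. A file with a valid signature can still be truncated or corrupt further in.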

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow