So, thanks to Stack Overflow, I managed to find the post below:
How to get text extraction from PDF to work?
which gave me the information I was looking for. Apparently, the PDFs are being generated without the information needed to decode the Latin characters.