You should use UTF-8
instead utf-8
. See pdftotext
help message:
$ pdftotext -listenc
Available encodings are:
UCS-2
ASCII7
Latin1
UTF-8
ZapfDingbats
Symbol
Proof code:
$ pdftotext -eol unix -nopgbrk -layout -enc utf-8 file.pdf
Syntax Error: Couldn't find unicodeMap file for the 'utf-8' encoding
Command Line Error: Couldn't get text encoding
$ pdftotext -eol unix -nopgbrk -layout -enc UTF-8 file.pdf
$ echo $?
0