I found that vim is pretty good at converting files from one encoding to another.
My trick is to parse the file normally and, when an encoding error comes up, re-encode the file with vim and start parsing again from the beginning.
Here's the rough idea:
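$xmlFile = '/path/to/file.xml';

// Parse the file in a loop
while (...)
{
    try
    {
        // Normal parsing logic...
        // (assumes parse warnings are turned into exceptions, e.g. via set_error_handler())
        $reader->readOuterXml();
        //...
    }
    catch (Exception $ex)
    {
        // Encoding declared in the XML declaration; fall back to UTF-8 if there is none
        $encoding = getXMLEncoding($xmlFile) ?: 'utf-8';

        // vim detects the real encoding when it opens the file,
        // then writes it back out in the declared encoding
        exec(sprintf(
            VIM_PATH . ' -c %s -c "wq" %s',
            escapeshellarg('set fileencoding=' . $encoding),
            escapeshellarg($xmlFile)
        ));

        // File has been re-encoded
        // The real encoding should now match the declared encoding
        // -> Go back to the beginning and parse the file again
    }
}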
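The getXMLEncoding() helper isn't shown above. Here is a minimal sketch of how it might look, assuming it only needs to read the encoding from the XML declaration at the top of the file. The 200-byte read limit and the regex are my own choices, and it only works for ASCII-compatible files (a UTF-16 file won't match):

```php
<?php
// Hypothetical implementation of getXMLEncoding(); the real helper is not shown,
// so treat this as one possible way to write it.
function getXMLEncoding(string $xmlFile): ?string
{
    // The XML declaration, if present, sits at the very start of the file,
    // so reading the first 200 bytes is enough.
    $head = @file_get_contents($xmlFile, false, null, 0, 200);
    if ($head === false) {
        return null;
    }

    // Match encoding="..." or encoding='...' inside <?xml ... ?>
    $pattern = '/<\?xml[^>]*\bencoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._-]*)\1/';
    if (preg_match($pattern, $head, $m)) {
        return $m[2];
    }

    // No declaration or no encoding attribute
    return null;
}
```

Returning null when nothing is declared lets the `?: 'utf-8'` fallback in the catch block take over.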
This method might garble one or two characters, but that's far better than a parse that fails completely. Ideally the third party would declare the correct encoding in their files.
My system is Windows, so the vim arguments might be different on Linux (I don't know).
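The "go back to the beginning and parse again" step could be sketched roughly as follows. This is not my exact code: parseWithRetry(), $reencode and $maxAttempts are made-up names for illustration, the parsing body just collects element names as a stand-in for the real logic, and the attempt cap is there so a file that vim can't fix doesn't loop forever. It also assumes the parse errors arrive as PHP warnings, which a temporary error handler turns into exceptions:

```php
<?php
// Sketch only: parseWithRetry() and $reencode are hypothetical names.
// $reencode is any callable that rewrites the file in its declared encoding
// (for example, the vim call from the catch block above).
function parseWithRetry(string $xmlFile, callable $reencode, int $maxAttempts = 2): array
{
    // Turn PHP warnings (including XMLReader parse warnings) into exceptions
    set_error_handler(function (int $severity, string $message, string $file, int $line) {
        throw new ErrorException($message, 0, $severity, $file, $line);
    });

    try {
        for ($attempt = 1; ; $attempt++) {
            $reader = new XMLReader();
            try {
                $reader->open($xmlFile);

                // Placeholder for the normal parsing logic: collect element names
                $names = [];
                while ($reader->read()) {
                    if ($reader->nodeType === XMLReader::ELEMENT) {
                        $names[] = $reader->name;
                    }
                }
                return $names;
            } catch (Exception $ex) {
                // Give up once the attempt cap is reached
                if ($attempt >= $maxAttempts) {
                    throw $ex;
                }
                // Re-encode the file, then loop around and parse from the start
                $reencode($xmlFile);
            } finally {
                $reader->close();
            }
        }
    } finally {
        restore_error_handler();
    }
}
```

With the default of two attempts, the file is re-encoded at most once; if the second parse still fails, the exception is passed on to the caller.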