PHP DOMDocument : loadHTMLFile choking on a mysterious character: RS

https://stackoverflow.com/questions/798232

18-09-2019

Question

Using PHP's DOMDocument->loadHTMLFile('test.html') keeps returning an error, reporting a problem in the content at line 36. Deleting character after character, I found that the culprit was what looked like an empty space. Copying and pasting that sentence into another editor (Editra) showed a strange RS character.

What is it and, more importantly, how can I prevent it from happening again?

Solution

It's a record separator (RS), one of the ASCII control characters.

These separators can be used as delimiters to mark fields of data structures. If used for hierarchical levels, US is the lowest level (dividing plain-text data items), while RS, GS, and FS are of increasing level to divide groups made up of items of the level beneath it.

SEQ: ^^ - Dec: 30 - Hex: 1E - Acronym: RS

What you can do is use strtr() to strip out non-visible characters before parsing. An example by Joel Degan on PHP.net should get you on your way.
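A minimal sketch of that idea, not the PHP.net example itself: the helper name stripControlChars() and the inline sample string are made up for illustration, and the choice to keep tab, line feed, and carriage return is an assumption. It builds a strtr() map that deletes the remaining ASCII control characters (RS, 0x1E, among them) and then hands the cleaned markup to loadHTML().

```php
<?php
// Hypothetical helper: removes ASCII control characters (including RS, 0x1E)
// but keeps tab, line feed and carriage return, which are legal in HTML.
function stripControlChars(string $html): string
{
    $map = [];
    for ($code = 0; $code < 32; $code++) {
        if ($code === 9 || $code === 10 || $code === 13) {
            continue; // keep tab, LF, CR
        }
        $map[chr($code)] = ''; // map the control character to nothing
    }
    $map[chr(127)] = ''; // DEL

    return strtr($html, $map);
}

// Sample input with an RS character hidden between the two words.
$dirty = "<p>Hello\x1Eworld</p>";
$clean = stripControlChars($dirty);

$doc = new DOMDocument();
$doc->loadHTML($clean);
echo $doc->getElementsByTagName('p')->item(0)->textContent, PHP_EOL;
```

For a real file, the same helper could be applied to the result of file_get_contents('test.html') before calling loadHTML() instead of loadHTMLFile().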

OTHER TIPS

As I recall, PHP raises a non-fatal error in this case. It will complain about a lot of things, which you can't do anything about if you didn't create the file yourself. What you can do, although it is bad programming practice, is suppress the errors by putting @ before the call.

$doc = new DOMDocument();
@$doc->loadHTMLFile('test.html');

It should still load the file, but you will be "ignoring" the errors. Ignorance is bliss?
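A less blunt alternative to the @ operator is libxml_use_internal_errors(), which makes libxml collect parse errors so they can be inspected instead of being emitted as PHP warnings. This is a sketch, not part of the original answer. The inline sample string stands in for the real file and is an assumption.

```php
<?php
// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Sample markup containing an RS (0x1E) character; a stand-in for test.html.
$doc->loadHTML("<p>Hello\x1Eworld</p>");

// Each entry is a LibXMLError object with line, column and message properties.
foreach (libxml_get_errors() as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

libxml_clear_errors();                  // free the collected errors
libxml_use_internal_errors($previous);  // restore the previous setting
```

Unlike @, this keeps the messages available, so the offending line can be logged or checked rather than silently discarded.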

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow