Question

I am processing user input from the public with a JavaScript WYSIWYG editor, and I plan to use HTML Purifier to cleanse the text.

I thought it would be enough to run HTML Purifier on the input, store the cleaned input in the database, and then output it without further escaping or filtering. But I've heard other opinions that you should always escape the output.

Can someone explain why I should need to clean the output if I'm already cleaning the input?

Solution

I assume your WYSIWYG editor generates HTML, which is then validated and put in the database. In that case, the validation already took place, so there is no need to validate twice.

As to "escaping output", that's a different matter. You cannot escape the resulting HTML, otherwise you won't have formatted text, and the tags will be visible. Escaping the output is used when you do not want said output to interfere with the markup of the page.
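To make the effect concrete, here is a minimal sketch of what escaping does to already-formatted HTML. It uses Python's standard html.escape as a stand-in for whatever escaping function your platform provides (htmlspecialchars in PHP, for example); the variable names and sample strings are made up for illustration.

```python
from html import escape

# Hypothetical value: HTML that was already sanitized and stored in the database.
stored_html = "<p>Hello <strong>world</strong></p>"

# Escaping turns markup characters into entities, so a browser would
# display the tags literally instead of rendering bold text.
escaped_html = escape(stored_html)

# Hypothetical plain-text value (for example a user name) that must never
# be interpreted as markup. This is the case escaping is meant for.
plain_text = "<script>alert(1)</script>"
escaped_text = escape(plain_text)

print(escaped_html)
print(escaped_text)
```

The first result is what you do not want for WYSIWYG content, and the second is what you do want for plain-text fields.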

I'd add that you have to be very careful about what you allow in your validation phase. You will probably want to allow only a few HTML tags and attributes.
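As a rough illustration of the allowlist idea, the sketch below keeps only a handful of tags and attributes and drops everything else. It is a toy built on Python's standard library to show the principle, not a replacement for a real sanitizer such as HTML Purifier; the allowlists, the class name and the filter_html helper are all assumptions made for this example.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical allowlist: tag name -> attributes permitted on that tag.
ALLOWED_TAGS = {"p": set(), "strong": set(), "em": set(), "a": {"href"}}
# Only these URL schemes are accepted in href values.
ALLOWED_SCHEMES = {"http", "https"}
# Elements whose text content is dropped together with the tag.
DROP_CONTENT = {"script", "style"}


def is_safe_url(value):
    """Accept a URL only if it parses and uses an allowed scheme."""
    try:
        return urlsplit(value).scheme in ALLOWED_SCHEMES
    except ValueError:
        return False


class AllowlistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        if tag not in ALLOWED_TAGS:
            return  # unknown tag: drop it, keep processing the rest
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_TAGS[tag] or value is None:
                continue
            if name == "href" and not is_safe_url(value):
                continue
            kept.append(f' {name}="{escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT and self.skip_depth:
            self.skip_depth -= 1
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))


def filter_html(dirty):
    parser = AllowlistFilter()
    parser.feed(dirty)
    parser.close()
    return "".join(parser.out)


dirty = ('<p onclick="x()">Hi <script>alert(1)</script>'
         '<a href="javascript:alert(1)">bad</a> '
         '<a href="https://example.com">ok</a></p>')
clean = filter_html(dirty)
print(clean)
```

The point of the design is that everything is rejected unless it is explicitly listed, which is much easier to reason about than trying to enumerate every dangerous tag, attribute or URL scheme.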

OTHER TIPS

To be as safe as possible, run HTMLPurifier twice: once before saving the HTML to the DB and again before outputting it to the screen.
The big drawback of this approach is performance. HTMLPurifier is very slow at filtering HTML, so your pages may take noticeably longer to process.

You should be OK if you perform only 1-2 filterings before outputting something to the screen, but if you do 10 filterings per request like we did, the cost adds up. In the end we decided not to use HTMLPurifier when outputting large amounts of text.

HTMLPurifier took 60% of the processing time per request, and we preferred low response times and a better user experience.

It depends on your situation. If you can afford to run HTMLPurifier before outputting, go for it. It's the better option, because you always keep control over which tags you allow (for new content and even for old content stored in your DB).

The mantra "always escape your output", which describes a text-to-HTML conversion, is a good and reasonable default to fall back on when working in the web space. In the case of HTML Purifier, you are deliberately departing from this advice, because you are performing an HTML-to-HTML conversion, and treating the HTML as text again doesn't really make sense.
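A small sketch of how the two kinds of values can sit side by side in one template: plain-text fields go through escaping, while the purified body is inserted unchanged. The render_comment function and its parameters are hypothetical, and body_html is assumed to be the output of a sanitizer such as HTML Purifier.

```python
from html import escape


def render_comment(author_name: str, body_html: str) -> str:
    # author_name is plain text, so it needs a text-to-HTML conversion (escape).
    # body_html is assumed to be sanitizer output: it is already HTML,
    # so it is inserted as-is and must not be escaped again.
    return (
        '<div class="comment">'
        f"<h4>{escape(author_name)}</h4>"
        f"{body_html}"
        "</div>"
    )


page = render_comment("Tom & <Jerry>", "<p>Hi <em>there</em></p>")
print(page)
```

The rule of thumb is to track which type each value has (text or HTML) and apply exactly one conversion at the boundary where the type changes.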

Licensed under: CC-BY-SA with attribution