How to prevent Django basic inlines from autoescaping

https://stackoverflow.com/questions/9939248

28-05-2021

Question

The Django Basic Inlines app renders a pre-determined template from a pseudo-HTML syntax, based on an app/model/id combination. For example, if you're writing a blog post, you can insert an image that was saved in your image model:

# In the admin
This is the body of my post.

<inline type="media.image" id="1" class="full">

The template then applies the render_inlines filter, whose output has to be marked safe so that the HTML renders properly:

# Template
{{ post.body|render_inlines|safe }}

But even with safe, the filter still escapes the HTML, producing &lt;p&gt;&lt;img src="..."&gt;&lt;p&gt; in the page source.

According to the docs, the filter should use mark_safe to prevent autoescaping at the filter level, but the inlines function in parser.py already uses mark_safe.

Is there something further needed in Django 1.4 to stop autoescaping at the custom filter layer? I can't seem to get rid of this autoescaping, either at the inlines function or the render_inlines function.

I tried using autoescape=None, which didn't seem to help either.

Solution

I maintain a fork of the Inline app. Richard contacted me about this problem and I was able to trace it back to BeautifulSoup, not Django.

The problem was that BeautifulSoup's replaceWith() method was being used to replace the inline markup with the rendered template. The result of render_to_string() is, of course, a string. When replaceWith() receives a string, it turns it into a NavigableString. Because BeautifulSoup treats a NavigableString as plain text rather than markup, it assumes the content is unsafe and escapes any HTML characters in it. The result is that the value returned by Inline's inlines() function had a bunch of &gt; and &lt; entities in it rather than literal > and < characters.
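The effect described above can be illustrated without BeautifulSoup. This is a minimal sketch that uses the standard library's html.escape as a stand-in for what happens when markup is stored as a text node and then serialized; the rendered string is a made-up example, and this is not BeautifulSoup's actual code path.

```python
from html import escape

# Hypothetical output of render_to_string() for an image inline.
rendered = '<p><img src="/media/example.jpg"></p>'

# When markup is treated as plain text, its special characters are
# replaced with entities on output, so the browser shows the tags
# literally instead of rendering them.
as_text = escape(rendered, quote=False)
print(as_text)
```

Marking the already-escaped string as safe afterwards does not help, because by then the entities are part of the string itself.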

I didn't notice this problem in Django 1.3. When I looked, BeautifulSoup was indeed returning escaped HTML. Django's |safe template filter must have been unescaping the previously escaped HTML. In Django 1.4, it no longer does that. (And it shouldn't do that!)

My fix for this is to parse the incoming value with BeautifulSoup and use BeautifulSoup to find all the inline markup, just like before. Rather than using BeautifulSoup's replaceWith() method to replace the inline markup with the rendered inline template, I'm now just using Python's plain old str.replace(). It feels a bit lame to me, converting the parsed soup back to a string and then doing the string replacement. But it works. I'm partly tempted to just do away with BeautifulSoup altogether and find the inline markup with regular expressions but we all know how that ends. If anybody has a better idea, I'm all ears!
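The find-then-str.replace() idea can be sketched roughly as follows. To keep the example self-contained it swaps in the standard library's html.parser for BeautifulSoup when locating the inline markup, and render_inline() is a hypothetical stand-in for rendering the inline template with render_to_string(); neither is taken from the actual fork.

```python
from html.parser import HTMLParser


class InlineFinder(HTMLParser):
    """Collect the raw source text and attributes of every <inline> tag."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (raw_tag_text, attrs_dict)

    def handle_starttag(self, tag, attrs):
        if tag == "inline":
            # get_starttag_text() returns the tag exactly as written.
            self.found.append((self.get_starttag_text(), dict(attrs)))


def render_inline(attrs):
    # Hypothetical stand-in for render_to_string() on the inline template.
    return '<img class="{}" src="/media/{}.jpg">'.format(
        attrs.get("class", ""), attrs["id"]
    )


def inlines(value):
    finder = InlineFinder()
    finder.feed(value)
    finder.close()
    for raw_tag, attrs in finder.found:
        # Plain string replacement: the rendered HTML is never turned
        # into a text node, so nothing escapes it.
        value = value.replace(raw_tag, render_inline(attrs))
    return value


body = 'This is the body of my post.\n\n<inline type="media.image" id="1" class="full">'
print(inlines(body))
```

In a Django filter, the returned string would still need to be wrapped with mark_safe so the template engine does not autoescape it.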

The fix was initially implemented in this commit. I improved it in the following commit, but apparently StackOverflow is only allowing me to post a maximum of two links, so you'll have to find that one yourself!

OTHER TIPS

Another solution is to parse the new HTML into a BeautifulSoup object and pass that object to replaceWith() instead of the raw string. That way BeautifulSoup seems to behave correctly, since it inserts parsed markup rather than a text node.

This gives you escaped HTML:

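from bs4 import BeautifulSoup  # assumes bs4; with BeautifulSoup 3: from BeautifulSoup import BeautifulSoup
html_doc = "<html><body><p>old content</p></body></html>"  # placeholder document
soup = BeautifulSoup(html_doc)
body = soup.body
new_html = """<p> this is some deap code</p><a href="#">Pointless even</a>"""
body.replaceWith(new_html)  # a plain string becomes a NavigableString and is escaped on output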

This gives you your HTML unescaped:

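from bs4 import BeautifulSoup  # assumes bs4; with BeautifulSoup 3: from BeautifulSoup import BeautifulSoup
html_doc = "<html><body><p>old content</p></body></html>"  # placeholder document
soup = BeautifulSoup(html_doc)
body = soup.body
new_html = """<p> this is some deap code</p><a href="#">Pointless even</a>"""
body.replaceWith(BeautifulSoup(new_html))  # parsed markup is inserted as tags, not text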

It's because of the render_to_string call here. Open inlines/app_model.html and inlines/default.html and add |safe after the content variables in those templates.

Licensed under: CC-BY-SA with attribution
