Question

Programming a Python web application, I want to create a text area where the users can enter text in a lightweight markup language. The text will be imported to a html template and viewed on the page. Today I use this command to create the textarea, which allows users to enter any (html) text:

my_text = cgidata.getvalue('my_text', 'default_text')
ftable.AddRow([Label(_('Enter your text')),
               TextArea('my_text', my_text, rows=8, cols=60).Format()])

How can I change this so that only some (safe, eventually lightweight) markup is allowed? All suggestions including sanitizers are welcome, as long as it easily integrates with Python.

Was it helpful?

Solution

Use the python markdown implementation

import markdown
mode = "remove" # or "replace" or "escape"
md = markdown.Markdown(safe_mode=mode)
html = md.convert(text)

It is very flexible, you can use various extensions, create your own etc.

OTHER TIPS

You could use restructured text . I'm not sure if it has a sanitizing option, but it's well supported by Python, and it generates all sorts of formats.

This simple sanitizing function uses a whitelist and is roughly the same as the solution of python-html-sanitizer-scrubber-filter, but also allows to limit the use of attributes (since you probably don't want someone to use, among others, the style attribute):

from BeautifulSoup import BeautifulSoup

def sanitize_html(value):
    valid_tags = 'p i b strong a pre br'.split()
    valid_attrs = 'href src'.split()
    soup = BeautifulSoup(value)
    for tag in soup.findAll(True):
        if tag.name not in valid_tags:
            tag.hidden = True
        tag.attrs = [(attr, val) for attr, val in tag.attrs if attr in valid_attrs]
    return soup.renderContents().decode('utf8').replace('javascript:', '')
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top