Lightweight markup language for Python
Question
I am writing a Python web application and want to create a text area where users can enter text in a lightweight markup language. The text will be inserted into an HTML template and displayed on the page. Currently I use the following code to create the textarea, which lets users enter any text, including HTML:
my_text = cgidata.getvalue('my_text', 'default_text')
ftable.AddRow([Label(_('Enter your text')),
               TextArea('my_text', my_text, rows=8, cols=60).Format()])
How can I change this so that only some safe (possibly lightweight) markup is allowed? All suggestions, including sanitizers, are welcome, as long as they integrate easily with Python.
Solution
Use the Python Markdown implementation:
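import markdown

# safe_mode controls how raw HTML in the input is handled:
# "remove" drops it, "replace" substitutes a placeholder, "escape" escapes it.
# Note: safe_mode was deprecated in Python-Markdown 2.6 and removed in 3.0.
mode = "remove"  # or "replace" or "escape"
md = markdown.Markdown(safe_mode=mode)
html = md.convert(text)  # text is the user-supplied markup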
It is very flexible: you can use the various existing extensions or write your own.
OTHER TIPS
You could use reStructuredText. I'm not sure whether it has a sanitizing option, but it is well supported in Python (through docutils) and can generate many output formats.
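As a rough illustration of the reStructuredText route, here is a minimal sketch that renders a string to an HTML fragment with docutils while switching off raw HTML passthrough and file inclusion. The helper name `rst_to_html` and the constant `SAFE_RST_SETTINGS` are made up for this example; `raw_enabled` and `file_insertion_enabled` are docutils settings. This is not a full sanitizer, so link targets and similar details may still need checking.

```python
# Settings that turn off two docutils features that are risky for
# untrusted input: raw HTML passthrough and file inclusion.
SAFE_RST_SETTINGS = {
    "raw_enabled": False,
    "file_insertion_enabled": False,
}


def rst_to_html(source):
    """Render reStructuredText to an HTML fragment (hypothetical helper)."""
    # Imported lazily so this module still loads where docutils is absent.
    from docutils.core import publish_parts

    parts = publish_parts(
        source,
        writer_name="html",
        settings_overrides=SAFE_RST_SETTINGS,
    )
    # "fragment" is the rendered body without the enclosing wrapper element.
    return parts["fragment"]
```

Calling `rst_to_html(my_text)` would then give markup that can be placed into the page template, assuming docutils is installed.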
This simple sanitizing function uses a whitelist and is roughly the same as the solution in python-html-sanitizer-scrubber-filter, but it also lets you restrict which attributes are allowed (you probably don't want users to set, among others, the style attribute):
from bs4 import BeautifulSoup

def sanitize_html(value):
    valid_tags = 'p i b strong a pre br'.split()
    valid_attrs = 'href src'.split()
    soup = BeautifulSoup(value, 'html.parser')
    for tag in soup.find_all(True):
        if tag.name not in valid_tags:
            # Drop the tag itself but keep its contents.
            tag.hidden = True
        # Keep only whitelisted attributes.
        tag.attrs = {attr: val for attr, val in tag.attrs.items()
                     if attr in valid_attrs}
    # Crude filter: it only catches the exact lowercase string.
    return soup.decode_contents().replace('javascript:', '')
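Removing the literal text 'javascript:' is easy to get around: mixed case such as 'JavaScript:' slips through, and a nested string like 'javajavascript:script:' collapses back into 'javascript:' after the replacement. As a hedged alternative, the sketch below applies the same whitelist idea using only the standard library and checks the URL scheme of each kept attribute instead. The names `_WhitelistSanitizer`, `SAFE_SCHEMES` and `VOID_TAGS` are invented for this example, and the set of allowed schemes is an assumption you would adjust to your needs. Treat it as an illustration, not a vetted security component.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

VALID_TAGS = {"p", "i", "b", "strong", "a", "pre", "br"}
VALID_ATTRS = {"href", "src"}
# Assumption: relative URLs (empty scheme) and these schemes are acceptable.
SAFE_SCHEMES = {"", "http", "https", "mailto"}
VOID_TAGS = {"br"}  # tags that never get a closing tag


class _WhitelistSanitizer(HTMLParser):
    """Re-emit only whitelisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _safe_attrs(self, attrs):
        kept = []
        for name, value in attrs:
            if name not in VALID_ATTRS or value is None:
                continue
            # Remove whitespace and control characters before parsing,
            # since browsers tolerate them inside a URL scheme.
            cleaned = "".join(ch for ch in value if ch > " ")
            try:
                scheme = urlsplit(cleaned).scheme.lower()
            except ValueError:
                continue  # malformed URL: drop the attribute
            if scheme in SAFE_SCHEMES:
                kept.append((name, value))
        return kept

    def handle_starttag(self, tag, attrs):
        if tag in VALID_TAGS:
            rendered = "".join(
                ' %s="%s"' % (name, escape(value, quote=True))
                for name, value in self._safe_attrs(attrs)
            )
            self.out.append("<%s%s>" % (tag, rendered))

    def handle_endtag(self, tag):
        if tag in VALID_TAGS and tag not in VOID_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text (including the contents of dropped tags) is kept but escaped.
        self.out.append(escape(data, quote=False))


def sanitize_html(value):
    parser = _WhitelistSanitizer()
    parser.feed(value)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

As in the BeautifulSoup version, a disallowed tag is dropped while its text content is kept, so the contents of a script element end up as harmless escaped text. Comments and processing instructions are discarded because the parser callbacks for them are not overridden.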