Question

The Stack Exchange sites use Markdown syntax for writing questions and answers. This is built using PageDown on the client side and MarkdownSharp and Jeff's HTML sanitizer on the server side. I'm thinking of building something very similar myself.

I understand why I need to sanitize the HTML on the server side. But what is the purpose of MarkdownSharp? Why wouldn't I just do the translation from Markdown to HTML with PageDown on the client side and send that to the server?

Solution

The most important reasons:

  1. We want the most basic (in the sense of "fundamental") functionality of the Stack Exchange sites, asking and answering, to work without JavaScript. Most of the more advanced features, like voting, flagging, UI niceties, help texts, favoriting, the global inbox, and whatnot, require JavaScript, and that's okay. But the one thing that the sites cannot live without, Q&A, should not have that requirement, to keep friction as low as possible.

    Of course, when you don't have JavaScript as a requirement, you can't require that the client render the Markdown.

  2. We only store the rendered HTML version of the most recent version of a post. For past revisions, we only store the Markdown source. Storing both versions for every edit that gets made would be a huge waste of space, since the old versions are hardly ever needed. But sometimes they are needed, e.g. in the revision history. So for that, we have to render on the server side anyway.

  3. Even if you re-sanitize on the server side, letting the client do the rendering means you can no longer trust that the rendered version was really generated from the Markdown. Imagine the following:

    I, an evil spammer, post the following answer:

    As you can sea on [this awesome site][1],
    
    ... (long text on thread-safe usage of the turtle in LOGO) ...
    
    Hope that helps!
    
     [1]: http://almost-real-rolex-watches.biz
    

    But I submit a rendered version in which the link actually goes to a relevant site on the intricacies of turtle concurrency. Since the server expects both the Markdown source and the rendered HTML from me, it trusts that the one was made from the other.

    Along comes Sean Sceat, renowned Stack Overflow user with 120k reputation in the logo tag alone. He sees that the link indeed goes to a relevant page, likes the answer, upvotes it, posts a "Great answer; the site you link to has tons of helpful content!" comment, and while he's at it, he fixes the typo "sea" -> "see" (which the spammer made deliberately).

    But the Markdown that was in the editor after he clicked "edit" did not contain the relevant link anymore; it contained the Rolex link. And thus – unbeknownst to Sean – he not only fixed the typo, but also changed the link to go to the spammer's site.

    Now you have a post with the last edit coming from a trusted user, endorsing the answer, but with a link that we'd rather not have people clicking.

    And the revision history (see point 2) would not even show that the link was changed.
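
To make points 2 and 3 concrete, here is a minimal sketch of the server-side flow, not the actual Stack Exchange code. The `render_markdown` function is a toy stand-in for a real renderer such as MarkdownSharp and only understands reference-style links; the `Post` class, its method names, and the `turtle-concurrency.example` URL are hypothetical. HTML sanitizing is left out for brevity and would still be needed in a real system.

```python
import html
import re
from typing import List, Optional

# Toy stand-in for a real server-side Markdown renderer (assumption:
# only reference-style links are handled, e.g. "[text][1]" and "[1]: url").
REF_DEF = re.compile(r"^\s*\[(\w+)\]:\s*(\S+)\s*$", re.MULTILINE)
REF_USE = re.compile(r"\[([^\]]+)\]\[(\w+)\]")


def render_markdown(source: str) -> str:
    """Render Markdown to HTML on the server (toy implementation)."""
    refs = dict(REF_DEF.findall(source))          # collect "[id]: url" lines
    body = REF_DEF.sub("", source).strip()        # drop them from the text
    escaped = html.escape(body, quote=False)      # neutralise raw HTML

    def link(match: "re.Match[str]") -> str:
        text, ref = match.group(1), match.group(2)
        url = refs.get(ref)
        if url is None:
            return match.group(0)                 # unknown reference: keep as text
        return '<a href="{}">{}</a>'.format(html.escape(url, quote=True), text)

    return "<p>" + REF_USE.sub(link, escaped) + "</p>"


class Post:
    """Hypothetical post record: Markdown for every revision, HTML for the latest only."""

    def __init__(self) -> None:
        self.revisions: List[str] = []  # Markdown source of each revision
        self.html: str = ""             # rendered HTML of the newest revision

    def save(self, markdown_source: str, client_html: Optional[str] = None) -> None:
        # client_html is ignored on purpose: the stored HTML is always
        # derived from the Markdown, so the two cannot disagree.
        self.revisions.append(markdown_source)
        self.html = render_markdown(markdown_source)

    def render_revision(self, index: int) -> str:
        # Old revisions have no stored HTML, so they are rendered on demand.
        return render_markdown(self.revisions[index])


spam_markdown = (
    "As you can sea on [this awesome site][1],\n\n"
    " [1]: http://almost-real-rolex-watches.biz\n"
)
forged_html = (
    '<p>As you can sea on '
    '<a href="http://turtle-concurrency.example">this awesome site</a>,</p>'
)

post = Post()
post.save(spam_markdown, client_html=forged_html)
print(post.html)
```

Because the forged HTML never reaches storage, the rendered post shows the spam link from the first revision on, and anyone reviewing or editing the answer sees the same link the Markdown contains.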

It should be noted that the original version of the WMD JavaScript editor actually had the functionality you describe; you could set it up to submit the rendered HTML to the server. We finally removed that functionality when we published our refactored version under the name "PageDown", since we had never used or maintained it, and I honestly don't know whether it still worked.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow