Question

Problem:

My site allows users to copy and paste content from other files and documents, such as MS Word, and from websites (e.g. CNN.com) into the Rich TextEditor we provide. This Rich TextEditor supports pasting content with embedded styles (and we have to support it too), which brings in arbitrary styles, tags, and inline styles from the content's origin.

E.g.: if you paste from an MS Word document, it brings in H1, H2, P, UL/OL/LI, STRONG, I, EM, TABLE, etc. with their own styles. The same happens when you copy and paste from other web pages.

How to format? I am looking for the best way to handle the formatting of this kind of user-generated content. First, I need to keep the copied tags intact. Let's say an H1 was brought in by the user from MS Word: I have to keep it, yet style it on my own according to the given corporate branding.

Another problem is that when you copy and paste from an external origin, some tags are not properly closed, which causes my layout to break. How do we handle this?

For styles, I am applying

.article * {
   allKnownCSSProperties: myValues!important;
}

Any method would work. JavaScript or C# is preferred.


Solution

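To strip out unwanted inline styles, a simple regex should suffice. In JavaScript (matching double-quoted and single-quoted values separately, so a value such as "font-family:'Arial'" is not cut short):

/\sstyle\s*=\s*("[^"]*"|'[^']*')/gi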
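As a rough illustration of how such a regex might be applied to pasted markup, here is a minimal sketch. The function name stripInlineStyles and the sample input are hypothetical, not part of the original answer, and a regex like this can also match text that merely looks like a style attribute, so a real HTML parser may be more robust.

```javascript
// Hypothetical helper (name is an assumption): removes inline style
// attributes from an HTML string and leaves the tags themselves intact.
function stripInlineStyles(html) {
  // Match leading whitespace, "style", "=", then a double-quoted OR a
  // single-quoted value. Handling each quote type separately keeps values
  // such as style="font-family:'Arial'" from being truncated.
  return html.replace(/\sstyle\s*=\s*("[^"]*"|'[^']*')/gi, "");
}

// Made-up sample of pasted content.
const pasted =
  '<h1 style="color:red; font-family:\'Arial\'">Title</h1>' +
  "<p style='margin:0'>Text</p>";

console.log(stripInlineStyles(pasted));
// <h1>Title</h1><p>Text</p>
```

With the inline styles gone, the tags that were pasted (H1, P, and so on) remain in place and can be styled by the site's own stylesheet.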

OTHER TIPS

I would handle the unclosed tags like this: parse the whole message and collect every opening tag that does not end with />. Remove a tag from the collection when you find the same tag starting with </. Exclude tags that never have a closing tag (such as br or img), then generate closing tags for everything still in the collection and append them at the end of your Rich TextEditor content. It may not work in some cases and may look clumsy, but it is the first thing that comes to mind and it may help solve the problem.
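The approach described above could be sketched roughly as follows. The names closeUnclosedTags and VOID_TAGS are hypothetical, and the tag-matching regex is a simplification that assumes reasonably ordinary markup (for example, no ">" inside attribute values).

```javascript
// Elements that never take a closing tag (HTML void elements).
const VOID_TAGS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

// Hypothetical helper (name is an assumption): appends closing tags for
// every element that was opened but never closed.
function closeUnclosedTags(html) {
  const open = []; // names of tags still waiting for a closing tag
  // Groups: 1 = "/" for a closing tag, 2 = tag name, 3 = "/" for "<tag />".
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(\/?)>/g;
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    const isClosing = match[1] === "/";
    const name = match[2].toLowerCase();
    const isSelfClosing = match[3] === "/";
    // Skip tags that cannot or need not be closed.
    if (isSelfClosing || VOID_TAGS.has(name)) continue;
    if (isClosing) {
      // Remove the most recent matching opening tag, if there is one.
      const index = open.lastIndexOf(name);
      if (index !== -1) open.splice(index, 1);
    } else {
      open.push(name);
    }
  }
  // Close the innermost remaining tag first.
  const closers = open.reverse().map((name) => `</${name}>`).join("");
  return html + closers;
}

console.log(closeUnclosedTags("<div><p>Hello <b>world<br>"));
// <div><p>Hello <b>world<br></b></p></div>
```

Because the missing closing tags are always appended at the very end, the result is balanced but not necessarily nested the way the author intended; running the pasted content through an HTML parser or a sanitizing library would likely give cleaner output.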

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow