Question

I have an iOS app that pulls data from a RESTful web service. Part of the content I receive is loaded into a UITextView, and that part arrives as HTML. I need to convert it from HTML to plain text while using the paragraph tags to format the text view properly.

Here is what the HTML looks like:

    <p data-seq="1"><span class="paragraph">Content of paragraph 1</span></p><p data-seq="2"><span class="paragraph">Content of paragraph 2</span></p>

As you can see, a <p data-seq="2"><span class="paragraph">....</span></p> block marks the start and end of each paragraph.

I initially tried using NSScanner, following the example in "How to convert NSString HTML markup to plain text NSString?". It was quick to implement, but it strips all tags and returns the text as one long paragraph.
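For reference, a minimal sketch of how an NSScanner-based stripper could be adapted to keep paragraph breaks. It assumes the HTML only uses <p> elements to delimit paragraphs, as in the sample above; it does not decode entities such as &amp; and is not a general HTML parser. The function name PlainTextFromParagraphHTML is hypothetical, not part of Foundation or DTCoreText.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: strips tags from simple <p>...</p> HTML, but treats each
// closing </p> as a paragraph boundary. Paragraphs are joined with a blank line.
// Assumption: no HTML entities, comments or scripts need handling.
NSString *PlainTextFromParagraphHTML(NSString *html) {
    NSMutableArray<NSString *> *paragraphs = [NSMutableArray array];
    NSMutableString *current = [NSMutableString string];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];

    NSScanner *scanner = [NSScanner scannerWithString:html];
    // NSScanner skips whitespace by default; keep it so spaces inside text survive
    scanner.charactersToBeSkipped = nil;

    while (![scanner isAtEnd]) {
        // Copy the text up to the next tag into the current paragraph
        NSString *text = nil;
        if ([scanner scanUpToString:@"<" intoString:&text]) {
            [current appendString:text];
        }
        if ([scanner isAtEnd]) {
            break;
        }

        // Read the tag body, e.g. "p data-seq=\"1\"", "span class=...", "/p"
        NSString *tag = nil;
        [scanner scanString:@"<" intoString:NULL];
        [scanner scanUpToString:@">" intoString:&tag];
        [scanner scanString:@">" intoString:NULL];

        // A closing </p> ends the current paragraph; every other tag is dropped
        if ([[tag lowercaseString] isEqualToString:@"/p"]) {
            NSString *paragraph = [current stringByTrimmingCharactersInSet:whitespace];
            if (paragraph.length > 0) {
                [paragraphs addObject:paragraph];
            }
            [current setString:@""];
        }
    }

    // Text after the last </p> (or input without <p> tags) becomes a final paragraph
    NSString *rest = [current stringByTrimmingCharactersInSet:whitespace];
    if (rest.length > 0) {
        [paragraphs addObject:rest];
    }
    return [paragraphs componentsJoinedByString:@"\n\n"];
}
```

With the question's property names, the result could then be assigned with something like self.newsDetailText.text = PlainTextFromParagraphHTML(self.htmlString); which gives plain text with a blank line between paragraphs but no other styling.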

I have added libxml2 to my code. I started following this tutorial, but once I was working through it I wasn't sure how to format the output into paragraphs.

I have also seen recommendations for the DTCoreText library, but I couldn't find much information about it.

Could someone post a snippet, using one of the three options above or one of their own, showing how to parse HTML into plain text while maintaining the paragraphs?

SOLUTION

Following lxt's recommendation, I investigated DTCoreText. Once I had it installed in my app (I definitely recommend CocoaPods for that), it was as easy as adding #import "DTCoreText.h" to my detailViewController and then the lines below to load the HTML into the UITextView.

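    // DTUseiOS6Attributes makes DTCoreText emit UIKit attributes (UIFont, UIColor) that UITextView understands
    NSDictionary *options = @{DTUseiOS6Attributes: @YES};
    NSData *htmlData = [self.htmlString dataUsingEncoding:NSUTF8StringEncoding];
    // initWithHTMLData:options:documentAttributes: is added to NSAttributedString by DTCoreText
    NSAttributedString *stringArticle = [[NSAttributedString alloc] initWithHTMLData:htmlData options:options documentAttributes:NULL];
    self.newsDetailText.attributedText = stringArticle;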

The first build failed because I didn't include the DTUseiOS6Attributes option. The second build succeeded and the detail view was perfectly formatted. It was a fist-pump moment! Thanks again for the recommendation, lxt!


Solution

I would honestly recommend using DTCoreText rather than writing your own parser. There's no real benefit to reinventing the wheel, and it's a widely used library with a large user base.

I am surprised you had trouble finding information about it. The library has very good documentation, and the author is also quite active on Twitter (@cocoanetics).

You can use the nifty DTAttributedTextView class it provides in place of your UITextView. The library also provides a category that extends NSAttributedString with an initWithHTMLData:documentAttributes: method. This lets you create your attributed string and plug it into your view. It's really no more than a couple of lines of code.

Licensed under: CC-BY-SA with attribution