Question

I need to convert HTML data consisting of <h2>..</h2>, <p>..</p> and <a href=".."><img ..></a> elements into an attributed string with proper formatting. I want to map <h2> to UIFontTextStyleHeadline1 and <p> to UIFontTextStyleBody, and store the image links. The output should be an attributed string containing only the heading and body elements; I will handle the images separately.

So far, I have this code:

NSMutableAttributedString *content = [[NSMutableAttributedString alloc] 
         initWithData:[[post objectForKey:@"content"] 
    dataUsingEncoding:NSUTF8StringEncoding] 
              options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                   NSCharacterEncodingDocumentAttribute: [NSNumber numberWithInt:NSUTF8StringEncoding]}
   documentAttributes:nil error:nil];

which produces something like this:

Heading
{
    NSColor = "UIDeviceRGBColorSpace 0 0 0 1";
    NSFont = "<UICTFont: 0xd47bc00> font-family: \"TimesNewRomanPS-BoldMT\"; font-weight: bold; font-style: normal; font-size: 18.00pt";
    NSKern = 0;
    NSParagraphStyle = "Alignment 4, LineSpacing 0, ParagraphSpacing 14.94, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 0/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n), DefaultTabInterval 36, Blocks (null), Lists (null), BaseWritingDirection 0, HyphenationFactor 0, TighteningFactor 0, HeaderLevel 2";
    NSStrokeColor = "UIDeviceRGBColorSpace 0 0 0 1";
    NSStrokeWidth = 0;
}{
    NSAttachment = "<NSTextAttachment: 0xd486590>";
    NSColor = "UIDeviceRGBColorSpace 0 0 0.933333 1";
    NSFont = "<UICTFont: 0xd47cdb0> font-family: \"Times New Roman\"; font-weight: normal; font-style: normal; font-size: 12.00pt";
    NSKern = 0;
    NSLink = "http://www.placeholder.com/image.jpg";
    NSParagraphStyle = "Alignment 4, LineSpacing 0, ParagraphSpacing 12, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 0/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n), DefaultTabInterval 36, Blocks (null), Lists (null), BaseWritingDirection 0, HyphenationFactor 0, TighteningFactor 0, HeaderLevel 0";
    NSStrokeColor = "UIDeviceRGBColorSpace 0 0 0.933333 1";
    NSStrokeWidth = 0;
}
Body text, body text, body text. Body text, body text, body text.
{
    NSColor = "UIDeviceRGBColorSpace 0 0 0 1";
    NSFont = "<UICTFont: 0xd47cdb0> font-family: \"Times New Roman\"; font-weight: normal; font-style: normal; font-size: 12.00pt";
    NSKern = 0;
    NSParagraphStyle = "Alignment 4, LineSpacing 0, ParagraphSpacing 12, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 0/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n), DefaultTabInterval 36, Blocks (null), Lists (null), BaseWritingDirection 0, HyphenationFactor 0, TighteningFactor 0, HeaderLevel 0";
    NSStrokeColor = "UIDeviceRGBColorSpace 0 0 0 1";
    NSStrokeWidth = 0;
}

I am new to attributed strings and am looking for an efficient way to convert these attributes into the standard fonts mentioned above. Thank you.

Solution

In case anybody is looking for something similar: in the end I used the hpple library (TFHpple) to separate the images from the text elements in the HTML data, and then changed the font attributes of the attributed string like this:

NSString *contentString = [self parseHTMLdata:bodyString];

NSMutableAttributedString *content = [[NSMutableAttributedString alloc]
          initWithData:[contentString dataUsingEncoding:NSUTF8StringEncoding]
               options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                         NSCharacterEncodingDocumentAttribute: [NSNumber numberWithInt:NSUTF8StringEncoding]}
    documentAttributes:nil
                 error:nil];

// Walk the attribute runs and replace each run's font:
// runs imported at 18 pt (the <h2> headings) get the headline font,
// everything else gets the body font.
NSRange effectiveRange = NSMakeRange(0, 0);

while (NSMaxRange(effectiveRange) < [content length]) {

    // Fetch the attributes of the next run; effectiveRange is updated to cover it.
    NSDictionary *attributes = [content attributesAtIndex:NSMaxRange(effectiveRange)
                                           effectiveRange:&effectiveRange];

    UIFont *font = [attributes objectForKey:NSFontAttributeName];

    if (font.pointSize == 18.0f) {
        [content addAttribute:NSFontAttributeName value:self.headlineFont range:effectiveRange];
    } else {
        [content addAttribute:NSFontAttributeName value:self.bodyFont range:effectiveRange];
    }
}
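
The loop above uses self.headlineFont and self.bodyFont without showing where they come from. As a rough sketch only, assuming they are meant to be the Dynamic Type fonts from the question, the same replacement could be written with block enumeration. The helper name ApplyTextStyles and its headingPointSize parameter are made up for illustration, and the public UIFontTextStyleHeadline constant stands in for the headline style:

```objc
#import <UIKit/UIKit.h>

// Hypothetical helper (not part of the original answer): replaces the font of
// every run in `content` with a Dynamic Type font, treating runs whose imported
// font is at least `headingPointSize` points as headings.
static void ApplyTextStyles(NSMutableAttributedString *content, CGFloat headingPointSize)
{
    UIFont *headlineFont = [UIFont preferredFontForTextStyle:UIFontTextStyleHeadline];
    UIFont *bodyFont = [UIFont preferredFontForTextStyle:UIFontTextStyleBody];

    // Changing attributes inside the range handed to the block is allowed
    // while enumerating a mutable attributed string.
    [content enumerateAttribute:NSFontAttributeName
                        inRange:NSMakeRange(0, content.length)
                        options:0
                     usingBlock:^(id value, NSRange range, BOOL *stop) {
        UIFont *original = value; // nil for runs without a font
        BOOL isHeading = (original != nil && original.pointSize >= headingPointSize);
        [content addAttribute:NSFontAttributeName
                        value:(isHeading ? headlineFont : bodyFont)
                        range:range];
    }];
}
```

With the values from the output in the question, it would be called as ApplyTextStyles(content, 18.0), since the <h2> run was imported at 18 pt and the body text at 12 pt.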

And the hpple part:

- (NSString *)parseHTMLdata:(NSString *)content
{
    NSData *data = [content dataUsingEncoding:NSUTF8StringEncoding];

    TFHpple *parser = [[TFHpple alloc] initWithHTMLData:data];

    NSString *xpathQueryString = @"//body";

    NSArray *elements = [[[parser searchWithXPathQuery:xpathQueryString] firstObject] children];

    NSMutableString *textContent = [[NSMutableString alloc] init];

    for (TFHppleElement *element in elements) {

        if ([[element tagName] isEqualToString:@"h2"] || [[element tagName] isEqualToString:@"p"]) {

            if ([[[element firstChild] tagName] isEqualToString:@"a"]) {

                // image element, just save it in array
            } else {

                // pure h2 or p element
                [textContent appendString:[element raw]];
            }
        }
    }

    return textContent;
}
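
The "image element" branch above leaves the storing step open. One possible way to collect the links, sketched under the assumption that every image is wrapped as <a href=".."><img ..></a> as described in the question; the helper name ImageLinksInHTML is hypothetical, and it queries the anchors directly with XPath instead of walking the children of <body>:

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper (not part of the original answer): returns the href of
// every <a> element that directly contains an <img>.
static NSArray *ImageLinksInHTML(NSString *html)
{
    NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
    TFHpple *parser = [[TFHpple alloc] initWithHTMLData:data];

    // "//a[img]" selects every <a> element that has an <img> child.
    NSArray *anchors = [parser searchWithXPathQuery:@"//a[img]"];

    NSMutableArray *links = [NSMutableArray array];
    for (TFHppleElement *anchor in anchors) {
        NSString *href = [[anchor attributes] objectForKey:@"href"];
        if (href.length > 0) {
            [links addObject:href];
        }
    }
    return links;
}
```

For example, NSArray *imageLinks = ImageLinksInHTML(bodyString); could run next to parseHTMLdata: on the same input string.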

Checking the font size in the attributes may seem fragile. If it causes problems, I can dig deeper into the paragraph style, which records the header level (HeaderLevel 2 for the <h2> run in the output above, HeaderLevel 0 for body text).
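
A different route that avoids the size comparison altogether is to convert each element on its own and pick the font from the tag name, which the hpple parser already knows. This is only a sketch: the helper name StyledStringFromBodyElements is made up, it expects the array of <body> children obtained in parseHTMLdata:, and it uses UIFontTextStyleHeadline for headings as an assumption about the intended style.

```objc
#import <UIKit/UIKit.h>
#import "TFHpple.h"

// Hypothetical helper (not part of the original answer): converts each
// top-level <h2>/<p> element separately and chooses the font from the tag
// name. Call it on the main thread, because it uses the HTML importer.
static NSAttributedString *StyledStringFromBodyElements(NSArray *elements)
{
    NSDictionary *options = @{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                              NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)};
    NSMutableAttributedString *result = [[NSMutableAttributedString alloc] init];

    for (TFHppleElement *element in elements) {
        NSString *tag = [element tagName];
        BOOL isHeading = [tag isEqualToString:@"h2"];
        if (!isHeading && ![tag isEqualToString:@"p"]) {
            continue; // neither heading nor paragraph
        }
        if ([[[element firstChild] tagName] isEqualToString:@"a"]) {
            continue; // image wrapper, handled separately
        }

        NSData *data = [[element raw] dataUsingEncoding:NSUTF8StringEncoding];
        NSMutableAttributedString *piece =
            [[NSMutableAttributedString alloc] initWithData:data
                                                    options:options
                                         documentAttributes:nil
                                                      error:NULL];
        if (piece == nil) {
            continue; // skip fragments the importer could not read
        }

        // The tag name decides the style, so the imported font size is irrelevant.
        NSString *style = isHeading ? UIFontTextStyleHeadline : UIFontTextStyleBody;
        [piece addAttribute:NSFontAttributeName
                      value:[UIFont preferredFontForTextStyle:style]
                      range:NSMakeRange(0, piece.length)];
        [result appendAttributedString:piece];
    }
    return result;
}
```

The trade-off is one importer call per element instead of one for the whole document, which may be noticeably slower for long posts.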

Licensed under: CC-BY-SA with attribution