Question

The code below takes all of the text from a certain div. Is it possible to get the div's HTML tags as well as its text, so that all of the <p></p> and <br> tags are also added to the string mystring?

// Trim the URL string passed in from the previous page
NSString *trimmedString = [stringy stringByTrimmingCharactersInSet:
                           [NSCharacterSet whitespaceAndNewlineCharacterSet]];

// Download the page as raw bytes and let hpple parse them
// (stringWithContentsOfURL: without an encoding is deprecated)
NSData *data = [NSData dataWithContentsOfURL:[NSURL URLWithString:trimmedString]];
TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:data];
NSArray *elements = [xpathParser searchWithXPathQuery:@"//div[@class='field-item even']"];
TFHppleElement *element = [elements lastObject]; // last matching div; use objectAtIndex: to pick another match
NSString *mystring = [self getStringForTFHppleElement:element];

trimmedTextView.text = [trimmedTextView.text stringByAppendingString:mystring];

Method here:

- (NSString *)getStringForTFHppleElement:(TFHppleElement *)element
{
    NSMutableString *result = [NSMutableString new];

    // Iterate recursively through all children
    for (TFHppleElement *child in [element children])
        [result appendString:[self getStringForTFHppleElement:child]];

    // Hpple creates a node named "text" for each run of text it parses
    if ([element.tagName isEqualToString:@"text"])
        [result appendString:element.content];

    return result;
}

Any ideas would be appreciated. Cheers.


Solution

Try this:

NSString *htmlDataString = [webView stringByEvaluatingJavaScriptFromString: @"document.documentElement.outerHTML"];

This returns the page's entire HTML as a string. You can then parse it in your native code and locate the div you are interested in, as you did in the example above.
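If you would rather stay with hpple for that native-parsing step, one option is to extend the recursion from getStringForTFHppleElement: so it writes the tags back out instead of keeping only the text. The helper below, HTMLStringForElement (a hypothetical name, along with EscapeHTML), is a minimal sketch assuming hpple's TFHppleElement exposes tagName, attributes, children and content, and that hpple names text nodes "text" as the question's code already relies on. It rebuilds markup from the parsed tree, so the result is normalized rather than byte-for-byte: attribute order follows NSDictionary, entities are re-escaped, comments are skipped, and the div's own tag is included.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"   // hpple; also imports TFHppleElement.h

// Escape characters that would otherwise be read as markup.
static NSString *EscapeHTML(NSString *s, BOOL escapeQuotes)
{
    NSMutableString *out = [s mutableCopy];
    // '&' goes first so the entities added below are not escaped twice.
    [out replaceOccurrencesOfString:@"&" withString:@"&amp;" options:0 range:NSMakeRange(0, out.length)];
    [out replaceOccurrencesOfString:@"<" withString:@"&lt;" options:0 range:NSMakeRange(0, out.length)];
    [out replaceOccurrencesOfString:@">" withString:@"&gt;" options:0 range:NSMakeRange(0, out.length)];
    if (escapeQuotes) {
        [out replaceOccurrencesOfString:@"\"" withString:@"&quot;" options:0 range:NSMakeRange(0, out.length)];
    }
    return out;
}

// Hypothetical helper: serialize an hpple element, including its own tag, back to HTML.
static NSString *HTMLStringForElement(TFHppleElement *element)
{
    NSString *tag = element.tagName;

    // Text nodes: hpple names them "text"; content holds the decoded text.
    if ([tag isEqualToString:@"text"]) {
        return element.content ? EscapeHTML(element.content, NO) : @"";
    }
    // Skip comment nodes and anything without a usable name.
    if (tag == nil || [tag isEqualToString:@"comment"]) {
        return @"";
    }

    // Opening tag with its attributes.
    NSMutableString *result = [NSMutableString stringWithFormat:@"<%@", tag];
    [element.attributes enumerateKeysAndObjectsUsingBlock:^(id name, id value, BOOL *stop) {
        [result appendFormat:@" %@=\"%@\"", name, EscapeHTML(value, YES)];
    }];
    [result appendString:@">"];

    // Void elements such as <br> have no closing tag in HTML.
    static NSSet *voidTags;
    static dispatch_once_t once;
    dispatch_once(&once, ^{
        voidTags = [NSSet setWithObjects:@"br", @"hr", @"img", @"input", @"meta", @"link", nil];
    });
    if ([voidTags containsObject:tag.lowercaseString]) {
        return result;
    }

    // Children (tags and text) in document order, then the closing tag.
    for (TFHppleElement *child in element.children) {
        [result appendString:HTMLStringForElement(child)];
    }
    [result appendFormat:@"</%@>", tag];
    return result;
}
```

In the question's code this would replace the text-only call: `NSString *mystring = HTMLStringForElement(element);`.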

You can also fetch a single DOM element directly, for example a div with a given id:

NSString *htmlDataString = [webView stringByEvaluatingJavaScriptFromString: @"document.getElementById('mydiv').outerHTML"];

This is more efficient, because the web view returns only the markup of that one element instead of the whole page, but it requires a bit of JavaScript knowledge.

Licensed under: CC-BY-SA with attribution