Question

I am trying to extract the summary of a Wikipedia article and store it as a string. This works well for some articles, but Wikipedia's markup is inconsistent, so my NSScanner code fails fairly often.

Here's my NSScanner implementation:

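// Markup that opens the table of contents, which follows the summary.
NSString *separatorString = @"<table id=\"toc\" class=\"toc\">";
// Closing tag of the first table on the page, which precedes the summary.
NSString *muString = @"</table>";
NSString *container = nil;

// `string` holds the downloaded HTML of the article.
NSScanner *aScanner = [NSScanner scannerWithString:string];

// Skip everything up to and including the first </table>.
[aScanner scanUpToString:muString intoString:nil];
[aScanner scanString:muString intoString:nil];

// Capture everything from there up to the table of contents.
[aScanner scanUpToString:separatorString intoString:&container];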

How could this be improved? Or is there another way of getting this?

To visualize which bit of the article I want, here's an example:

http://en.wikipedia.org/wiki/Indigo

From this page I'd want everything from "Indigo is the color on the electromagnetic spectrum" to "in English was in 1289".

Thanks!


Solution

You could load the page and use WebKit's DOM API to walk the actual document structure, rather than scanning the raw HTML text for fixed strings.
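
A minimal sketch of that idea, assuming the legacy WebView class from WebKit.framework on macOS (now deprecated) and assuming the summary paragraphs are <p> elements that are siblings of the element with id "toc", as the markup in the question suggests. The class name SummaryExtractor and its summary property are hypothetical, introduced only for illustration:

```objc
#import <Foundation/Foundation.h>
#import <WebKit/WebKit.h>

// Hypothetical frame-load delegate; the class name is an assumption.
@interface SummaryExtractor : NSObject
@property (copy) NSString *summary;
@end

@implementation SummaryExtractor

// Called by the legacy WebView once a frame has finished loading.
- (void)webView:(WebView *)sender didFinishLoadForFrame:(WebFrame *)frame
{
    if (frame != [sender mainFrame]) return;   // ignore subframes

    DOMDocument *document = [frame DOMDocument];
    DOMElement *toc = [document getElementById:@"toc"];
    if (toc == nil) return;                    // no table of contents found

    // Walk backwards from the table of contents and collect the text of
    // every <p> sibling. Inserting at index 0 keeps document order.
    NSMutableArray *paragraphs = [NSMutableArray array];
    for (DOMNode *node = [toc previousSibling]; node != nil;
         node = [node previousSibling]) {
        BOOL isParagraph = [node nodeType] == DOM_ELEMENT_NODE &&
            [[node nodeName] caseInsensitiveCompare:@"p"] == NSOrderedSame;
        NSString *text = isParagraph ? [node textContent] : nil;
        if ([text length] > 0) {
            [paragraphs insertObject:text atIndex:0];
        }
    }
    self.summary = [paragraphs componentsJoinedByString:@"\n\n"];
}

@end
```

To use it, you would set an instance as the WebView's frameLoadDelegate and load the article URL; the summary becomes available after the main frame finishes loading. Because this reads element text through the DOM, it does not depend on the exact attribute order or spacing of the table tag the way a fixed search string does, though it still relies on the page layout assumed above.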
