NSXMLParser chokes on ampersand &

https://stackoverflow.com/questions/1719119

19-09-2019

Question

I'm parsing some HTML with NSXMLParser and it hits a parser error anytime it encounters an ampersand. I could filter out ampersands before I parse it, but I'd rather parse everything that's there.

It's giving me error 68, NSXMLParserNAMERequiredError: Name is required.

My best guess is that it's a character set issue. I'm a little fuzzy on the world of character sets, so I'm thinking my ignorance is biting me in the ass. The source HTML uses charset iso-8859-1, so I'm using this code to initialize the Parser:

NSString *dataString = [[[NSString alloc] initWithData:data encoding:NSISOLatin1StringEncoding] autorelease];
NSData *dataEncoded = [[dataString dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES] autorelease];
NSXMLParser *theParser = [[NSXMLParser alloc] initWithData:dataEncoded];

Any ideas?

Solution

To the other posters: of course the XML is invalid... it's HTML!

You probably shouldn't be using NSXMLParser for HTML at all. Use libxml2 instead, which includes a lenient HTML parser.

For a closer look at why, check out this article.

OTHER TIPS

Are you sure you have valid XML? Special characters like & must be escaped: in the raw XML file you should see &amp; rather than a bare &.
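
If you do decide to pre-filter the input, the escaping could look something like the sketch below. It is plain C with no dependencies; `escape_bare_ampersands` and `looks_like_reference` are hypothetical names made up for this example. It rewrites only ampersands that do not already start something shaped like an entity or character reference. Note the assumption baked in here: HTML-only named entities such as `&nbsp;` are left untouched, and an XML parser may still reject those, so this does not turn arbitrary HTML into valid XML.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if s (the text just after an '&') looks like the rest of a
 * reference: "name;", "#digits;" or "#xhex;". Returns 0 otherwise. */
static int looks_like_reference(const char *s)
{
    size_t i = 0;
    if (s[0] == '#') {
        int hex = (s[1] == 'x' || s[1] == 'X');
        size_t start = hex ? 2 : 1;
        i = start;
        while (hex ? isxdigit((unsigned char)s[i]) : isdigit((unsigned char)s[i]))
            i++;
        return i > start && s[i] == ';';
    }
    if (!isalpha((unsigned char)s[0]))
        return 0;
    while (isalnum((unsigned char)s[i]))
        i++;
    return s[i] == ';';
}

/* Returns a newly allocated copy of in with every bare '&' replaced by
 * "&amp;". The caller frees the result. Returns NULL if allocation fails. */
char *escape_bare_ampersands(const char *in)
{
    /* Worst case: every byte is '&' and grows to the 5 bytes of "&amp;". */
    char *out = malloc(strlen(in) * 5 + 1);
    if (out == NULL)
        return NULL;

    char *w = out;
    for (const char *p = in; *p != '\0'; p++) {
        if (*p == '&' && !looks_like_reference(p + 1)) {
            memcpy(w, "&amp;", 5);
            w += 5;
        } else {
            *w++ = *p;
        }
    }
    *w = '\0';
    return out;
}
```

The escaped buffer could then be wrapped in an NSData and handed to NSXMLParser, though the rest of the markup still has to be well-formed XML for the parse to succeed.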

Encoding the data through an NSString worked for me. However, you are autoreleasing an object that you did not allocate yourself (the NSData returned by dataUsingEncoding:allowLossyConversion:), so it gets over-released and crashes. The solution is:

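// You own this string (alloc/init), so you are responsible for releasing it.
NSString *dataString = [[NSString alloc] initWithData:data
                             encoding:NSISOLatin1StringEncoding];

// This NSData is returned autoreleased; you do not own it, so do not release or autorelease it.
NSData *dataEncoded = [dataString dataUsingEncoding:NSUTF8StringEncoding
                                     allowLossyConversion:YES];

[dataString release];

NSXMLParser *theParser = [[NSXMLParser alloc] initWithData:dataEncoded];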

Licensed under: CC-BY-SA with attribution
