Question

I am parsing a simple XML file, but some elements contain ampersands (&) in their text. I've done some research here and here, but the problem persists: the parser simply stops when it encounters the offending XML element. The XML looks like this:

<video>
  <video_id>42</video_id>
  <video_header>Six & Eight</video_header>
  <video_subheader>So Long</video_subheader>
</video>

The parser updates an object called DisStep, which has a parsedVideoArray attribute. The attribute is just an array of Parsed_Video objects. The problem is that when the parser gets to foundCharacters for the video_header element, it never continues to didEndElement. In fact, an NSLog of currentNodeContent in the foundCharacters method prints just "Six ".

And here is the code for the parser. All it does is look for videos and gather info about them.

-(void) parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName 
  namespaceURI:(NSString *)namespaceURI
  qualifiedName:(NSString *)qName
  attributes:(NSDictionary *)attributeDict
{
    if ([elementName isEqualToString:@"video"])
    {
        videoBeingParsed = [[Parsed_Video alloc] init];
    }
}

-(void) parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    string = [string stringByReplacingOccurrencesOfString:@"&" withString:@"&amp;"];
    currentNodeContent = (NSMutableString *) string;
}

- (void) parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
  qualifiedName:(NSString *)qName
{
    if ([elementName isEqualToString:@"video_id"])
    {
        videoBeingParsed.Video_ID = currentNodeContent;
        currentNodeContent = nil;
    }
    else if ([elementName isEqualToString:@"video_header"])
    {
        videoBeingParsed.Video_Header = currentNodeContent;
        currentNodeContent = nil;
    }

    else if ([elementName isEqualToString:@"video_subheader"])
    {
        videoBeingParsed.Video_SubHeader = currentNodeContent;
        currentNodeContent = nil;
    }
    else if ([elementName isEqualToString:@"video"])
    {
        [DisStep.parsedVideoArray addObject:videoBeingParsed];
        currentNodeContent = nil;
        videoBeingParsed = nil;
    }
}
@end

I tried stringByReplacingOccurrencesOfString:withString:, but the parser still stops. Is there a way around this other than changing the XML?

Solution

The issue is that what you have been given is not XML, and the parser legitimately fails when it sees data that is not well-formed. The XML specification says:

The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively.

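Thus you have to alter the XML and replace & with &amp;, so that the element reads <video_header>Six &amp; Eight</video_header>. Replacing the character inside foundCharacters cannot help, because the parser has already rejected the input by the time that method would be called with the rest of the text.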
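If the producer of the file cannot be fixed right away, one possible stopgap is to repair the raw text before it reaches NSXMLParser. The following is only a sketch under assumptions: the helper names EscapeBareAmpersands and ParserForSloppyXMLData are hypothetical, the input is assumed to be UTF-8, and the regular expression is a heuristic. It leaves anything that already looks like a reference (&amp;, &#38;, &#x26;) alone, but it would also rewrite ampersands inside comments and CDATA sections, where they are legal.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): escapes every "&" that
// does not already start something shaped like an entity or character
// reference, e.g. "&amp;", "&#38;" or "&#x26;".
static NSString *EscapeBareAmpersands(NSString *xml)
{
    NSString *pattern =
        @"&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:NULL];
    return [regex stringByReplacingMatchesInString:xml
                                           options:0
                                             range:NSMakeRange(0, xml.length)
                                      withTemplate:@"&amp;"];
}

// Hypothetical helper: builds a parser from cleaned-up data.
// Assumes the document is UTF-8 and returns nil if it cannot be decoded.
static NSXMLParser *ParserForSloppyXMLData(NSData *rawData)
{
    NSString *xml = [[NSString alloc] initWithData:rawData
                                          encoding:NSUTF8StringEncoding];
    if (xml == nil) {
        return nil;
    }
    NSData *cleaned =
        [EscapeBareAmpersands(xml) dataUsingEncoding:NSUTF8StringEncoding];
    return [[NSXMLParser alloc] initWithData:cleaned];
}
```

The returned parser would then get its delegate set and be started with parse as usual. This still changes the XML, only in memory rather than at the source, so fixing the generator remains the more reliable option.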

Other tips

XML parsers are REQUIRED to report a fatal error when you give them input that isn't well-formed XML.

Find out what program generated this corrupt data and fix it.
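To see that fatal error instead of a parse that silently stops, the delegate can implement parser:parseErrorOccurred:, which is part of NSXMLParserDelegate. The sketch below uses a hypothetical stand-alone class, ParseErrorLogger, purely for illustration; in the question's code the same method could presumably be added to the existing delegate.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate used only for illustration: it records and logs the
// fatal error that NSXMLParser reports for input that is not well-formed.
@interface ParseErrorLogger : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSError *lastError;
@end

@implementation ParseErrorLogger

- (void)parser:(NSXMLParser *)parser parseErrorOccurred:(NSError *)parseError
{
    self.lastError = parseError;
    // lineNumber and columnNumber point at where the parser gave up.
    NSLog(@"XML parse error at line %ld, column %ld: %@",
          (long)parser.lineNumber,
          (long)parser.columnNumber,
          parseError.localizedDescription);
}

@end
```

With the sample from the question, the parse method should return NO and the logged position should fall in the video_header line, which makes it easier to report the exact problem to whoever generates the file.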

Licensed under: CC-BY-SA with attribution