Question

I've been using an XmlTextReader to read RSS feeds for many years, but all of a sudden two of the feeds I use have introduced an extra line that is tripping up the parser.

The problem is the second line here conflicts with the first:

<link>http://www.eventjobsearch.co.uk/jobsrss/</link>
<atom:link href="http://www.eventjobsearch.co.uk/jobsrss/" rel="self" type="application/rss+xml"/>

The parser thinks the atom:link element is a duplicate of the link element. I don't personally need the atom:link line, but because I'm reading from a stream I can't see any way to remove this line or to remove the colon (which would solve the problem).

How can I get rid of the colon in the stream so the built in parser works again?

 HttpWebRequest req = (HttpWebRequest)WebRequest.Create(WebConfigurationManager.AppSettings["XmlJobsFeedUrl"]);
 req.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)"; 

 WebResponse response = req.GetResponse();
 Stream stream = response.GetResponseStream();
 XmlTextReader xmlTextReader = new XmlTextReader(stream);
 DataSet jobs = new DataSet("Jobs");
 jobs.ReadXml(xmlTextReader);

Solution

Please see this question and solution. Immediately before calling jobs.ReadXml(...), load the RSS schema into the DataSet so that it does not have to infer the table structure from the feed itself:

jobs.ReadXmlSchema("http://www.thearchitect.co.uk/schemas/rss-2_0.xsd");

It's probably best to copy the .xsd file to your own server, so that your code doesn't depend on a third-party URL staying available.
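As a rough sketch of keeping your own copy of the schema, the helper below downloads the .xsd once and afterwards loads it from disk. The class name RssSchemaCache, the method name LoadSchema and both parameters are made up for illustration; only WebClient.DownloadFile and DataSet.ReadXmlSchema are real framework calls.

```csharp
using System.Data;
using System.IO;
using System.Net;

// Hypothetical helper (not from the original answer): keeps a local copy of the schema.
public static class RssSchemaCache
{
    public static void LoadSchema(DataSet jobs, string localXsdPath, string remoteXsdUrl)
    {
        // Fetch the schema only when there is no local copy yet.
        if (!File.Exists(localXsdPath))
        {
            using (WebClient client = new WebClient())
            {
                client.DownloadFile(remoteXsdUrl, localXsdPath);
            }
        }

        // Load the schema from disk; call this before jobs.ReadXml(...).
        jobs.ReadXmlSchema(localXsdPath);
    }
}
```

You would call RssSchemaCache.LoadSchema(jobs, pathToYourCopy, "http://www.thearchitect.co.uk/schemas/rss-2_0.xsd") straight before jobs.ReadXml(xmlTextReader), where pathToYourCopy is whatever location you choose for the file.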

OTHER TIPS

I'm thinking you could probably do this:

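Stream stream = response.GetResponseStream();
// Stream is abstract and can't be instantiated, so buffer the cleaned feed in a MemoryStream
MemoryStream cleanStream = new MemoryStream();

// Don't wrap the writer in a using block: disposing it would also close cleanStream
StreamWriter writer = new StreamWriter(cleanStream);
using (StreamReader reader = new StreamReader(stream))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // if the line doesn't match the problem then write it to the stream
        if (!line.Contains("<atom:link"))
        {
            writer.WriteLine(line);
        }
    }
}
writer.Flush();           // push any buffered text into cleanStream
cleanStream.Position = 0; // rewind so the parser reads from the start

// and now use the cleanStream instead
XmlTextReader xmlTextReader = new XmlTextReader(cleanStream);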

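That should allow you to clean the stream you've received before it reaches the parser. Bear in mind that filtering line by line only works if the atom:link element sits on a line of its own, as it does in the excerpt in the question.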
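If a feed doesn't put the atom:link element on its own line, a line-based filter won't help. A possible alternative, sketched here under a few assumptions, is to load the feed with LINQ to XML and remove the element by its namespace. FeedCleaner and RemoveAtomLinks are hypothetical names, and the code assumes the feed binds the atom prefix to the usual Atom namespace URI, http://www.w3.org/2005/Atom.

```csharp
using System.IO;
using System.Xml.Linq;

// Hypothetical helper (not from the original answers).
public static class FeedCleaner
{
    // Assumption: the feed declares xmlns:atom="http://www.w3.org/2005/Atom".
    private static readonly XNamespace Atom = "http://www.w3.org/2005/Atom";

    public static Stream RemoveAtomLinks(Stream feed)
    {
        // Parse the whole feed so the element is matched by name, not by line.
        XDocument doc = XDocument.Load(feed);

        // Remove every <atom:link> element; plain <link> elements are untouched.
        doc.Descendants(Atom + "link").Remove();

        // Write the cleaned document to a new stream and rewind it for the caller.
        MemoryStream clean = new MemoryStream();
        doc.Save(clean);
        clean.Position = 0;
        return clean;
    }
}
```

The returned stream can then be passed on in place of the original, for example new XmlTextReader(FeedCleaner.RemoveAtomLinks(stream)). The trade-off is that the whole feed is held in memory, which should be fine for a typical RSS document.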

Licensed under: CC-BY-SA with attribution