How to extract Atom / RSS
21-09-2019
Question
Given a URL, I add it to the database if it has any RSS nodes.
For example:
For this URL, rssDoc.SelectNodes("rss/channel/item").Count is greater than zero.
For the Atom URL, rssDoc.SelectNodes("rss/channel/item").Count is equal to zero.
How can I check whether an Atom / RSS URL has nodes or not? I tried rssDoc.SelectNodes("feed/entry").Count, but it also gives me a count of zero.
    ' Requires: Imports System.IO, System.Net, System.Xml
    Public Shared Function HasRssItems(ByVal url As String) As Boolean
        Dim myRequest As WebRequest
        Dim myResponse As WebResponse = Nothing
        Try
            myRequest = System.Net.WebRequest.Create(url)
            myRequest.Timeout = 5000
            myResponse = myRequest.GetResponse()
            Dim rssStream As Stream = myResponse.GetResponseStream()
            Dim rssDoc As New XmlDocument()
            rssDoc.Load(rssStream)
            Return rssDoc.SelectNodes("rss/channel/item").Count > 0
        Catch ex As Exception
            Return False
        Finally
            ' GetResponse may have thrown, leaving myResponse unset.
            If myResponse IsNot Nothing Then myResponse.Close()
        End Try
    End Function
Solution
The main problem is that the XML node path on this line:
    Return rssDoc.SelectNodes("rss/channel/item").Count > 0
is valid only for RSS feeds, not for ATOM feeds.
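As for why "feed/entry" also returned zero in the question: Atom elements are declared in an XML namespace, and an XPath step without a prefix only matches elements that are in no namespace. The following is a minimal sketch, not taken from the original answer, of counting items without any conversion. The module name FeedItemCheck, the helper CountFeedItems and the prefixes a10 and a03 are made up for illustration.

```vbnet
Imports System
Imports System.Xml

Module FeedItemCheck
    ' Hypothetical helper: counts <item> (RSS) or <entry> (Atom) nodes in a loaded feed.
    Public Function CountFeedItems(ByVal feedDoc As XmlDocument) As Integer
        ' RSS 2.0 elements are in no namespace, so a plain path works.
        Dim rssItems As XmlNodeList = feedDoc.SelectNodes("rss/channel/item")
        If rssItems.Count > 0 Then Return rssItems.Count

        ' Atom elements live in a namespace, so each step needs a mapped prefix.
        Dim mgr As New XmlNamespaceManager(feedDoc.NameTable)
        mgr.AddNamespace("a10", "http://www.w3.org/2005/Atom") ' Atom 1.0
        mgr.AddNamespace("a03", "http://purl.org/atom/ns#")    ' Atom 0.3
        Return feedDoc.SelectNodes("a10:feed/a10:entry | a03:feed/a03:entry", mgr).Count
    End Function

    Sub Main()
        Dim atomDoc As New XmlDocument()
        atomDoc.LoadXml("<feed xmlns=""http://www.w3.org/2005/Atom"">" &
                        "<title>t</title><entry><title>e</title></entry></feed>")
        Console.WriteLine(atomDoc.SelectNodes("feed/entry").Count) ' prints 0
        Console.WriteLine(CountFeedItems(atomDoc))                 ' prints 1
    End Sub
End Module
```

The prefixes are only local aliases for the namespace URIs. They do not have to match whatever prefix, if any, the feed itself uses.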
One way I have got around this in the past is to use a simple function that converts an ATOM feed into an RSS feed. Of course, you could go in the opposite direction, or not convert at all. However, converting to a single format lets you write one "generic" piece of code that pulls out the various elements of a feed's items that you may be interested in (i.e. date, title, etc.).
There is an ATOM to RSS Converter article on Code Project that provides such a conversion; however, it is in C#. I previously converted it to VB.NET by hand, so here is the VB.NET version:
    ' Requires: Imports System.IO, System.Text, System.Xml
    Private Function AtomToRssConverter(ByVal atomDoc As XmlDocument) As XmlDocument
        Dim xmlDoc As XmlDocument = atomDoc
        Dim xmlNode As XmlNode = Nothing
        Dim mgr As New XmlNamespaceManager(xmlDoc.NameTable)
        ' Start by assuming the ATOM v0.3 namespace.
        mgr.AddNamespace("atom", "http://purl.org/atom/ns#")
        Const rssVersion As String = "2.0"
        Const rssLanguage As String = "en-US"
        Dim rssGenerator As String = "RDFFeedConverter"
        Dim memoryStream As New MemoryStream()
        Dim xmlWriter As New XmlTextWriter(memoryStream, Nothing)
        xmlWriter.Formatting = Formatting.Indented
        Dim feedTitle As String = ""
        Dim feedLink As String = ""
        Dim rssDescription As String = ""

        xmlNode = xmlDoc.SelectSingleNode("//atom:title", mgr)
        If xmlNode Is Nothing Then
            ' This looks like an ATOM v1.0 format, rather than ATOM v0.3.
            mgr.RemoveNamespace("atom", "http://purl.org/atom/ns#")
            mgr.AddNamespace("atom", "http://www.w3.org/2005/Atom")
        End If

        ' Feed-level title, link and description.
        xmlNode = xmlDoc.SelectSingleNode("//atom:title", mgr)
        If xmlNode IsNot Nothing Then
            feedTitle = xmlNode.InnerText
        End If
        ' Takes the third link href found in the document, if there is one.
        xmlNode = xmlDoc.SelectNodes("//atom:link/@href", mgr)(2)
        If xmlNode IsNot Nothing Then
            feedLink = xmlNode.InnerText
        End If
        ' ATOM v0.3 uses <tagline>, ATOM v1.0 uses <subtitle>.
        xmlNode = xmlDoc.SelectSingleNode("//atom:tagline", mgr)
        If xmlNode IsNot Nothing Then
            rssDescription = xmlNode.InnerText
        End If
        xmlNode = xmlDoc.SelectSingleNode("//atom:subtitle", mgr)
        If xmlNode IsNot Nothing Then
            rssDescription = xmlNode.InnerText
        End If

        xmlWriter.WriteStartElement("rss")
        xmlWriter.WriteAttributeString("version", rssVersion)
        xmlWriter.WriteStartElement("channel")
        xmlWriter.WriteElementString("title", feedTitle)
        xmlWriter.WriteElementString("link", feedLink)
        xmlWriter.WriteElementString("description", rssDescription)
        xmlWriter.WriteElementString("language", rssLanguage)
        xmlWriter.WriteElementString("generator", rssGenerator)

        Dim items As XmlNodeList = xmlDoc.SelectNodes("//atom:entry", mgr)
        If items Is Nothing Then
            Throw New FormatException("Atom feed is not in expected format. ")
        Else
            For i As Integer = 0 To items.Count - 1
                ' Declared inside the loop so values from one entry do not leak into the next.
                Dim title As String = String.Empty
                Dim link As String = String.Empty
                Dim description As String = String.Empty
                Dim author As String = String.Empty
                Dim pubDate As String = String.Empty

                Dim nodTitle As XmlNode = items(i)
                xmlNode = nodTitle.SelectSingleNode("atom:title", mgr)
                If xmlNode IsNot Nothing Then
                    title = xmlNode.InnerText
                End If
                Try
                    ' Prefer the rel='alternate' link; the quotes are required in the XPath.
                    link = items(i).SelectSingleNode("atom:link[@rel='alternate']", mgr).Attributes("href").InnerText
                Catch ex As Exception
                    ' No alternate link: fall back to the first link of the entry.
                    link = items(i).SelectSingleNode("atom:link", mgr).Attributes("href").InnerText
                End Try
                xmlNode = items(i).SelectSingleNode("atom:content", mgr)
                If xmlNode IsNot Nothing Then
                    description = xmlNode.InnerText
                End If
                ' Note: "//" searches the whole document, so this is the first author name in the feed.
                xmlNode = items(i).SelectSingleNode("//atom:name", mgr)
                If xmlNode IsNot Nothing Then
                    author = xmlNode.InnerText
                End If
                ' ATOM v0.3 uses <issued>; <updated>, when present, takes precedence.
                xmlNode = items(i).SelectSingleNode("atom:issued", mgr)
                If xmlNode IsNot Nothing Then
                    pubDate = xmlNode.InnerText
                End If
                xmlNode = items(i).SelectSingleNode("atom:updated", mgr)
                If xmlNode IsNot Nothing Then
                    pubDate = xmlNode.InnerText
                End If

                xmlWriter.WriteStartElement("item")
                xmlWriter.WriteElementString("title", title)
                xmlWriter.WriteElementString("link", link)
                If pubDate.Length < 1 Then
                    pubDate = Date.MinValue.ToString()
                End If
                xmlWriter.WriteElementString("pubDate", Convert.ToDateTime(pubDate).ToUniversalTime().ToString("ddd, dd MMM yyyy HH:mm:ss G\MT"))
                xmlWriter.WriteElementString("author", author)
                xmlWriter.WriteElementString("description", description)
                xmlWriter.WriteEndElement()
            Next
            ' Closes <channel>; Close() then closes the still-open <rss> element.
            xmlWriter.WriteEndElement()
            xmlWriter.Flush()
            xmlWriter.Close()
        End If

        Dim retDoc As New XmlDocument()
        Dim outStr As String = Encoding.UTF8.GetString(memoryStream.ToArray())
        retDoc.LoadXml(outStr)
        Return retDoc
    End Function
Usage is fairly straightforward. Simply load your ATOM feed into an XmlDocument object and pass it to this function, and you will get an XmlDocument object back, in RSS format!
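A minimal sketch of that usage applied to the check from the question. It assumes the function is declared in the same class as AtomToRssConverter. The name HasFeedItems and the root-element test are assumptions for illustration, and there is no error handling or timeout here.

```vbnet
' Hypothetical helper; assumes Imports System.Xml and that it is declared
' in the same class as AtomToRssConverter.
Public Function HasFeedItems(ByVal url As String) As Boolean
    Dim feedDoc As New XmlDocument()
    ' XmlDocument.Load accepts a URL as well as a file path.
    feedDoc.Load(url)

    ' Assumption: anything whose root element is not <rss> is treated as ATOM.
    If feedDoc.DocumentElement.Name <> "rss" Then
        feedDoc = AtomToRssConverter(feedDoc)
    End If

    ' After conversion, the RSS path from the question covers both formats.
    Return feedDoc.SelectNodes("rss/channel/item").Count > 0
End Function
```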
If you are interested, I have put an entire RSSReader class on pastebin.com.