While looking over the SyndicationFeed class and related classes, I found something quite annoying: in .NET 3.5, Microsoft put together this wonderful class infrastructure for handling various kinds of syndication feeds, but it cannot handle the old-style Atom 0.3 feed schema (xmlns="http://purl.org/atom/ns#"). If you attempt to use the SyndicationFeed.Load method, you get this: "The element with name 'feed' and namespace 'http://purl.org/atom/ns#' is not an allowed feed format." Frankly, I think the error message should have been written more like "We decided we didn't want to bother with Atom 0.3 feeds, so tough titsky on you!"
That's too bad, because a huge number of feeds, including most of Google's news, Gmail and other feeds, are still delivered in this format. I have no idea what the rationale for this omission was, nor do I care to speculate. The bottom line is that the .NET 3.5 SyndicationFeed classes cannot handle the format.
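If you want to see the failure for yourself, a minimal sketch along these lines should reproduce it. The inline feed is a made-up Atom 0.3 document, the class and method names are hypothetical, and I am assuming the rejection surfaces as an XmlException; you also need a reference to the assembly that contains System.ServiceModel.Syndication (System.ServiceModel.Web in .NET 3.5).

```csharp
using System;
using System.IO;
using System.ServiceModel.Syndication;
using System.Xml;

public static class Atom03LoadDemo
{
    // A tiny hand-written Atom 0.3 document (hypothetical content).
    public const string Atom03Feed =
        "<feed version=\"0.3\" xmlns=\"http://purl.org/atom/ns#\">" +
        "<title>Sample</title>" +
        "<entry><title>First entry</title></entry>" +
        "</feed>";

    // Returns null if the feed loads, or the error message if it is rejected.
    public static string TryLoad(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            try
            {
                SyndicationFeed.Load(reader);
                return null;
            }
            catch (XmlException ex)
            {
                // Assumption: the "not an allowed feed format" error arrives here.
                return ex.Message;
            }
        }
    }
}
```

Calling Atom03LoadDemo.TryLoad(Atom03LoadDemo.Atom03Feed) and writing the result to the console should show the message quoted above.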
So, what should a developer do? Well, you can either spend a lot of time figuring out how to override the existing infrastructure, or you can just roll your own. In my case, since I was mostly interested in gathering and displaying feed items, all I needed was the <item> or <entry> collection from the respective feed. Since all feeds are well-formed XML, I decided to start from that common denominator.
The code I present here is relatively simple: I start out with a GenericFeedItem class as a container for the Title, Link, Description and PubDate items. I then use an XmlTextReader to walk the elements of the retrieved feed and collect the children of each <item> or <entry>, and a switch block to test for the elements I want and canonicalize their names. The result is a simple, fast way to parse any feed (adding additional switch tests as needed) and return a standardized List<GenericFeedItem> collection that always has the same shape and can be databound.
The XmlTextReader class is perfect for this scenario because it provides fast, non-cached, forward-only access to XML data, similar to the way a SqlDataReader handles data from a SQL Server query. The switch block can be easily modified to accommodate additional feed schemas.
Here is the ultra-simple GenericFeedItem class:
using System;

namespace PAB.FeedParser
{
    [Serializable]
    public class GenericFeedItem
    {
        public string Title { get; set; }
        public string Link { get; set; }
        public string Description { get; set; }
        public DateTime PubDate { get; set; }
    }
}

And here is the GenericFeedParser class, with plenty of inline comments to explain what is happening:
using System;
using System.Collections.Generic;
using System.Xml;

namespace PAB.FeedParser
{
    public class GenericFeedParser
    {
        public List<GenericFeedItem> ReadFeedItems(string url)
        {
            // create a List of type Dictionary<string,string> for the element names and values
            var items = new List<Dictionary<string, string>>();
            // declare a Dictionary to capture each current Item in the while loop
            Dictionary<string, string> currentItem = null;
            // Wrap a new XmlTextReader around the url of the feed and dispose it when done
            using (var reader = new XmlTextReader(url))
            {
                // Read each node with the reader
                while (reader.Read())
                {
                    // if it's not an element, we don't want to process it
                    if (reader.NodeType != XmlNodeType.Element)
                        continue;
                    string name = reader.Name;
                    string lowerName = name.ToLowerInvariant();
                    if (lowerName == "item" || lowerName == "entry")
                    {
                        // Save previous item
                        if (currentItem != null)
                            items.Add(currentItem);
                        // Create new item
                        currentItem = new Dictionary<string, string>();
                    }
                    else if (currentItem != null)
                    {
                        string value;
                        if (reader.IsEmptyElement)
                        {
                            // Atom writes links as <link href="..." />, so take the attribute
                            value = reader.GetAttribute("href") ?? string.Empty;
                        }
                        else
                        {
                            // advance to the element's content
                            reader.Read();
                            value = reader.Value;
                        }
                        // some feeds can have duplicate keys, so we don't want to blow up here:
                        if (!currentItem.ContainsKey(name))
                            currentItem.Add(name, value);
                    }
                }
            }
            // the loop only saves an item when the next one starts, so save the last one here
            if (currentItem != null)
                items.Add(currentItem);
            // now create a List of type GenericFeedItem
            var itemList = new List<GenericFeedItem>();
            // iterate all our items from the reader
            foreach (var d in items)
            {
                var itm = new GenericFeedItem();
                // do a switch on the Key of the Dictionary<string, string> of each item
                foreach (string k in d.Keys)
                {
                    switch (k)
                    {
                        case "title":
                            itm.Title = d[k];
                            break;
                        case "link":
                            itm.Link = d[k];
                            break;
                        case "published":
                        case "pubDate":
                        case "issued":
                            // fall back to the current time if the date cannot be parsed
                            DateTime dt;
                            itm.PubDate = DateTime.TryParse(d[k], out dt) ? dt : DateTime.Now;
                            break;
                        case "content":
                        case "description":
                            itm.Description = d[k];
                            break;
                        default:
                            break;
                    }
                }
                // add the created item to our List
                itemList.Add(itm);
            }
            return itemList;
        }
    }
}
In order to use this arrangement (say in a web page with a GridView) one would use code similar to this:
protected void DropDownList1_SelectedIndexChanged(object sender, EventArgs e)
{
    if (DropDownList1.SelectedValue == "") return;
    var parser = new GenericFeedParser();
    List<GenericFeedItem> items = parser.ReadFeedItems(DropDownList1.SelectedValue);
    GridView1.DataSource = items;
    GridView1.DataBind();
}
That's all it takes! You can throw virtually any kind of feed at this and it will happily return a List of type GenericFeedItem for you. If I have missed any of the common feed schemas in this exercise, it is a simple matter to modify the switch block as shown above in order to accommodate them.
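As the list of element names grows, one option is to move the mapping out of the switch block and into a lookup table. The following is only a sketch of that idea: the FeedElementNames class and its Canonicalize method are hypothetical names, the entries beyond those in the switch block above (summary, updated, modified, dc:date) are additions I am assuming you might want, and the "dc" prefix is merely the conventional one for Dublin Core.

```csharp
using System;
using System.Collections.Generic;

public static class FeedElementNames
{
    // Maps element names seen in feeds to the four GenericFeedItem fields.
    // The comparer ignores case, so "pubDate" and "pubdate" both match.
    private static readonly Dictionary<string, string> Canonical =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "title", "Title" },
            { "link", "Link" },
            { "description", "Description" },
            { "content", "Description" },
            { "summary", "Description" },   // Atom (assumed addition)
            { "pubDate", "PubDate" },       // RSS 2.0
            { "published", "PubDate" },     // Atom 1.0
            { "updated", "PubDate" },       // Atom 1.0 (assumed addition)
            { "issued", "PubDate" },        // Atom 0.3
            { "modified", "PubDate" },      // Atom 0.3 (assumed addition)
            { "dc:date", "PubDate" }        // RSS 1.0, conventional prefix (assumed addition)
        };

    // Returns the canonical field name, or null for elements we ignore.
    public static string Canonicalize(string elementName)
    {
        string field;
        return Canonical.TryGetValue(elementName, out field) ? field : null;
    }
}
```

With something like this in place, the switch would test the canonical name rather than the raw element name, and supporting another schema means adding a line to the table.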
In the downloadable solution, I've enhanced the arrangement to permit the addition of a SyndicationFormat class which identifies the feed type and provides the title of the feed as well. You can download the full Visual Studio 2008 Solution, which includes a web project with a page that will try each feed type and display the results, including the feed type and its title.
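The listing below is not the SyndicationFormat class from the download, just a rough sketch of how feed-type detection could work: look at the root element's name and namespace before parsing the items. FeedFormatSniffer and Detect are hypothetical names, and the returned labels are my own.

```csharp
using System;
using System.IO;
using System.Xml;

public static class FeedFormatSniffer
{
    // Inspects only the root element, so it stops reading almost immediately.
    public static string Detect(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            // Skip the XML declaration, comments and whitespace.
            reader.MoveToContent();
            string name = reader.LocalName;
            string ns = reader.NamespaceURI;

            if (name == "rss")
                return "RSS " + (reader.GetAttribute("version") ?? "(unknown version)");
            if (name == "RDF")
                return "RSS 1.0 (RDF)";
            if (name == "feed" && ns == "http://purl.org/atom/ns#")
                return "Atom 0.3";
            if (name == "feed" && ns == "http://www.w3.org/2005/Atom")
                return "Atom 1.0";
            return "Unknown";
        }
    }
}
```

Passing a StringReader over the feed text (or a StreamReader over the response stream) to Detect gives you a label to show next to the results.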