"Everything" RSS / ATOM Feed Parser


By Peter Bromberg

A canonicalized, generic solution to the problem of parsing any kind of RSS or ATOM feed and returning a usable collection for databinding.


While looking over the SyndicationFeed class and its related classes, I found something quite annoying: in .NET 3.5, Microsoft put together this wonderful class infrastructure for handling various kinds of syndication feeds, but it cannot handle the old-style ATOM 0.3 feed schema (xmlns="http://purl.org/atom/ns#"). If you attempt to use the SyndicationFeed.Load method, you get this: "The element with name 'feed' and namespace 'http://purl.org/atom/ns#' is not an allowed feed format." Frankly, I think the error message should have been written more like "We decided we didn't want to bother with ATOM 0.3 feeds, so tough titsky on you!".

That's too bad, because a huge number of feeds, including most of Google's news, Gmail and other feeds, are still delivered in this format. I have no idea what the rationale for this omission was, nor do I care to speculate. The bottom line is that the .NET 3.5 SyndicationFeed classes cannot handle the format.

So, what should a developer do? Well, you can either spend a lot of time figuring out how to override the existing infrastructure, or you can just roll your own. In my case, since I was mostly interested in gathering and displaying feed items, all I needed was the <item> or <entry> collection from the respective feed. Since every valid feed is well-formed XML, I decided to start from that common denominator.

The code I present here is relatively simple: I start out with a GenericFeedItem class as a container for the Title, Link, Description and PubDate items. I then use an XmlTextReader to stream through the retrieved feed, collecting the child elements of each <item> or <entry> into a Dictionary, and finally a switch block that tests for the elements I want and canonicalizes their names. The result is a simple, fast way to parse any feed (adding additional switch tests as needed) and return a standardized List<GenericFeedItem> collection that is always the same and can be databound.

The XmlTextReader class is perfect for this scenario because it provides fast, non-cached, forward-only access to XML data - similar to the way a SqlDataReader handles data from a SQL Server query. The switch block can be easily modified to accommodate additional feed schemas.
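To make the forward-only behavior concrete, here is a minimal sketch that walks an XML string with an XmlTextReader and records element names in the order they are encountered. The ForwardOnlyDemo class and its ElementNames method are names made up for this illustration; they are not part of the parser presented in this article.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ForwardOnlyDemo
{
    // Returns the element names in the order the reader encounters them.
    public static List<string> ElementNames(string xml)
    {
        var names = new List<string>();
        // XmlTextReader also accepts a TextReader, so no network access is needed here
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Read() advances one node at a time; the reader never moves backwards
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    names.Add(reader.Name);
            }
        }
        return names;
    }
}
```

Calling ForwardOnlyDemo.ElementNames on `<rss><channel><item><title>Hello</title></item></channel></rss>` yields rss, channel, item, title, in document order. Nothing is cached, so anything you need later has to be captured as it goes by.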

Here is the ultra-simple GenericFeedItem class:

using System;

namespace PAB.FeedParser
{
    [Serializable]
    public class GenericFeedItem
    {
        public string Title { get; set; }
        public string Link { get; set; }
        public string Description { get; set; }
        public DateTime PubDate { get; set; }
    }
}
And here is the GenericFeedParser class, with plenty of inline comments to explain what is happening:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

namespace PAB.FeedParser
{
    public class GenericFeedParser
    {
        public List<GenericFeedItem> ReadFeedItems(string url)
        {
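            // collect each <item>/<entry> as a Dictionary of element name -> text value
            var items = new List<Dictionary<string, string>>();
            // holds the item currently being read; null until the first item starts
            Dictionary<string, string> currentItem = null;
            // Wrap a new XmlTextReader around the url of the feed, and dispose of it when done
            using (var reader = new XmlTextReader(url))
            {
                // Read each node with the reader
                while (reader.Read())
                {
                    // only elements are of interest
                    if (reader.NodeType != XmlNodeType.Element)
                        continue;

                    string name = reader.Name;
                    string lowerName = name.ToLowerInvariant();
                    if (lowerName == "item" || lowerName == "entry")
                    {
                        // Save previous item
                        if (currentItem != null)
                            items.Add(currentItem);

                        // Create new item
                        currentItem = new Dictionary<string, string>();
                    }
                    else if (currentItem != null)
                    {
                        string value;
                        if (name == "link" && reader.GetAttribute("href") != null)
                        {
                            // ATOM feeds carry the link in an href attribute, not in the element text
                            value = reader.GetAttribute("href");
                        }
                        else if (reader.IsEmptyElement)
                        {
                            // <foo/> has no content; reading on would swallow the next node
                            value = string.Empty;
                        }
                        else
                        {
                            // advance to the element's first child, normally its text
                            reader.Read();
                            value = reader.Value;
                        }
                        // some feeds can have duplicate keys, so we don't want to blow up here:
                        if (!currentItem.ContainsKey(name))
                            currentItem.Add(name, value);
                    }
                }
            }

            // Save the last item: no later <item>/<entry> start tag will do it for us
            if (currentItem != null)
                items.Add(currentItem);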

            // now create a List of type GenericFeedItem
            var itemList = new List<GenericFeedItem>();
            // iterate all our items from the reader
            foreach (var d in items)
            {
                var itm = new GenericFeedItem();
                //do a switch on the Key of the Dictionary <string, string> of each item
                foreach (string k in d.Keys)
                {
                    switch (k)
                    {
                        case "title":
                            itm.Title = d[k];
                            break;
                        case "link":
                            itm.Link = d[k];
                            break;
                        case "published":
                        case "pubDate":
                        case "issued":
                            // fall back to the current time when the date cannot be parsed
                            DateTime dt;
                            itm.PubDate = DateTime.TryParse(d[k], out dt) ? dt : DateTime.Now;
                            break;
                        case "content":
                        case "description":
                            itm.Description = d[k];
                            break;
                        default:
                            break;
                    }
                }
                // add the created item to our List
                itemList.Add(itm);
            }
            return itemList;
        }
    }
}


In order to use this arrangement (say in a web page with a GridView) one would use code similar to this:


protected void DropDownList1_SelectedIndexChanged(object sender, EventArgs e)
{
    if (DropDownList1.SelectedValue == "") return;
    var parser = new GenericFeedParser();
    List<GenericFeedItem> items = parser.ReadFeedItems(DropDownList1.SelectedValue);
    GridView1.DataSource = items;
    GridView1.DataBind();
}

That's all it takes! You can throw virtually any kind of feed at this and it will happily return a List of type GenericFeedItem for you. If I have missed any of the common feed schemas in this exercise, it is a simple matter to modify the switch block as shown above in order to accommodate them.
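As an illustration of what extending the switch block might look like, here is a sketch that maps raw element names to the GenericFeedItem property they would fill. The FeedElementNames class and its Canonicalize method are hypothetical names for this example, and the cases marked "assumed addition" are not in the switch block above: they are element names from ATOM 1.0, ATOM 0.3, Dublin Core and the RSS content module that a feed may use instead.

```csharp
public static class FeedElementNames
{
    // Maps a raw element name to the GenericFeedItem property it would fill,
    // or returns null when the element is not one we keep.
    public static string Canonicalize(string elementName)
    {
        switch (elementName)
        {
            case "title":
                return "Title";
            case "link":
                return "Link";
            case "published":        // ATOM 1.0
            case "pubDate":          // RSS 2.0
            case "issued":           // ATOM 0.3
            case "updated":          // ATOM 1.0 (assumed addition)
            case "modified":         // ATOM 0.3 (assumed addition)
            case "dc:date":          // Dublin Core, common in RSS 1.0 (assumed addition)
                return "PubDate";
            case "content":
            case "description":
            case "summary":          // ATOM (assumed addition)
            case "content:encoded":  // RSS content module (assumed addition)
                return "Description";
            default:
                return null;
        }
    }
}
```

Matching on "dc:date" or "content:encoded" assumes the feed uses the conventional prefixes, because reader.Name returns the prefixed name exactly as written in the feed. Comparing reader.LocalName together with reader.NamespaceURI would be more robust, at the cost of a slightly longer switch.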

In the downloadable solution, I've enhanced the arrangement to permit the addition of a SyndicationFormat class which identifies the feed type and provides the title of the feed as well. You can download the full Visual Studio 2008 Solution, which includes a web project with a page that will try each feed type and display the results, including the feed type and its title.
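The SyndicationFormat class itself is only in the download, so the following is not that code. It is a rough sketch of one way a feed type could be identified, by looking at the root element and its namespace. FeedFormatSniffer and Detect are made-up names, and the returned labels are arbitrary strings chosen for this example.

```csharp
using System.IO;
using System.Xml;

public static class FeedFormatSniffer
{
    // Hypothetical helper: names the feed format from the root element
    // and its namespace. Not the SyndicationFormat class from the download.
    public static string Detect(string xml)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Skip the XML declaration, comments and whitespace up to the root element
            reader.MoveToContent();
            string root = reader.LocalName;
            string ns = reader.NamespaceURI;

            if (root == "feed" && ns == "http://purl.org/atom/ns#")
                return "Atom 0.3";
            if (root == "feed" && ns == "http://www.w3.org/2005/Atom")
                return "Atom 1.0";
            if (root == "rss")
                return "RSS " + (reader.GetAttribute("version") ?? "(unknown version)");
            if (root == "RDF")
                return "RSS 1.0 (RDF)";
            return "Unknown";
        }
    }
}
```

Because the reader stops at the root element, only the first few bytes of the feed are examined. A check like this could run before the item loop, so that the page can show the feed type next to the results.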


Biography - Peter Bromberg
Peter Bromberg is a C# MVP, MCP, and .NET expert who has worked in banking, finance and telephony for over 20 years. Pete focuses exclusively on the .NET Platform, and currently develops SOA and other .NET applications for a Fortune 500 clientele. Peter enjoys producing digital photo collage with Maya, playing jazz flute, the beach, and fine wines. You can view Peter's UnBlog and IttyUrl sites.
Please post questions at forums, not via email!

Article Discussion: "Everything" RSS / ATOM Feed Parser
Peter Bromberg posted at Tuesday, January 06, 2009 9:35 AM
Original Article
 

Atom 10
Voss Grose replied to Peter Bromberg at Wednesday, April 15, 2009 10:57 AM
Peter...great article.

I'm having issues with Atom10 feeds....doesn't seem to bring in the links.

Any hints on how best to handle this.

Thanks again for the parser...was really helpful.
 

Small bug-fix suggested
d k replied to Peter Bromberg at Sunday, May 17, 2009 8:34 AM

The code

                        // Save previous item
                        if (currentItem != null)
                            items.Add(currentItem);


must be repeated also after(!) the while (reader.Read()) loop.

Otherwise, the last item will not be included in the returned itemList