An Alexa Site Data Utility Class


By Peter Bromberg

Peter puts together a neat class to capture Alexa Site data on any tracked domain - with no API and no developer license key.


Alexa has been around for quite some time and provides a lot of information about sites that rank in its "Top 100,000" list, as determined by the browsing habits of the thousands of geeks who happily run the Alexa Toolbar in their browsers. Many SEO "experts" say that Alexa data is skewed, and they may very well be correct. Still, within the universe of Alexa "contributors" the data is very consistent and quite valuable. According to Alexa:

The traffic rank is based on three months of aggregated historical traffic data from millions of Alexa Toolbar users and is a combined measure of page views and users (reach). As a first step, Alexa computes the reach and number of page views for all sites on the Web on a daily basis.

The main Alexa traffic rank is based on the geometric mean of these two quantities averaged over time (so that the rank of a site reflects both the number of users who visit that site as well as the number of pages on the site viewed by those users).
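The quote does not spell out the exact computation, so the following C# sketch is only one reading of it: average each quantity over the sampled days, then take the geometric mean of the two averages. The order of averaging and the names TrafficScore, reachPerDay and pageViewsPerDay are assumptions for illustration.

```csharp
using System;
using System.Linq;

public static class TrafficScore
{
    // score = sqrt(mean(reach) * mean(pageViews))
    // Assumption: each quantity is averaged over time first, then the
    // geometric mean of the two averages is taken.
    public static double Score(double[] reachPerDay, double[] pageViewsPerDay)
    {
        if (reachPerDay == null)
            throw new ArgumentNullException("reachPerDay");
        if (pageViewsPerDay == null)
            throw new ArgumentNullException("pageViewsPerDay");
        if (reachPerDay.Length == 0 || reachPerDay.Length != pageViewsPerDay.Length)
            throw new ArgumentException("Need the same non-zero number of daily samples.");

        double meanReach = reachPerDay.Average();
        double meanViews = pageViewsPerDay.Average();
        return Math.Sqrt(meanReach * meanViews);
    }
}
```

Under this reading, a site's rank would presumably come from sorting all sites by this score in descending order.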

You can get a good overview of Alexa methodology and terms at their FAQ page.

You can also affect your own Alexa rank by using Alexa's own redirect URL scheme. For example, the URL below will redirect through Alexa and on to eggheadcafe.com:

http://redirect.alexa.com/redirect?www.eggheadcafe.com

Alexa provides an extensive API, including a call that returns site information, but it requires a license key. The key doesn't cost anything, so if you prefer APIs, help yourself to one. However, many developers are not aware that Alexa also uses a URL that returns site data without a developer key or API. The URL scheme is very simple:

 http://alexa.com/xml/dad?url=microsoft.com
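The response format is not documented here, so before relying on positional indexes such as Attributes[1] it helps to see what the endpoint actually returns. The sketch below (XmlOutline and Describe are hypothetical names, not part of the downloadable solution) lists each element name once, with its attribute names in document order.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XmlOutline
{
    // Lists every element name once (first occurrence only), followed by
    // its attribute names in document order, e.g. "ADDR: STREET, CITY".
    public static List<string> Describe(XmlDocument doc)
    {
        var lines = new List<string>();
        var seen = new HashSet<string>();
        foreach (XmlNode node in doc.SelectNodes("//*"))
        {
            if (!seen.Add(node.Name))
                continue; // already described this element name

            var names = new List<string>();
            foreach (XmlAttribute attr in node.Attributes)
                names.Add(attr.Name);
            lines.Add(node.Name + ": " + string.Join(", ", names));
        }
        return lines;
    }
}
```

To use it, load the live document with doc.Load("http://alexa.com/xml/dad?url=microsoft.com") and print each returned line to check which attribute sits at which position.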

Using this neat little "trick", I created a nifty class that returns all the goodies in a DataSet containing three DataTables: one for the site owner data and basic Alexa ranking stats (SiteData), one for other domains owned by the site owner (Domains), and one for related sites (RelatedSites).

Without further ado, here is all the code for the utility class:

using System;
using System.Data;
using System.Xml;

namespace AlexaDataLib
{
    public static class AlexaData
    {
        public static DataSet GetSiteData(string domain)
        {
            if (string.IsNullOrEmpty(domain))
                throw new ArgumentException("A domain is required.", "domain");

            // Create a DataSet, then DataTables for each group of data,
            // then add the needed DataColumns to each
            DataSet ds = new DataSet();
            DataTable dtRelated = new DataTable("RelatedSites");
            dtRelated.Columns.Add("HREF");
            dtRelated.Columns.Add("TITLE");

            DataTable dtSiteData = new DataTable("SiteData");
            dtSiteData.Columns.Add("TITLE");
            dtSiteData.Columns.Add("STREET");
            dtSiteData.Columns.Add("CITY");
            dtSiteData.Columns.Add("STATE");
            dtSiteData.Columns.Add("ZIP");
            dtSiteData.Columns.Add("COUNTRY");
            dtSiteData.Columns.Add("OWNER");
            dtSiteData.Columns.Add("PHONE");
            dtSiteData.Columns.Add("EMAIL");
            dtSiteData.Columns.Add("CREATED");
            dtSiteData.Columns.Add("LINKSIN");
            dtSiteData.Columns.Add("SPEED");
            dtSiteData.Columns.Add("POPULARITY");
            dtSiteData.Columns.Add("REACH");
            dtSiteData.Columns.Add("DESC");

            DataTable dtDomains = new DataTable("Domains");
            dtDomains.Columns.Add("DOMAIN");

            // Create the Alexa request url; escaping the domain keeps
            // characters such as '&' from breaking the query string
            string url = "http://alexa.com/xml/dad?url=" + Uri.EscapeDataString(domain);

            XmlDocument doc = new XmlDocument();
            // Load the xml document from the url
            doc.Load(url);

            // Related sites: one row per RL element
            foreach (XmlNode nod in doc.SelectNodes("//RL"))
            {
                dtRelated.Rows.Add(GetAttr(nod, "HREF"), GetAttr(nod, "TITLE"));
            }
            ds.Tables.Add(dtRelated);

            // Other domains owned by the site owner: one row per DO element
            foreach (XmlNode nod in doc.SelectNodes("//DO"))
            {
                dtDomains.Rows.Add(GetAttr(nod, "DOMAIN"));
            }
            ds.Tables.Add(dtDomains);

            // Site owner data and ranking stats, read by attribute position.
            // Missing nodes yield empty strings instead of a NullReferenceException.
            string title = GetAttr(doc, "//SITE", 1);
            string street = GetAttr(doc, "//ADDR", 0);
            string city = GetAttr(doc, "//ADDR", 1);
            string state = GetAttr(doc, "//ADDR", 2);
            string zip = GetAttr(doc, "//ADDR", 3);
            string country = GetAttr(doc, "//ADDR", 4);
            string owner = GetAttr(doc, "//OWNER", 0);
            string phone = GetAttr(doc, "//PHONE", 0);
            string email = GetAttr(doc, "//EMAIL", 0);
            string created = GetAttr(doc, "//CREATED", 0);
            string linksin = GetAttr(doc, "//LINKSIN", 0);
            string speed = GetAttr(doc, "//SPEED", 0);
            string popularity = GetAttr(doc, "//POPULARITY", 1);
            string reach = GetAttr(doc, "//REACH", 0);
            string desc = GetAttr(doc, "//SITE", 2);

            dtSiteData.Rows.Add(title, street, city, state, zip, country, owner, phone,
                                email, created, linksin, speed, popularity, reach, desc);
            ds.Tables.Add(dtSiteData);
            return ds;
        }

        // Returns the named attribute of a node, or "" if it is missing
        private static string GetAttr(XmlNode node, string name)
        {
            XmlAttribute attr = node.Attributes == null ? null : node.Attributes[name];
            return attr == null ? "" : attr.Value;
        }

        // Returns the attribute at the given position on the first node that
        // matches the XPath expression, or "" if either is missing
        private static string GetAttr(XmlNode root, string xpath, int index)
        {
            XmlNode node = root.SelectSingleNode(xpath);
            if (node == null || node.Attributes == null || index >= node.Attributes.Count)
                return "";
            return node.Attributes[index].Value;
        }
    }
}
Most developers should be able to walk through the above code without explanation, so I'll let it stand on its own. Astute readers may be wondering why I didn't just use the ReadXml method of the DataSet class. Believe me, that was the first thing I tried. Unfortunately, DataSet isn't that smart: if it finds duplicate field names in different tables, it chokes, and Alexa's XML document has exactly that issue.
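The class above sidesteps ReadXml by building each DataTable separately, so a name like TITLE can appear in both SiteData and RelatedSites. If you would rather not list every column by hand, a generic variant of those loops could derive the columns from the attributes themselves. This is a sketch, not the author's code: NodesToTable is a hypothetical helper, and unlike GetSiteData it copies every attribute it finds rather than a fixed set.

```csharp
using System;
using System.Data;
using System.Xml;

public static class XmlTableHelper
{
    // Copies the attributes of every node matching xpath into a new DataTable.
    // Each table gets its own independent schema, so the same column name can
    // be reused in other tables of the same DataSet without conflict.
    public static DataTable NodesToTable(XmlDocument doc, string xpath, string tableName)
    {
        var table = new DataTable(tableName);
        XmlNodeList nodes = doc.SelectNodes(xpath);
        if (nodes == null)
            return table;

        foreach (XmlNode node in nodes)
        {
            if (node.Attributes == null)
                continue;

            // Add a string column for any attribute not seen on earlier nodes
            foreach (XmlAttribute attr in node.Attributes)
                if (!table.Columns.Contains(attr.Name))
                    table.Columns.Add(attr.Name, typeof(string));

            // Fill one row with this node's attribute values
            DataRow row = table.NewRow();
            foreach (XmlAttribute attr in node.Attributes)
                row[attr.Name] = attr.Value;
            table.Rows.Add(row);
        }
        return table;
    }
}
```

For example, NodesToTable(doc, "//RL", "RelatedSites") would yield HREF and TITLE columns for the related sites, assuming the RL elements carry those attributes as the class above expects.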

For fun, take a look at some of the Alexa data that has been hacked. For example, look up Live.com: the owner is listed as "Hacker Rootkit.Com.cn", and the related sites have obviously been hacked by somebody from China who wanted a good laugh. There are all kinds of little tricks one learns in this area. For example, try these two Google searches and see which one gives better results:

[yoursite.com*] -site:yoursite.com   --or--   link:yoursite.com

The downloadable Visual Studio 2005 solution includes a nice test harness web page with a DetailsView and two GridViews to show off the data. If you are just curious and would like to try out a live version, I've got one up on my "Playground Site" at IttyUrl.Net.
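If you would rather inspect the result from a console than from the web test page, a small dump routine is enough. This is a hypothetical sketch, not part of the downloadable solution; DataSetDumper and DumpDataSet are illustrative names, and the routine only formats a DataSet without calling Alexa itself.

```csharp
using System;
using System.Data;
using System.Text;

public static class DataSetDumper
{
    // Renders each table as a "[TableName] n row(s)" header followed by
    // one indented "COLUMN=value" line per field, with a blank line per row.
    public static string DumpDataSet(DataSet ds)
    {
        var sb = new StringBuilder();
        foreach (DataTable table in ds.Tables)
        {
            sb.AppendLine("[" + table.TableName + "] " + table.Rows.Count + " row(s)");
            foreach (DataRow row in table.Rows)
            {
                foreach (DataColumn col in table.Columns)
                    sb.AppendLine("  " + col.ColumnName + "=" + row[col]);
                sb.AppendLine();
            }
        }
        return sb.ToString();
    }
}
```

Passing it the result of AlexaData.GetSiteData("microsoft.com") and writing the string to Console.Out would print all three tables.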

Enjoy.

Biography - Peter Bromberg
Peter Bromberg is a C# MVP, MCP, and .NET expert who has worked in the banking, financial, and telephony industries for over 20 years. Pete focuses exclusively on the .NET Platform and currently develops SOA and other .NET applications for a Fortune 500 clientele. Peter enjoys producing digital photo collage with Maya, playing jazz flute, the beach, and fine wines. You can view Peter's UnBlog and IttyUrl sites.
Please post questions in the forums, not via email!

Article Discussion: An Alexa Site Data Utility Class
Peter Bromberg posted at Saturday, October 20, 2007 4:32 PM

Need to ask something, sir
carl john redor replied to Peter Bromberg at Sunday, January 11, 2009 2:49 PM
Hi sir, I'm Carl John Redor, a student from the Philippines. I have to work on my thesis using Visual Studio 2005. I just want to ask how to code a timing method, the "in/out timing", for the loan process of a school library. I really need help, sir. I hope you can reply to my post as soon as you read this. I'm a beginner, sir. Thank you.