Comparing Datasets using Linq


By Yuri Kasan
Printer Friendly Version
View My Articles
25 Points
  

How to compare two datasets using Linq to DataSet



One of my recent projects included simple and I believe, commonly done tasks.  We were implementing project utilizing in Microsoft Visual Studio Professional edition, with C# 3.0 and the task was to compare two snapshots taken from the same table with particular time interval. The purpose of this task was to determine if any changes where done on the server side within given time interval.

The first way that came to my mind was to use the DataSet.Merge() method.  Ideally, this is how it should work:

At some point of the application initialization, we “grab” the first snapshot of data and load it into the original DataSet (let’s call it dsOriginal), then on the Timer event, when the given interval has passed, we take second snapshot (let’s call it dsChanged). The rest should be a fairly routine procedure:

1.       Create an empty DataSet (we will call it dsDifferences).
DataSet dsDifferences = new Dataset();

2.       Merge our dsOriginal with dsChanged
dsOriginal.Merge(dsChanged);

3.       The last step is to get the modified or added rows using the DataSet.GetChanges() method
dsDifferences = dsOriginal.GetChanges();

 

This is a very “stylish” and simple solution, but not so easy as in concept.
We were unpleasantly surprised when we found out that merge datasets do not change RowSatus property of the rows in the tables of the dataset.  Just to make sure that we did nothing wrong, I looked at MSDN library again and the MSDN materials confirmed that the Merge method combines rows (equivalent  to SQL Union statement) when no Primary Key present, or updates rows in target DataSet using source Dataset when Primary Key was defined. To resolve this issue we contacted the Microsoft Support team and after a few days of email exchange, we got the following message: I think the root of the issue is that Merge does not change row state.  If you create a dataset, then manually change a row value, then row state is tweaked and GetChanges() behaves as expected.”

But Merge does a “dumb” merge and does not set other flags from that we are seeing, but I am trying to find documentation to support that.  If that is the case, the only option is to do a manual compare.  I will keep you posted on what I find…

As you can see from the message that Microsoft suggestion was to go manually trough both DataSets comparing values row by row and manually update settings in the resulting Dataset using Row.SetAdded() or Row.SetModified()  methods according to the situation – a solution not as elegant. The fact that the method provided above is available for developers in Framework 2.0 and later for extra flexibility,  it is hard to imagine when these methods can be used in practical terms. After a day or two of additional research and conversations with Microsoft , we received another email from the support team, where we got the clue that the issue cannot be resolved: “I have raised the issue with appropriate personnel and I am waiting to hear back from them.  I will let you know something as soon as I find out more.

That’s when we turned our heads to LINQ.  I did not have extensive experience with this new feature, but in a few hours found it very easy to learn and apply. There are a few ways using LINQ to achieve the same result. For our purpose, we selected involvement of Union method. Therefore, here is the “moment of Zen”:

  1. During Initialization of our application, we still create an initial snapshot of the data
    in the sample below we will call it dsOriginal.
  2. On the Elapsed event of the timer we  create another snapshot of current data state
    we will call it dsChanged.
  3. To use LINQ we need to get the tables from both datasets into an enumerable row collection. This can be done using the AsEnumerable() method (see code sample below) and we are ready to use LINQ magic.

Below is the code, which completes the task:

 

var orig = dsOriginal.Tables[0].AsEnumerable();

var updated = dsChanged.Tables[0].AsEnumerable();

//First, getting new records if any           

var newRec =

            from u in dsChanged

            where !(from o in orig

                     select o.Field<decimal>("PRIMARY_KEY_FIELD"))

                 .Contains(u.Field<decimal>(" PRIMARY_KEY_FIELD"))

                        select new

                        {

                           prim_key = u.Field<decimal>("PRIMARY_KEY_FIELD"),

                           field1 = u.Field<decimal>("FIELD1"),

                           field2=u.Field<decimal>("FIELD2"),

                           field3 = u.Field<decimal>("FIELD3"),

                           field4 = u.Field<decimal>("FIELD4"),

                           rec_type="A"//Added

                        };

 

//Secondly, getting updated records

 var updRec = from u in updated

              join o in orig

              on u.Field<decimal>("PRIMARY_KEY_FIELD")

              equals o.Field<decimal>("PRIMARY_KEY_FIELD")
             where (u.Field<decimal>("FIELD1") !=
                  o.Field<decimal>("FIELD1")) ||
                  (u.Field<decimal>("FIELD2") !=
                  o.Field<decimal>("FIELD2"))

                         select new

                         {

                           prim_key = u.Field<decimal>("PRIMARY_KEY_FIELD"),

                           field1 = u.Field<decimal>("FIELD1"),

                           field2=u.Field<decimal>("FIELD2"),

                           field3 = u.Field<decimal>("FIELD3"),

                           field4 = u.Field<decimal>("FIELD4"),

                           rec_type = "M"//Mofified

                         };

var Union = newRec.Union(updRec);

 

 

The code snippet is simple and self-explanatory. In the first selected statement we are getting new records by using Contains() method  preceding  it with the “!” operator which gives us only new(Added) rows. In the second statement we get the rows with updated values using comparison in the Where clause, and finally, we compare the results using Union() method.  

 

You probably noticed the variable rec_type used in both queries. When you need to know which record was modified and which is new (added), the Union operator comes in handy because it allows you to create a custom flag on DataRow with this information, but if you only need to get differences without knowing if the row added or modified, you can get all the data in “one shot” and the code snippet below demonstrates how:

 

 

 

 

 

 

var AddedAndModif = from u in updated

                        where !(from o in orig

                                select o.Field<decimal>("PRIMARY_KEY"))

                               .Contains(u.Field<decimal>("PRIMARY_KEY"))

                        || !(from o in orig

                                select o.Field<decimal>("FIELD1"))

                               .Contains(u.Field<decimal>(“FIELD1"))

                        select new

                        {

                           prim_key = u.Field<decimal>("PRIMARY_KEY"),

                           field1 = u.Field<decimal>("FIELD1"),

                           field2=u.Field<decimal>("FIELD2"),

                           field3 = u.Field<decimal>("FIELD3"),

                           field4 = u.Field<decimal>("FIELD4"),

                        };

 

 

As you can see from this example, the Contains() method used twice solves the task and provides you with collection of both, modified and newly added rows.

 

 

In our opinion, among all other options, this solution appears to be fastest.  It is easy to read and modify as needed and showed about 20% better performance when comparing it to the same result achieved by the method suggested by MS Support professionals  provided at the top of this article (comparing row by row and manually changing RowSate using  Row.SetAdded() or Row.SetModified() methods).

 

Since the situation provided in this discussion is common, we decided to share our solution, thinking that it might make another developers life easier.

Aaron Katz - Certified PMP,
Yuri Kasan - MCAD



Biography
Yuri Kasan, is the lead developer of the corporate intranet at Mellon Investor Services, LLC. Yuri, who is an Microsoft Certified Developer specializes in the implementation of a wide variety of applications using Microsoft technologies. His expertise includes assisting his users in maximizing their investment in Microsoft desktop, server and development technologies. You can reach Yuri via email at ykasan@yahoo.com

button
 
Article Discussion: Comparing Datasets using Linq
Yuri Kasan posted at 13-Oct-08 08:22
Original Article

promotion
Silverlight    WPF    WCF    WWF    LINQ   
JavaScript    AJAX    ASP.NET    XAML   
C#    VB.NET    VB 6.0    GDI+    IIS    XML   
.NET Generics    Anonymous Methods    Delegate   
Visual Studio .NET    Expression Blend    Virus   
Windows Vista    Windows XP    Windows Update   
Windows 2003 Server    Windows 2008 Server   
SQL Server    Microsoft Excel    Microsoft Word   
SharePoint    BizTalk    Virtual Earth   
.NET Compact Framework    Web Service   

"Everything" RSS / ATOM Feed Parser
How to send and receive messages through message queuing in .Net
How to Read text file as database
SQL Server 2005 Paging Performance Tip
Display code of web page.
Fully Scalable Excel File Importer class for .net using Microsoft Jet driver
Generic Chart Color Manager class that can be used for any charts
Helper class to style the infragistics wingrid
Using Reflection to detemine as Assembly Info in and out.
Helper class to play with Window (Owners and position)
Resolving displayname from the culture using the XmlLanguage and LanguageSpecificStringDictionary class
Manipulate file attributes in VB.NET
Forms Based Authentication Filtered Content Editor for SharePoint
How to create a Tree View of the Windows Folder and extract all the file-folder info.
How to use AssemblyInfo.cs file in win forms to provide much needed information on Assemblies
Sorting In Datagrid
Helper class to work with NativeMethods in the native api's
Silverlight Line Of Business Applications With Offline WPF Versions
C# : Database monitoring system using XML file
C# : Adding ComboBox to ListView SubItem
Sum of Numbers Captcha: Keeping it Simple
C# Create a Piechart for the specified Hard Disk Drive Utilization
Extension Methods for DataSet and DataTable that makes tasks easier
Accessing IIS Hosted WCF Services from PHP
Helper class that provides most commonly used Extension Methods for DateTime object
Helper class to work with a Status Bar in WPF.
Finding Unmatched Records in Dataset Tables Using Linq
Silverlight Toolkit: Autocomplete TextBox Stock Symbols and Chart
COOL Auto Complete textbox using javascript
Creating a Serializable Log Entry for Microsoft Enterprise Library to log to a Database
ASP.NET Searching Values in Datagrid