Saturday, October 20, 2012

Querying Multiple Databases in One LINQ Expression


Introduction

The objective of this article is to demonstrate functionality in the LinqToSql library that transforms LINQ expression trees to SQL statements that can be executed against multiple RDMS systems and not just Microsoft's SQL Server offerings. The LinqToSql library implements the following features and functionality:
  1. Queries multiple databases in one expression e.g. a Microsoft Access Database and an SQL Server Database
  2. Translates function calls and property accessors in the String and DateTime classes that have SQL equivalents e.g. firstName.LengthfirstName.ToUpper()orderDate.Year etc.
  3. Implements all IQueryable methods e.g. GroupByAnyAllSumAverage, etc.
  4. Correctly and comprehensively translates binary and unary expressions that have valid translations into SQL.
  5. Parameterizes queries instead of embedding constants in the SQL transformation.
  6. Performs caching of previously translated expression trees.
  7. Does not use MARS - Multiple Active Result Sets, an SQL Server 2005 specific feature.
  8. Correctly translates calls to SelectMany even when the query source involves method calls. The SQL Server 2005 specific keyword CROSS APPLY is neither required nor used.
So far, I have tested the functionality on SQL Server 2000 and 2005. Testing for compatibility with Microsoft Access and other RDMSs is underway as the project is still under active development.
The project file available above for download contains samples that run against the famous Northwind database on both SQL Server 2005 and Microsoft Access.
For implementation details, please see the following articles 12, 3.
In this article, I will focus on the Join functionality and the use of function calls and property accessors in theString and DateTime classes. But first...

Querying Multiple Databases in One LINQ Expression

Scenario

The Northwind corporation is facing financial meltdown as a result of its incompetent customer relations and inept order fulfilment. In order to save the situation, Northwind's management sent out a questionnaire asking its customers what their favourite movie is. The plan is to determine who the director of that movie is and to then send the customer a DVD collection of other films by that director. Unfortunately only one customer responds to the questionnaire.
Nevertheless the query must go on... against the following:
  1. Northwind DB on SQL Server
  2. FavouriteDVD.mdb, a Microsoft Access File
  3. An in-memory data structure with directors and a list of movies they've directed
We set up query sources as follows:
customers = new Query<customer />(new SqlQueryProvider(northwindConnection));
// northwindConnection is an SqlConnection to Northwind Db on SQL Server
.............

favouriteDVDs = new Query<favouritedvd />(new SqlQueryProvider(favouriteConnection));

// favouriteConnection is an OleDbConnection to FavouriteDVD.mdb
..................................................
var directorMovies = new Dictionary<string, />>();

directorMovies.Add("Stanley Kubrick", new List<string />() 
    { "Paths of Glory", "A Clockwork Orange" });
directorMovies.Add("Alfred Hitchcok", new List<string />() 
    { "39 Steps", "The Lady Vanishes" });
directorMovies.Add("John Frankenheimer", new List<string />() 
    { "Ronin" });
Now that we've set up our data sources, we can run the query:
var dvdInfo = (from customer in customers
               where customer.CustomerID == "ALFKI"
               select new {
                        customer.CustomerID,
                        customer.ContactName,
                        FavoriteDVD = (from favouriteDVD in favouriteDVDs
                                       where favouriteDVD.CustomerID == _
                            customer.CustomerID
                                       select new {
                                               Movie = favouriteDVD.DVDTitle,
                                               favouriteDVD.Director
                                      }).First()
                }
              ).ToList();
That gets us the customer information from Northwind (SQL Server) and the customer's favourite movie fromFavoriteDVD (Microsoft Access).
Now, we need a list of movies produced by the same director:
var dvdInfoPlus = from info in dvdInfo
                  select new {
                     info.CustomerID,
                     info.ContactName,
                     Movies = new {
                         info.FavoriteDVD,
                          MoviesBySameDirector = _
                           from movies in directorMovies[info.FavoriteDVD.Director]
                           select movies,
                                  }
                  };
The result is as follows:
CustomerID=ALFKI        ContactName=Maria Anders        Movies={ }
  Movies: FavoriteDVD={ }         MoviesBySameDirector=...
    FavoriteDVD: Movie=Apocalypse Now       Director=Stanley Kubrick
    MoviesBySameDirector: Paths of Glory
    MoviesBySameDirector: A Clockwork Orange
And there you have it. Download the sample code and try it out.

Aside

I did not purposely design LinqToSql to do this. I just stumbled on this functionality when I was running tests yesterday. I will explore this sort of thing in detail some other time. (Think querying and aggregating data from disparate databases, Web services, LDAP directories etc. using simple LINQ syntax. Interesting, eh?).

Fixing a Gaping Hole

If you downloaded the sample code last week and attempted to run a query like the one below, you would be out of luck:
var x = from c in customers
        join o in orders on c.CustomerID equals o.CustomerID
        let m = c.Phone
        orderby c.City
        where c.Country == "UK"
        where m != "555-555"
        select new { c.City, c.ContactName } into customerLite
        where customerLite.City == "London"
        select customerLite;

var y = x.ToList();
ObjectDumper.Write(y, 3);
A look at the expression generated will reveal why:
 .Join(, c => c.CustomerID, o => o.CustomerID, 
        (c, o) => new <>f__AnonymousType3`2(c = c, o = o))
 .Select(<>h__TransparentIdentifier46 => new <>f__AnonymousType14`2(
                           <>h__TransparentIdentifier46 = <>
                           h__TransparentIdentifier46, 
                          m = <>h__TransparentIdentifier46.c.Phone))
 .OrderBy(<>h__TransparentIdentifier47 => 
    <>h__TransparentIdentifier47.<>h__TransparentIdentifier46.c.City)
 .Where(<>h__TransparentIdentifier47 => 
    (<>h__TransparentIdentifier47.<>
        h__TransparentIdentifier46.c.Country = "UK"))
 .Where(<>h__TransparentIdentifier47 => 
     (<>h__TransparentIdentifier47.m != "555-555"))
 .Select(<>h__TransparentIdentifier47 => new <>f__AnonymousTypea`2(
   City = <>h__TransparentIdentifier47.<>
           h__TransparentIdentifier46.c.City, 
   ContactName = <>h__TransparentIdentifier47.<>
           h__TransparentIdentifier46.c.ContactName))
 .Where(customerLite => (customerLite.City = "London"))
This reveals at least the following issues:
  1. Very involved aliasing e.g.
    <>h__TransparentIdentifier47.<><br />h__TransparentIdentifier46.c.ContactName 
    which is aliasing a reference to c.ContactName
  2. Sequential calls to methods defined in the Queryable class e.g. .Where(...).Where(...)
In between bouts of heavy drinking over the weekend ;) I managed to resolve these issues. The specifics of how this was done will be addressed another day.
In any event, this expression and its ilk can now be parsed successfully. In this case, you'll get anSQLStatement that looks like so:
SELECT  City, ContactName
FROM 
Customers AS t1
 INNER JOIN 
Orders AS t2
 ON t1.CustomerID = t2.CustomerID
WHERE (t1.Country = @p2) AND (t1.Phone <> @p1) AND (t1.City = @p0)
ORDER BY t1.City 
This will produce the following result:
City=London     ContactName=Thomas Hardy
.............................................
City=London     ContactName=Victoria Ashworth
.............................................
City=London     ContactName=Elizabeth Brown
.............................................
City=London     ContactName=Ann Devon
.............................................
City=London     ContactName=Simon Crowther
.............................................
City=London     ContactName=Hari Kumar
.............................................

Samples and Translations

Example 1

var cutoffDate = new DateTime(1998, 1, 1);

var x = from c in customers
        join o in orders on c.CustomerID equals o.CustomerID
        where c.CustomerID.StartsWith("A") && o.OrderDate > cutoffDate
        orderby c.ContactName, o.OrderDate descending
        select new { Name = c.ContactName, o.OrderDate };

var y = x.ToList();
ObjectDumper.Write(y, 3);
The above query will produce a list of customers whose customerIDs begin with the letter "A" who placed orders after 1/1/1998. The following SQL query will be generated to produce the result:
SELECT  t1.ContactName, t2.OrderDate
FROM 
Customers AS t1
 INNER JOIN 
Orders AS t2
 ON t1.CustomerID = t2.CustomerID
WHERE (t1.CustomerID Like (@p1 + '%') AND (t2.OrderDate > @p0))
ORDER BY t1.ContactName , t2.OrderDate Desc
The results will look like so:
Name=Ana Trujillo       OrderDate=3/4/1998
Name=Antonio Moreno     OrderDate=1/28/1998
Name=Maria Anders       OrderDate=4/9/1998
Name=Maria Anders       OrderDate=3/16/1998
Name=Maria Anders       OrderDate=1/15/1998
Name=Thomas Hardy       OrderDate=4/10/1998
-----------------------------------------------------------------------------------

Example 2

var x = from c in customers
        orderby c.City
        where c.City.Contains("ri") || c.Country.EndsWith("o")
        select new { c.Country, c.City, c.ContactName };

var y = x.ToList();
ObjectDumper.Write(y, 3);
The above query will produce a list of customers who live in cities whose names contain the string "ri" or those who reside in countries where the country name ends with the letter "o".
SELECT  t0.Country, t0.City, t0.ContactName
FROM Customers AS t0
WHERE (t0.City Like ('%' + @p1 + '%') OR t0.Country Like ('%' + @p0))
ORDER BY t0.City 
The results will look like so:
Country=Venezuela       City=I. de Margarita    ContactName=Felipe Izquierdo
Country=Spain   City=Madrid     ContactName=Alejandra Camino
Country=Spain   City=Madrid     ContactName=Martín Sommer
Country=Spain   City=Madrid     ContactName=Diego Roel
Country=Mexico  City=México D.F.        ContactName=Francisco Chang
Country=Mexico  City=México D.F.        ContactName=Ana Trujillo

Example 3

var x = from order in orders
        where order.OrderDate.Value.Year > DateTime.Parse("1/1/1997").Year &&
              order.CustomerID.StartsWith("B")
        select new { order.CustomerID, order.OrderID, order.OrderDate };

ObjectDumper.Write(x);
This query will produce a list of customers who placed orders after 1997 and whose customerIDs start with the letter "B". The following SQL query will be generated to produce the result:
SELECT  t0.CustomerID, t0.OrderID, t0.OrderDate
FROM Orders AS t0
WHERE ((DATEPART(year, t0.OrderDate) > @p1) AND t0.CustomerID Like (@p0 + '%'))
The results will look like so:
CustomerID=BOTTM        OrderID=11048   OrderDate=4/24/1998
CustomerID=BLAUS        OrderID=11058   OrderDate=4/29/1998
CustomerID=BONAP        OrderID=11076   OrderDate=5/6/1998

Example 4

var x = customers
                .SelectMany(
                    c => orders.Where(o => c.CustomerID == o.CustomerID && 
                                           c.CustomerID.StartsWith("C")),
                    (c, o) => new { c.ContactName, o.OrderDate }
                )
                .OrderByDescending(d => d.OrderDate)
                .Select(e => new {e.ContactName, e.OrderDate})
                .Where(f => f.OrderDate > DateTime.Parse("1/1/1996"));

ObjectDumper.Write(x);
This query will produce a list of customers whose customerIDs start with the letter "C" who placed orders after 1/1/1996. The following SQL query will be generated to produce the result:
SELECT  ContactName, OrderDate
FROM 
Customers as t1
 , as t2
Orders
WHERE ((t1.CustomerID = Orders.CustomerID) AND _
        t2.CustomerID Like (@p1 + '%')) AND (t2.OrderDate > @p0)
ORDER BY OrderDate Desc
The results will look like so:
ContactName=Patricio Simpson    OrderDate=4/28/1998
ContactName=Yang Wang   OrderDate=4/22/1998
ContactName=Pedro Afonso        OrderDate=4/22/1998
ContactName=Yang Wang   OrderDate=4/16/1998
ContactName=Pedro Afonso        OrderDate=3/23/1998
ContactName=Yang Wang   OrderDate=3/20/1998
ContactName=Patricio Simpson    OrderDate=3/10/1998 
.................................................................................
That's it for now. In the next article, I will investigate how the performance of LinqToSql squares up against DLINQ.

Notes

  1. All 40 or so samples in the download will run successfully on Microsoft Access and SQL Server, except for those that have multiple parameters - these will not return results when run against Microsoft Access. I am investigating the cause of this behaviour.
  2. As a quick fix to get the samples that call SelectMany working on both Microsoft Access and SQL Server, the CrossJoinHandler will not emit the CROSS JOIN keyword, instead tables are separated by a comma. This is not optimal behaviour and the more elaborate fix alluded to in the previous article will be applied later.
  3. MARS is not used but multiple simultaneous connections may be opened during query evaluation. I will detail when and why this happens, performance implications and possible workarounds in another article.
  4. Most functionality has been covered, but there are corner cases that haven't been cornered as yet.
  5. A comprehensive code review is underway and a number of bugs have been fixed.
  6. Comments, suggestions and bug reports would be most appreciated.

1 comment:

  1. Rituranjan Gupta

    I haven't used it, but Kellerman Software has a LINQ provider for Firebird
    https://www.kellermansoftware.com/p-47-net-data-access-layer.aspx

    ReplyDelete