Monthly Archives: August 2009

Why Use Table Data Gateways?

Two years ago (give or take) I was taking Dr. Chalin’s SOEN343: Software Design course. The course taught the basics of clean, reusable design and development of Web Enterprise Applications – or ‘WebEAs’ – using Martin Fowler’s "Patterns of Enterprise Application Architecture" as a textbook, and focused on the importance of using design patterns and layered architecture (presentation layer, application layer, domain logic layer, technical services layer).

A big part of what we were taught revolved around a cookie-cutter layered design, comprising of:

Not a bad design, and I learned a lot in that course, but I never liked using TDGs and Data Mappers; I can’t help but feel like calling an SQL stored procedure from a Data Mapper has just as much (if not more) benefit than going through a TDG.

The Arguments In Favor Of The TDG

The two arguments which were given in favor of using both a TDG and a Data Mapper were as follows:

  • Without a TDG, the Data Mapper is theoretically moved down into the technical services layer, creating upwards dependencies from Data Mapper objects to Domain Model objects.
  • Hiding SQL statements in the TDG reduces coupling so that your domain layer is unaffected by changes in the technical services layer.

My grievance lies in the fact that:

  1. The first reason is strictly theoretical.
  2. The second reason doesn’t hold up against scrutiny; following the TDG pattern, a Data Mapper is inevitably coupled to the data source in two ways:
    • It will have a dependency on database library components
    • It has to have some basic knowledge of the database structure

A Data Mapper inevitably has a dependency on database library components

A TDG usually has ‘finder’ methods which returns result set objects. If you’re using Java, that probably means an implementation of java.sql.ResultSet or – in PHP – a DB_result if you’re using the Pear::DB module. (I know it’s superceded)

If a TDG returns a result set of this sort, then a Data Mapper has to manipulate this result set in some way. In other words, the Data Mapper is client not only to the TDG, but to the result set object it returns; there is a dependency relationships between the Data Mapper and the result set, which is probably a library class like java.sql.ResultSet or DB_result.

To drive the point home, I offer two examples:

  • If we’re coding in Java, our TDG would have a finder method with a return type of java.sql.ResultSet. To manipulate the returned result set our Data Mapper would have to import java.sql.ResultSet which immediately creates a dependency between the Data Mapper ans the ‘java.sql’ package; if the data source is changed from a database to an XML file for example, we not only need a new technical services class to replace the TDG but we need to rewrite most of our Data Mapper.
  • If we now consider development in PHP, we’d write our TDG returns something like a DB_result object using the Pear::DB module. However, the Pear::DB module is superseded, so we eventually have to change the TDG so it uses PDO or MDB2, which changes the interface of the returned object, which means our Data Mapper has to be re-written.

… and thus, the Data Mapper which sits on top of a TDG inevitably has a dependency on database library components, in addition to the TDG.

A Data Mapper has to have some basic knowledge of the database structure

Again, I’m considering the finder methods of a Data Mapper, such as the following:

	public List<DomainModel> findAll() {
	
		ArrayList<DomainModel> found = new ArrayList<DomainModel>();
		ResultSet rs = TableDataGateway.findAll();
	
		while(rs.next()) { // iterate over all rows from the query
			DomainModel dm = new DomainModel();
			dm.setUuid(UUID.fromString(rs.getString("guid")));
			dm.setName(rs.getString("name"));
			found.add(dm);
		}
	
		return found;
	}

SQL aliases aside, whatever database query was made by TableDataGateway.findAll(), its structure has great impact on this code; we know it has to return columns "guid" and "name", at the very least.

What if we instead number the columns, instead of referring to them by name?

	public List<DomainModel> findAll() {
	
		ArrayList<DomainModel> found = new ArrayList<DomainModel>();
		ResultSet rs = TableDataGateway.findAll();
	
		while(rs.next()) { // iterate over all rows from the query
			DomainModel dm = new DomainModel();
			dm.setUuid(UUID.fromString(rs.getString(1)));
			dm.setName(rs.getString(2));
			found.add(dm);
		}
	
		return found;
	}

We are still coupled to the structure of the query – if we change the order of the columns this code has to change – and now the code is harder to understand. I’d expect that if the query were changed, this code would go unnoticed, and cause some bugs.

It’s a fair point that the use of SQL aliases could circumvent any impact changes in the query affecting the code in the Data Mapper… but doesn’t that harm traceability? I don’t think traceability should be sacrificed in favor of avoiding the need for maintenance on a single object.
Additionally, using SQL Aliases is one of the strengths of using stored procedures instead of TDGs.

Some Alternatives

The first instinct some might have is to make the TDG consume the result set and return a normal collection such as an array. This is, however, extremely inefficient; looping over the result set to first create this generic collection and then again to create domain model objects.

Another possibility is to use Fowler’s Record Set pattern: have a TDG return a Record Set which serves as an Adapter and provides indirection between the Data Mapper and the database library components. This is even consistent with the pattern description in Fowler’s book:

The UML diagram for a TDG from Patterns of EAA by Martin Fowler

And on that note, I suspect that Microsoft’s ADO.NET – or even LinQ – could probably address the points I’m raising. Fowler even explicitly states that in .NET is an environment which makes wide use of Record Set, and that it’s "a very efficient approach."

The last possibility I can think of is to consider using stored procedures in the place of TDGs:

  • Stored procedures offer better performance.
  • Stored procedures hide the underlying table structure from the client and can make use of SQL aliases as needed.
  • While a TDG can avoid application developers who don’t know SQL messing around with it,1 stored procedures avoids making database developer/administrator work with application code.

1
Martin Fowler’s web site states that the purpose of a TDG is to keep SQL code out of your domain logic layer, so that if you have developers who are unfamiliar with SQL they don’t have to worry about it, and so when the database structure changes, all the SQL is in one place for updating.

It should be noted that Fowler does discuss using Stored Procedures in the place of a TDG:

A common way of using stored procedures is to control access to a database, along the lines of Table Data Gateway. I don’t have any strong feelings about whether or not to do this, and from what I’ve seen there’s no strong reason either way. In any case I prefer to isolate the database access with the same patterns, whether database access is through stored procedures or more regular SQL.

A Confession

Personally, I do use TDGs with Data Mappers, but I use them as highly reusable software components; they take advantage of SQL-standard describe statements to automatically generate most of the SQL I need when developing. All that’s needed is that the name of a table is passed to the constructor of a TDG when it is instantiated, and the SQL statements effectively write themselves.

This works best in PHP with associative arrays; you can have a select() method which returns a result set object, then methods like insert and update accept associative arrays, mapping column names onto values.

And while I’m mentioning the method names of TDGs, I should note that I also like to use different function names from those Fowler suggests; to emphasize the distinction between the TDG and the Data Mapper, my Data Mappers tend to have methods create(…), find(…), update(…) and destroy(…), while my TDGs have methods insert(…), select(…), update(…) and delete(…).

In Java – since there are no associative arrays – it can result in some unneeded overhead and a loss of type safety, using HashMap or HashTable in the place of associative arrays – though I’ve never given it a full attempt. If I did I’d imagine it would make (excessive?) use of reflection, having the update and insert accept an argument of type java.lang.Object, reflectively mapping all its primitive type instance variables to table columns. I would leave the responsibility of persisting any non-primitive instance variables to the Data Mapper.