Unit Testing Data from an RDBMS

1 PM November 14, 2003

Unless an SQL query specifies an ORDER BY clause, a database returns result rows in arbitrary order. If you don’t respect this in your unit testing, it causes a problem.

The problem arises because, for two identical queries over the same data, most databases will return result rows in the same order. Developers see the data come back in the same order twice and write unit tests that assume this order. Unfortunately, tests written like this will probably continue to work for quite some time.

Fast forward to the end of the project, when more is known about how the application uses the database, some poor bunny is giving some thought to the physical layout of the database – adding indexes, changing table storage parameters and the like. None of these physical layout changes affect the application, except perhaps to make it run faster.

But suddenly, all kinds of odd unit tests are breaking, even though the application itself still works fine.

The unit tests are breaking because the changes to the physical schema are affecting the order of the query result rows. Though it may take a few hour’s tedious, unnecessary, and high-pressure work to figure out what went wrong, and to fix the tests.

The best solution is to ensure that results returned from the database are always sorted; always use an ORDER BY clause in database queries. For unit testing, an arbitrary but stable order is preferable to a random ordering. If nothing else makes sense, try ordering by the primary key.

Sometimes it is not acceptable to order query results. In these cases the unit tests should either sort the resulting data in-memory, or scan through the result set to find the data it is looking for.

If you do these thing from the very start of your project, you will make the world a better place.

By alang | # | Comments (4)
(Posted to javablogs, Software Development, Java and Rants)

Comments

At 14:28, 14 Nov 2003 Ian Bicking wrote:

Or you can start using set operations to test your data results (still usually easier to add ordering, but sometimes the undefined ordering is part of the API).

(#)
At 17:06, 14 Nov 2003 Alan Green wrote:

That's easy for you to say, Ian, because you do unit testing in Python :)

Sadly, set operations are really quite incovenient in Java.

(#)
At 18:54, 14 Nov 2003 Rob wrote:

It isn't that hard in Java...

Set expectedResults = setupResults();
ResultSet[] r = runQuery();
Set actualResults = new HashSet(Arrays.asList(r));
assertEquals(actualResults, expectedResults);

(#)
At 19:22, 14 Nov 2003 Simon Brunning wrote:

According to the SQL standard, if you do a

SELECT * FROM MY_TABLE

the order of the *columns* is also undefined!

In practice, all the RDBMSs that I've worked with will always return the columns in the same order as they are defined in the schema, but it isn't guaranteed. Gadfy, for example, might return the columns in any order.

(#)

Add Comment




(Not displayed)






(Leave blank line between paragraphs. URLs converted to links. HTML stripped. Indented source code will be formatted with <pre> tags.)




© 2003-2006 Alan Green