Friday, February 15, 2008

The Multiple-Record Set Problem

I've had several gotcha! moments in my career, and this week another one hit me.

Multiple-record sets. I'm not talking about MARS; rather queries which are expected to return a single record, but instead occasionally return two or more.

Now say your application has a subsystem for user access control, third party integration, or content management. Your user may be an individual, a group of individuals, or a client's SOAP architecture consuming a web service. Different users have access to different things in your application. These things might be technical analysis articles on NYSE traded stocks; stock transactions; items in a virtual shopping cart; billing invoices or other account activities for your services; or even comments on a blog.

When your controlling subsystem validates the user's right to a particular thing, it likely issues a SQL query against the application's database. It may look something like this:

select u.user_id, ur.user_rights
from users u
join user_rights ur on u.user_id = ur.user_id
where u.user_id = 3245674 and ur.thing_to_access = 'blog';



What your application expects is a single record, getting it in some, even most cases. It may look something like this:

user_id user_rights
------- ------------------------------------------------
3245674 articles=read;comments=read,write;post=none


However, what it actually gets, in a few cases, is this:

user_id user_rights
------- ------------------------------------------------
3245675 articles=read,write;comments=read;post=false
3245675 articles=read,edit;comments=approve;post=true


And worse, sometimes it receives this:

user_id user_rights
------- ------------------------------------------------
3245675 articles=read,edit;comments=approve;post=true
3245675 articles=read,write;comments=read;post=false


Most trying, as I've said, this doesn't even happen in most cases. This can be the cause for a bug with the classic traits - infrequent, inconsistent, and hard to track down. The bug can be on the data side requiring a DBA to resolve, on the application side requiring a developer to solve, or in the business layer requiring an entire committee, budget, and project plan to solve.

Now you may be scratching your head and wondering how this even leads to a bug. Let me illustrate!

Assume your application expects exactly one record defining user rights, looking up users by id. Now if the user's id is 3245674, there's only only one record to handle making the user rights certain. However, when looking up the user with id 3245675, two records are returned and the rights granted to the user are variable.

This can be bad. Sometimes very bad.

How so? Let's look at the cases:

1 - The code interpreting the recordset may assume a single record, interpreting the first in the set and ignoring the rest. In this case, the user's granted rights depend on the order of the records returned. Without an ORDER BY clause, the first record in subsequent executions of the same query can vary for a number of reasons - page splits, index reorganizations, and statistics recalculations to name a few. The user in this case would sometimes have rights to write articles but not post and at other times rights to edit and post but not write.

2 - It may loop through the records, discarding all but the last record. It should be obvious that this is the same problem as #1.

3 - It may even loop through the set and attempt to synthesize the results into a single set of permissions. In the above case, the user would be able to write and edit; read and approve comments; both of which may break business logic or internal controls. Then comes the ability to post - how do true and false combine? Either way, the end result is the user likely has the incorrect rights.

Now why does this happen, and what do you do about it? If you already have some strategies, add a comment for my other reader.

If not, check back with WillSQL4food soon!