I read a rant article the other day (I will update the link here as soon as I find it again) where the author was absolutely shooting down the use of the IN statement in queries, and proposing you rather use a JOIN and then compare fields in the WHERE clause.
Something bothered me about this statement, but I didn’t have time to check up on it then. Today I had a query from our finance department for a set of results on a certain GL account across all our cost centres. Quite a simple query with 2 tables involved, so I thought I would test the performance on both approaches.
Keeping in mind that we have a fairly large set of data, with just over 160K records in the Chart of Accounts, and just short of 4 million records in the transaction table. Both of these optimally indexed.
Before I state the results I got, let’s just look at the different operations here.
JOIN
- Take every record in the transaction table, and link the appropriate record from the Chart of Accounts to it.
- Compare 2 (varchar) fields in every record to the specified criteria
IN
- Get a subset of records (less than 200) from the Chart of Accounts
- Check if the (int) account id of every transaction is in the subset
Update: I’ve had another instance of this, doing a query across servers, counting records, and I cancelled the query on the join after 90 minutes with no result, the IN took 13 minutes.