By default, null values are put on top of the query resultset when field is ordered by in ascendant form.
This comes to be a problem in many scenarios, specially when we are ordering by a position field that can contain an integer value for its position on the dataset or null if position is not defined. Rows that have undefined position have lower weight than the specified ones thus coming first.
The following SQL query is from a very common scenario that represents a SELECT to fetch all city registries in “importance” (most common, not in fact important cities – don’t get mad if you live in an odd city) order.
SELECT id, city FROM cities ORDER BY POSITION;
This brings us all null-valued position rows first and not null positioned in ascending order, at the bottom.
This happens because our ordering pool will look like the following:
position (integer or null), city field value (string)
So null values are considered smaller than 1 (lowest positive integer) and then comes first in our resultset.
Workaround
Workaround is never a good choice, but I’m listing here so you DON’T, EVER do that.
- Using a huge integer value instead of null ones
- Setting the default field value to a big integer so it will come always on the bottom of the list. This is particularly bad where it compromises your database integrity since your data will have a fake value instead of a
nullvalue.nullmeans “not set”, “undefined”. Setting it otherwise will not mean the same. Database integrity is important so your data can be handled and transported by other apps. You may not bother with this now, but you may in the future.
Doing it maintaining data structure and integrity
We will take advantage of CASE native from most DBMS.
SELECT id, city FROM cities ORDER BY CASE WHEN POSITION IS NULL THEN 1 ELSE 0 END, POSITION, city
NOTE: Field city is added on ordering list in behalf of sorting in ascending order the rows with undefined position
CASE is a good SQL way to implement flow-control under database-level. In this case, if position field is null CASE returns 1, 0 otherwise.
This will make the order pool to be like:
(0/1), position (integer or null), city field value (string)
This way, other ordering criteria position and city will be treated as secondary and tertiary respectively, depending on the (0,1) value. 1 is bigger than 0 so if all null values are treated as 1, they will come after the ones that has a position integer representation that will come with 0 as first criteria.
Applying this concept on big tables
As CASE must do a calculation for every single row we may (and probably will) encounter speed issues when handling this ordering on big tables.
A solution is to mantain a boolean field like has_position and use it on the first criteria so our order pool gets something like:
has_position, position (integer or null), city field value (string)
The SQL code as it follows:
SELECT id, city FROM cities ORDER BY has_position, POSITION, city
NOTE: See which comes first (true/false) on boolean ordering on your DBMS and apply the ASC/DESC clause to has_position accordingly
The drawback is that you will have to run once in a while (or on position change) a SQL query to update the registries:
UPDATE cities SET has_position = 1 WHERE POSITION IS NOT NULL; UPDATE cities SET has_position = 0 WHERE POSITION IS NULL;
You can even use CASE for this, but will have the same speed issues.
UPDATE cities SET has_position = CASE WHEN POSITION IS NULL THEN 0 ELSE 1 END;
The difference is that you will have this issue just time to time instead of every query.
Benchmarking
We tested the three methods: only with position field (that doesn’t returns data in the way we want), with has_position, position (best form proposed for big tables) and with the CASE trick.
The results are as expected:

Exactly:
- position
- 25,77 secs
- has_position, position
- 12,83 secs
- CASE trick
- 23,20 secs
The tests were run on a HP Pavillion DV6780SE with a Core2 Duo 1.66ghz, 3GB ram with 2.11.3deb1ubuntu1.1 mysql running under Ubuntu Linux 8.04 on a table (without indexes) with 249.751 rows.
Download the test files
Test files include
- positioning.sql
- Database schema
- wordlist_pt_br.txt
- pt_BR Wordlist (use you preferred wordlist here)
- mysql_case_speed_test.php
- Structured programmed benchmark Script. Call with
?populateto populate database and generate random position numbers.
Download order-with-null-values.zip
Whats your nifty trick?
Can i suggest you a little performance test if this scenario using the PostgreSQL’s analyzer? They can provide the cost of the queries and give you some numbers to illustrate this approach.
Fernando F.
July 2, 2009 at 4:37 pm
Done as Fernando’s recommendation
Jan Seidl
July 2, 2009 at 11:17 pm