6.10 Avoid UNIONs 127
HAVING MAX(payment_status) = 'F'
AND MIN(payment_status) = 'F';
The self-reference and correlation are complicated for both humans
and machines. Most optimizers are not smart enough to flatten the first
query like this.
Exceptions:
If you have a problem that is easier to understand with correlated
subqueries and your optimizer is good, then don’t be so afraid of them.
6.10 Avoid UNIONs
Rationale:
UNIONs are usually not well optimized. Because they require that
redundant duplicates be discarded, they force most SQL engines to do a
sort before presenting the result set to the user. If possible, use UNION
ALL instead. You should never have to build a chain of UNIONs from
the same base table. That code can be written with OR-ed predicates or
CASE expressions.
As an example of a horrible misuse of SQL, Chris White posted a
procedure that built dynamic SQL that would then build a report. Aside
from the obvious violations of basic software engineering, the output was
so huge that it exceeded the text size limits of SQL Server. He was
attempting to construct an entire report in the database by using
UNIONs to get the 12 lines of the report in the right order, by assigning
them a letter of the alphabet. The whole thing would take several pages
to show, but it is an extraction of the printout lines that were constructed
from just the General Ledger. I have not attempted to clean up much of
the code, so there are many violations of good coding rules in this
snippet.
. . .
UNION
SELECT DISTINCT 'J' AS section,
'NUMBER CHECKS' AS description, '' AS branch,
COUNT(DISTINCT GL.source) AS total1, 0 AS total2
FROM GeneralLedger AS GL
WHERE GL.period >= :start_period
AND GL.period <= :end_period
AND GL.year_for_period = :period_yr
128 CHAPTER 6: CODING CHOICES
AND GL.account_number IN ('3020')
AND GL.journal_id IN ('CD')
UNION
SELECT DISTINCT 'C' AS section,
'CASH RECEIPTS' AS description, '' AS branch,
SUM(GL.amount) * -1 AS total1, 0 AS total2
FROM GeneralLedger AS GL
WHERE GL.period >= :start_period
AND GL.period <= :end_period
AND GL.year_for_period = :period_yr
AND GL.account_number = '1050'
AND GL.journal_id IN ('CR')
UNION
SELECT DISTINCT 'D' AS section,
'NUMBER INVOICES' AS description, '' AS branch,
COUNT(DISTINCT GL.source) AS total1, 0 AS total2
FROM GeneralLedger AS GL
WHERE GL.period >= :start_period
AND GL.period <= :end_period
AND GL.year_for_period = :period_yr
AND GL.account_number IN ('6010', '6090')
AND GL.journal_id IN ('SJ')
UNION
SELECT DISTINCT 'E' AS section,
'VOUCHER TOTAL' AS description, '' AS branch,
SUM(GL.amount) * -1 AS total1, 0 AS total2
FROM GeneralLedger AS GL
WHERE GL.period >= :start_period
AND GL.period <= :end_period
AND GL.year_for_period = :period_yr
AND GL.account_number = '3020'
AND GL.journal_id IN ('PJ', 'TJ')
UNION
SELECT DISTINCT 'F' AS section,
'CHECKS PRINTED' AS description, '' AS branch,
SUM(GL.amount) AS total1, 0 AS total2
FROM GeneralLedger AS GL
WHERE GL.period >= :start_period
AND GL.period <= :end_period
AND GL.year_for_period = :period_yr
AND GL.account_number IN ('3020')
6.10 Avoid UNIONs 129
AND GL.journal_id IN ('CD')
UNION
SELECT DISTINCT 'K' AS section,
'NUMBER VOUCHERS' AS description, '' AS branch,
COUNT(DISTINCT GL.source) AS total1, 0 AS total2
FROM GeneralLedger AS GL
WHERE GL.period >= :start_period
AND GL.period <= :end_period
AND GL.year_for_period = :period_yr
AND GL.account_number IN ('3020')
AND GL.journal_id IN ('PJ', 'TJ');
The last part of the code could be reduced to a single, cohesive
procedure. The output of the procedure would then be formatted in the
front. Notice that section, description, and branch are all placeholders to
give a slot for columns in the other UNIONs not shown here.
CREATE PROCEDURE GeneralLedgeSummary (start_period DATE,
end_period DATE)
SELECT
COUNT(DISTINCT CASE WHEN acct_nbr = '3020' AND journal_code =
'CD'
THEN source ELSE NULL END),
-SUM(CASE WHEN acct_nbr = '1050' AND journal_code ='CR'
THEN amount ELSE 0.00 END),
COUNT(DISTINCT CASE WHEN acct_nbr IN ('6010', '6090') AND
journal_code = 'SJ'
THEN source ELSE NULL END),
-SUM(CASE WHEN acct_nbr = '1050' AND journal_code = 'CR'
THEN amount ELSE 0.00 END),
SUM(CASE WHEN acct_nbr = '3020' AND journal_code = 'CD'
THEN amount ELSE 0.00 END),
COUNT(DISTINCT CASE WHEN acct_nbr = '3020' AND journal_code IN
('PJ', 'TJ')
THEN source ELSE NULL END)
INTO j_tally, c_total, d_tally, e_total, f_total, k_tally
FROM GeneralLedger AS GL
WHERE period BETWEEN start_period AND end_period;
130 CHAPTER 6: CODING CHOICES
Exceptions:
Sometimes the UNION [ALL] is what you actually want. The other set
operations in SQL-92, EXCEPT [ALL], and INTERSECT [ALL] are not
widely available yet.
6.11 Testing SQL
When you are first writing a schema, you will probably generate some
test data. If you look in the literature, there is a thing called an
Armstrong set, which is the minimal number of rows that will test all of
the constraints in a schema. Although it is difficult to automatically
create an Armstrong set, you can do a good job with a little effort.
6.11.1 Test All Possible Combinations of NULLs
Rationale:
NULLs behave strangely, and if there are problems, there is a good
chance that a NULL will be involved. Newbies using graphic tools often
leave more NULL-able columns in a single table than a professional
would in an entire schema for a Fortune 500 company payroll.
Exceptions:
If the number of combinations is excessive, then look at a redesign
rather than a stress test. It means you probably have too many NULL-
able columns in the schema.
6.11.2 Inspect and Test All CHECK() Constraints
Rationale:
You can extract the CHECK() constraint predicates from the DDL and
look at them. The first thing is to see if the same data element has the
same rules in all of the tables. Some attributes will always have the same
CHECK() constraints if the model is correct. For example, the data type,
regular expression, and check digit for a UPC code will be the same
everywhere in the schema.
Some attributes may have different constraints in different tables. For
example, it would be reasonable to have “quantity INTEGER DEFAULT
0 NOT NULL CHECK (quantity >= 0)” almost everywhere that the
quantity attribute appears. However, you might find that there is also a
“CHECK (quantity > 0)” on a table. Is this an error or a situation where a
zero quantity is disallowed? You need to look and see.
6.11 Testing SQL 131
Exceptions:
None
6.11.3 Beware of Character Columns
Rationale:
Character columns seldom have enough constraints on them. The result
is that they have extra blanks in them, allow mixed-case letters, and will
pretty much hold any kind of garbage that a user wishes to put in them.
My favorite piece of test data for oversized, unconstrained
NVARCHAR(n) columns is a collection of Buddhist sutras in Chinese
unicode. At least the users will learn a bit of classic Buddhist thought.
Exceptions:
None
6.11.4 Test for Size
Rationale:
One of the problems with small test data sets is that they will run just
fine in the development shop, but when the size of the tables grows
larger, you can get gradually degraded performance or catastrophe
points. A catastrophe point is when there is a sudden change in the
performance—the straw that breaks the camel’s back. There is usually a
physical component to a catastrophe point, such as excessive paging to a
hard drive. Frankly, there is not a lot you can do about it except wait and
see if it was a fluke or if it happens again.
Gradually degraded performance is the nicer of the two situations.
You can monitor the system, see the loss, and take action before
anything bad happens. The bad news is that the term gradual can be very
short. The query that ran so well on a few thousand rows of test data is a
pig when it goes live on several million rows of production data. Try to
stress test on a data set that is larger than the current production
database. That will let you know you have some margin of error.
Exceptions:
None