In this format, the IN operator merely lists a number of values in parentheses.
There is also a second format for the
IN, one in which an entire SELECT state-
ment is inserted inside the parentheses. For example, a list of states might be
specified as:
WHERE State IN
(SELECT
States
FROM StateTable
WHERE Region ¼ 'Midwest')
Rather than list individual states, the second format allows us to generate a list of
states through more complex logic.
Let’s illustrate with an example that uses our Customers and Orders tables. Let’s
say we want to retrieve a list of customers who have ever paid cash for any order
they’ve placed. A
SELECT that accomplishes this is:
SELECT CustomerName AS 'Custom er Name'
FROM Customers
WHERE CustomerID IN
(SELECT CustomerID
FROM Orders
WHERE OrderType ¼ 'Cash')
The resulting data is:
Customer Name
William Smith
Natalie Lopez
Brenda Harper is not included in the list because she has never placed an order
using cash. Notice that a subquery
SELECT is placed entirely within the p arentheses
for the
IN keyword. Also note that the Custome rID column is used to connect the
two queries. Even though we are displaying CustomerName, we use CustomerID
to define the relationship between the Customers and Orders tables.
Once again, we can ask whether this subquery can also be expressed as a normal
query, and the answer is yes. Here is an equivalent query that returns the
same data:
Chapter 14
■
Subqueries146
SELECT CustomerName AS 'Customer Name'
FROM Customers
INNER JOIN Orders
ON Customers.CustomerID ¼ Orders.CustomerID
WHERE OrderType ¼ 'Cash'
GROUP BY Customers.CustomerID, Customers.CustomerName
Without using a subquery, we can directly join the Customers and Orders table.
However, a
GROUP BY clause is now needed to ensure that we only bring back
one row per customer.
Correlated Subqueries
The subqueries we’ve seen so far have been uncorrelated subqueries. Generally
speaking, all subqueries can be classified as either uncorrelated or correlated.
These terms describe whether the subquery is related to the query in which it is
contained. Uncorrelated subqueries are unrelated. When a subquery is
unrelated, that means that it is completely independent of the outer query.
Uncorrelated queries are evaluated and executed only once as part of the entire
SELECT statement.
In contrast, correlated subqueries are specifically related to the outer query.
Because of this explicit relationship, correlated subqueries need to be evaluated
for each row returned and can produce different results each time the subquery is
executed.
The best way to explain is with an example. Returning to the Customers and
Orders tables, let’s say we want to produce a list of customers who have a total
order amount that is less than 20 dollars. Here’s a statem ent that accomplishes
that request:
SELECT
CustomerName as 'Customer Name'
FROM Customers
WHERE
(SELECT
SUM (OrderAmount)
FROM Orders
WHERE Customers.CustomerID ¼ Orders.CustomerID)
<20
Correlated Subqueries 147
The result is:
Customer Name
Brenda Harper
What makes this subquery correlated, as opposed to uncorrelated? The answer
can be seen by looking at the subquery itself:
SELECT
SUM (OrderAmount)
FROM Orders
WHERE Customers.CustomerID ¼ Orders.CustomerID
This subquery is correlated because it cannot be executed on its own. If run by
itself, this subquery would error because the Customers.CustomerID column
doesn’t exist within the context of the subquery.
To understand what’s going on, it’s helpful to look at the entire
SELECT state-
ment in a general way:
SELECT
CustomerName as 'Customer Name'
FROM Customers
WHERE
SubqueryResult <20
As a correlated subquery, the subquery needs to be evaluated for each customer.
Also note that this type of subquery only works if it returns a single value.
Again, one might ask if this subquery could be converted into a normal
SELECT
statement. In this case, it can. Here’s an equivalent statement that produces the
same results:
SELECT
CustomerName as 'Customer Name'
FROM Customers
LEFT JOIN Orders
ON Customers.CustomerID ¼ Orders.CustomerID
GROUP BY Customers.CustomerID, CustomerName
HAVING SUM (OrderAmount) < 20
Notice that, without a subquery, the equivalent statement now requires
GROUP BY and HAVING clauses. The GROUP BY clause creates groups of
Chapter 14
■
Subqueries148
customers, and the HAVING clause enforces the requirement that each group
must have ordered less than 20 dollars.
The EXISTS Operator
One additional technique associated with correlated subqueries is the use of a
special operator called
EXISTS. This operator allows you to determine if data in
a correlated subquery exists. Let’s say that you want to discover which customers
have placed orders. One way of accomplishing that is with this statement:
SELECT
CustomerName AS 'Customer Name'
FROM Customers
WHERE EXISTS
(SELECT *
FROM Orders
WHERE Customers.CustomerID ¼ Orders.CustomerID)
This statement returns:
Customer Name
William Smith
Natalie Lopez
Brenda Harper
Adam Petrie doesn’t appear in the results since he has not placed any orders. The
EXISTS keyword in the above statement is evaluated as true if the SELECT in
the correlated subquery returns any data. Notice that the subquery selects all
columns (
SELECT *). Since it doesn’t really matter what particular data is
selected in the subquery, we use the asterisk to return all data. We’re only inter-
ested in determining whether any data exists in the subquery.
As before, the logic in this statement can be expressed in other ways. Here’s a
statement that obtains the same results using a subquery with the
IN operator:
SELECT
CustomerName AS 'Customer Name'
FROM Customers
WHERE CustomerID IN
(SELECT CustomerID
FROM Orders)
The EXISTS Operator 149
Here’s another statement that retrieves the same data without a subquery:
SELECT
CustomerName AS 'Customer Name'
FROM Customers
INNER JOIN Orders
ON Customers.CustomerID ¼ Orders.CustomerID
GROUP BY CustomerName
Using a Subquery as a Calculated Column
The final general use of subqueries is as a calculated column. Let’s say we would
like to see a list of customers, along with a count of the number of orders they
have placed.
This query can actually be accomplished without subqueries with this statement:
SELECT
CustomerName AS 'Customer Name',
COUNT (OrderID) AS 'Number of Orders'
FROM Customers
LEFT JOIN Orders
ON Customers.CustomerID ¼ Orders.CustomerID
GROUP BY Customers.CustomerID, CustomerName
ORDER BY Customers.CustomerID
The output is:
Customer Name Number of Orders
William Smith 1
Natalie Lopez 3
Brenda Harper 2
Adam Petrie 0
However, another way of obtaining the same result is to use a subquery as a
calculated column. This looks like the following:
SELECT
CustomerName AS 'Customer Name',
(SELECT
COUNT (OrderID)
FROM Orders
Chapter 14
■
Subqueries150