While preparing introductory SQL material, I ended up asking myself about the line of thought a developer follows when writing a query. I believe it could be very valuable from a beginner’s point of view.
Let me illustrate the subject with an example. For the sake of brevity, I won’t include the database schema, hoping for the query to be descriptive enough by itself.
Suppose we want to write a query that retrieves all new customers from the last quarter. Typical. The first thing I think about is the set of entities involved, beginning with the ones corresponding to the requested data. So, as I want to get customers, I first write the following query skeleton:
SELECT
FROM Customers AS NewCustomer
WHERE
The remaining entities follow immediately. In this fictional context, a new customer is any customer who has placed an order for the first time:
SELECT
FROM Customers AS NewCustomer
INNER JOIN Orders AS RecentOrder
LEFT JOIN Orders AS OlderOrder
WHERE
AND OlderOrder.Id IS NULL
With all the entities identified, I proceed to write the relations between them:
SELECT
FROM Customers AS NewCustomer
INNER JOIN Orders AS RecentOrder
ON RecentOrder.CustomerId = NewCustomer.Id
LEFT JOIN Orders AS OlderOrder
ON OlderOrder.CustomerId = RecentOrder.CustomerId
AND OlderOrder.PlacementDate < RecentOrder.PlacementDate
WHERE
AND OlderOrder.Id IS NULL
Once I’m finished with the entities, it’s time to define whatever restrictions must be applied to the resulting data set:
SELECT
FROM Customers AS NewCustomer
INNER JOIN Orders AS RecentOrder
ON RecentOrder.CustomerId = NewCustomer.Id
LEFT JOIN Orders AS OlderOrder
ON OlderOrder.CustomerId = RecentOrder.CustomerId
AND OlderOrder.PlacementDate < RecentOrder.PlacementDate
WHERE RecentOrder.PlacementDate BETWEEN @fromDate AND @toDate
AND OlderOrder.Id IS NULL
Finally, I ask myself which concrete data fields I want the query to return, thus completing the SELECT clause:
SELECT NewCustomer.FirstName, NewCustomer.LastName
FROM Customers AS NewCustomer
INNER JOIN Orders AS RecentOrder
ON RecentOrder.CustomerId = NewCustomer.Id
LEFT JOIN Orders AS OlderOrder
ON OlderOrder.CustomerId = RecentOrder.CustomerId
AND OlderOrder.PlacementDate < RecentOrder.PlacementDate
WHERE RecentOrder.PlacementDate BETWEEN @fromDate AND @toDate
AND OlderOrder.Id IS NULL
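To try the query out, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the sample rows and the `new_customers` helper are assumptions made up for illustration, not the real schema, and SQLite's `:fromDate` / `:toDate` named parameters stand in for `@fromDate` / `@toDate`. The WHERE clause in this sketch keeps only the date range and the IS NULL test: every OlderOrder column is NULL in exactly the rows that `OlderOrder.Id IS NULL` retains, so an extra comparison on `OlderOrder.PlacementDate` in the WHERE clause could never be true for them.

```python
import sqlite3

# Hypothetical minimal schema: only the columns the query touches.
# Dates are stored as ISO-8601 text, which compares correctly as strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Orders (Id INTEGER PRIMARY KEY, CustomerId INTEGER, PlacementDate TEXT);
INSERT INTO Customers VALUES
  (1, 'Ada', 'Lovelace'),
  (2, 'Alan', 'Turing'),
  (3, 'Grace', 'Hopper');
INSERT INTO Orders VALUES
  (10, 1, '2023-11-05'),  -- Ada: first order before the quarter
  (11, 1, '2024-02-01'),  -- Ada: another order inside the quarter
  (12, 2, '2024-01-15'),  -- Alan: first order inside the quarter
  (13, 2, '2024-03-01'),  -- Alan: second order inside the quarter
  (14, 3, '2024-05-20');  -- Grace: first order after the quarter
""")

NEW_CUSTOMERS = """
SELECT NewCustomer.FirstName, NewCustomer.LastName
FROM Customers AS NewCustomer
INNER JOIN Orders AS RecentOrder
  ON RecentOrder.CustomerId = NewCustomer.Id
LEFT JOIN Orders AS OlderOrder
  ON OlderOrder.CustomerId = RecentOrder.CustomerId
  AND OlderOrder.PlacementDate < RecentOrder.PlacementDate
WHERE RecentOrder.PlacementDate BETWEEN :fromDate AND :toDate
  AND OlderOrder.Id IS NULL
"""

def new_customers(connection, from_date, to_date):
    """Return (FirstName, LastName) of customers whose first order falls in the range."""
    params = {"fromDate": from_date, "toDate": to_date}
    return connection.execute(NEW_CUSTOMERS, params).fetchall()

result = new_customers(conn, "2024-01-01", "2024-03-31")
print(result)  # [('Alan', 'Turing')]
```

With this sample data only Alan qualifies: Ada ordered before the quarter started, and Grace's first order comes after it ended.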
Of course, this is a very simple query, without aggregates, nested queries or other advanced stuff. But this is my overall train of thought, whatever the query’s complexity:
- Entities
- Relations
- Restrictions
- Fields
My question is: is this a suitable scheme to teach explicitly to a beginner? And how could it be improved in order to smooth her learning curve?
tl;dr
I think you have it exactly backwards.
A knowledgeable developer constructs his queries in the order you mention because that is the order of dependencies: you don’t know what relations you need to follow until you know what entities you need, you don’t know what restrictions are necessary without first identifying your relations, and you can’t know what to restrict on until you know the field set at your disposal.
Therefore a knowledgeable developer often follows the order as set forth, because in the real world we need the highest-level component, the entity, and that part drives how we use all the others.
However, this order runs from the highest-level, most dependent pieces to the lowest-level, least dependent ones. For someone who’s unfamiliar with SQL, you don’t want to teach entities first when they don’t yet know about relations, restrictions, or fields. Transitively, you shouldn’t teach relations to someone who doesn’t know restrictions or fields, or restrictions to someone who doesn’t know fields.
Given the concept you’re detailing above and the breakdown as I see it, you should teach them in precisely the opposite order to the one you have suggested.
I think this becomes obvious if you look at the example query you have above: to query for conceptual entities you end up with a query far more complex than a new SQL developer should have to look at first. Further, until the last piece you haven’t got a complete SQL query, because the last thing you teach is the piece that every other piece depends on just to work at all. The way you suggest going about it would have you giving incomplete SQL examples, as you have above, until you were done teaching all of the concepts. It seems more reasonable to give smaller complete examples that grow more complex rather than incomplete ones that grow more complete.
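As a rough sketch of what "smaller complete examples that grow more complex" might look like, here is one possible progression, run through Python's built-in sqlite3 module so each step can be executed on its own. The two tables, their sample rows and the step names are assumptions invented for the illustration; the point is only that every query in the list is complete and runnable, and each one adds a single concept to the previous one.

```python
import sqlite3

# Hypothetical sample data, just enough to make every step return something.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Orders (Id INTEGER PRIMARY KEY, CustomerId INTEGER, PlacementDate TEXT);
INSERT INTO Customers VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
INSERT INTO Orders VALUES (10, 1, '2023-11-05'), (11, 2, '2024-01-15');
""")

# Each step is a complete query; the teaching order is fields first, entities last.
STEPS = [
    # 1. Fields: pick columns from a single table.
    ("fields",
     "SELECT FirstName, LastName FROM Customers ORDER BY Id"),
    # 2. Restrictions: the same query, narrowed by a WHERE clause.
    ("restrictions",
     "SELECT FirstName, LastName FROM Customers WHERE LastName = 'Turing'"),
    # 3. Relations: bring in a second table through a join condition.
    ("relations",
     """SELECT Customers.FirstName, Orders.PlacementDate
        FROM Customers
        INNER JOIN Orders ON Orders.CustomerId = Customers.Id
        WHERE Orders.PlacementDate >= '2024-01-01'
        ORDER BY Orders.Id"""),
    # 4. Entities: the same table used twice under aliases that name its role.
    ("entities",
     """SELECT NewCustomer.FirstName
        FROM Customers AS NewCustomer
        INNER JOIN Orders AS RecentOrder
          ON RecentOrder.CustomerId = NewCustomer.Id
        LEFT JOIN Orders AS OlderOrder
          ON OlderOrder.CustomerId = RecentOrder.CustomerId
          AND OlderOrder.PlacementDate < RecentOrder.PlacementDate
        WHERE OlderOrder.Id IS NULL
        ORDER BY NewCustomer.Id"""),
]

results = {name: conn.execute(sql).fetchall() for name, sql in STEPS}
for name, rows in results.items():
    print(name, rows)
```

A learner can run and modify step 1 before ever seeing a join, which is not possible with a skeleton that only becomes valid SQL at the very end.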