Recently, a colleague told me that it isn’t advisable to put conditions in the JOIN clauses; instead, he suggested putting them in the WHERE clause. He told me that the SQL engine is optimized for this.
Here is a simple example to illustrate my question. In this case, I don’t think it would make any difference.
Which strategy is best, and why?
Assume we have a parameter named @user_id.
First strategy
SELECT role.name
FROM user_role
INNER JOIN user ON user_role.user_id = user.id AND
user_role.user_id = @user_id
INNER JOIN role ON user_role.role_id = role.id
Second strategy
SELECT role.name
FROM user_role
INNER JOIN user ON user_role.user_id = user.id
INNER JOIN role ON user_role.role_id = role.id
WHERE user.id = @user_id
The two queries you gave are not equivalent. In the second, user_role.user_id is explicitly linked to user.id, whereas in the first, there’s no link between the user table and the user_role table. (What database are you using for that, BTW? An inner join with no linkage to the existing query contents doesn’t look like valid SQL to me.)
Your friend is right, BTW. JOIN criteria are supposed to be for specifying the linkage between the tables. It affects the way one entire table relates to another entire table. Filtering (selecting which rows within a table are valid) should go in the WHERE clause.
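The distinction is more than stylistic once outer joins are involved. Here is a minimal sketch using Python’s built-in sqlite3 module, with a hypothetical two-table schema and made-up rows (alice has roles 10 and 20, bob has none): the same predicate, ur.role_id = 10, gives different results depending on whether it sits in the ON clause of a LEFT JOIN or in the WHERE clause.

```python
import sqlite3

# Hypothetical data: alice (id 1) has roles 10 and 20; bob (id 2) has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_role (user_id INTEGER, role_id INTEGER);
    INSERT INTO "user" VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO user_role VALUES (1, 10), (1, 20);
""")

# Predicate in ON: it only restricts which user_role rows may match,
# so every user is kept and bob gets NULL for role_id.
on_rows = conn.execute("""
    SELECT u.name, ur.role_id
    FROM "user" u
    LEFT JOIN user_role ur ON ur.user_id = u.id AND ur.role_id = 10
    ORDER BY u.id
""").fetchall()

# Same predicate in WHERE: it filters the joined result, and
# NULL = 10 is not true, so bob's row is discarded.
where_rows = conn.execute("""
    SELECT u.name, ur.role_id
    FROM "user" u
    LEFT JOIN user_role ur ON ur.user_id = u.id
    WHERE ur.role_id = 10
    ORDER BY u.id
""").fetchall()

print(on_rows)     # [('alice', 10), ('bob', None)]
print(where_rows)  # [('alice', 10)]
```

For INNER joins the two placements return the same rows, so the convention there is about readability; for outer joins, putting a filter in ON instead of WHERE changes the meaning of the query.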
EDIT: In response to the comment:
I understand what you say, but I still think that the first strategy is more efficient because not all the lines are loaded. You’re probably right, but I would like to understand why.
First, if by “lines” you mean “rows” or “records in the table,” it’s highly unlikely that all the lines will be loaded as part of the JOIN anyway. That would be highly inefficient, and only a really horrible database engine would do that.
The SQL engine’s query optimizer uses the rules of relational algebra to transform the query you write into an efficient plan for retrieving the data you’re querying for. Just like the algebra you learned in school, this can involve moving terms around from one place to another under specific rules that ensure the final result stays equivalent.
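Moving a filter between ON and WHERE is one such equivalence-preserving move for inner joins. As a minimal sketch (Python’s sqlite3 module; the schema and rows are hypothetical, and SQLite spells the parameter :user_id instead of @user_id), the same filter placed in either clause returns the same roles for every user ID tried, including one that does not exist:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema following the question; "user" is quoted because
# USER is a reserved word in several DBMSs (SQLite itself accepts it).
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_role (user_id INTEGER, role_id INTEGER);
    INSERT INTO "user" VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO role VALUES (10, 'admin'), (20, 'editor');
    INSERT INTO user_role VALUES (1, 10), (1, 20), (2, 20);
""")

# The filter user_role.user_id = :user_id lives in the ON clause here...
FILTER_IN_ON = """
    SELECT role.name
    FROM user_role
    INNER JOIN "user" ON user_role.user_id = "user".id
                     AND user_role.user_id = :user_id
    INNER JOIN role ON user_role.role_id = role.id
    ORDER BY role.name
"""

# ...and has been moved to the WHERE clause here.
FILTER_IN_WHERE = """
    SELECT role.name
    FROM user_role
    INNER JOIN "user" ON user_role.user_id = "user".id
    INNER JOIN role ON user_role.role_id = role.id
    WHERE user_role.user_id = :user_id
    ORDER BY role.name
"""

def role_names(query: str, user_id: int) -> list[str]:
    # Bind the named parameter and unpack the single-column rows.
    return [name for (name,) in conn.execute(query, {"user_id": user_id})]

for uid in (1, 2, 3):  # 3 is a user ID that does not exist
    assert role_names(FILTER_IN_ON, uid) == role_names(FILTER_IN_WHERE, uid)

print(role_names(FILTER_IN_WHERE, 1))  # ['admin', 'editor']
```

Equal results on one small data set do not prove equivalence in general; the point is only to make the algebraic “moving terms around” concrete.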
As DFord mentioned in his comment, the DBMS will probably end up producing the same query plan for both styles. If you want to test it, look at the execution plans of both queries (the mechanism for this varies from one DBMS to another; you’ll have to look up how it’s done on your system) and compare them.
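On SQLite, for example, the mechanism is the EXPLAIN QUERY PLAN prefix (PostgreSQL and MySQL use EXPLAIN). A minimal sketch with Python’s sqlite3 module, assuming a hypothetical schema with an index on user_role.user_id; it prints the plan for the same filter placed in ON and in WHERE, and deliberately does not assume the plans match, since that depends on the engine and its version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and index; no rows are needed to ask for a plan.
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_role (user_id INTEGER, role_id INTEGER);
    CREATE INDEX user_role_user_id ON user_role (user_id);
""")

QUERIES = {
    "filter in ON": """
        SELECT role.name FROM user_role
        INNER JOIN "user" ON user_role.user_id = "user".id
                         AND user_role.user_id = :user_id
        INNER JOIN role ON user_role.role_id = role.id""",
    "filter in WHERE": """
        SELECT role.name FROM user_role
        INNER JOIN "user" ON user_role.user_id = "user".id
        INNER JOIN role ON user_role.role_id = role.id
        WHERE user_role.user_id = :user_id""",
}

def query_plan(sql: str) -> list[str]:
    # EXPLAIN QUERY PLAN returns one row per plan step; the human-readable
    # description is the last column (its wording varies by SQLite version).
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, {"user_id": 1})
    return [row[-1] for row in rows]

plans = {label: query_plan(sql) for label, sql in QUERIES.items()}
for label, steps in plans.items():
    print(label)
    for step in steps:
        print("   ", step)
print("identical plans:", plans["filter in ON"] == plans["filter in WHERE"])
```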
That being the case, the real reason to write your joins in the second style is not efficiency of execution, but efficiency of maintenance. If someone needs to read your query at some later point, and they’re familiar with standard SQL style, it will be a lot easier for them to understand if your query is written in standard SQL style too, which means using the JOIN criteria to specify linkage and the WHERE clause to specify filtering.
They are both wrong. This is the correct way:
SELECT u.UserName, r.RoleName
FROM user u
LEFT JOIN user_role ur ON ur.user_id = u.id
LEFT JOIN role r ON r.id = ur.role_id
WHERE u.id = @user_id
Anything other than this is SQL BLASPHEMY!
(I looked at your SQL for 20 minutes and I still don’t know what you expected it to do or return.)
Edit: Adding in some of the comments below as there is some good stuff there:
Q) Why are you advocating left joins here as the Only True Way?
A) Because what good is a role name by itself? You clearly want the combination of user and role name. The left joins ensure that no users are missed (in case a user does not have a role, or if there is a relational integrity issue).
Think about it: with inner joins, how do you know whether you have a bad user ID or the user simply doesn’t have a role?
Q) You know there are no “relational integrity issues” because you have a FK constraint in place and the database takes care of that for you. (If you don’t, then that’s SQL heresy!) And since what’s being queried for here is the set of all role names for a specific user, if the user doesn’t have any roles, then an empty set is a perfectly valid response. (And a more correct one than a set containing a single element whose value is NULL, which is what your scenario would yield.)
A) You often need to query foreign systems that break every rule in the book. That’s a fact of life. Code defensively, and keep good habits. I reiterate: if the user doesn’t exist, the inner-join query returns a result that the caller will incorrectly interpret as “the user doesn’t have a role.”
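Both sides of this exchange can be checked directly. A minimal sketch with Python’s sqlite3 module and hypothetical data (user 1 has a role, user 2 has none, user 3 does not exist), comparing the question’s second strategy with the LEFT JOIN version above; the column names u.name and r.name are an assumption standing in for UserName and RoleName.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: user 1 has a role, user 2 has none, user 3 does not exist.
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_role (user_id INTEGER, role_id INTEGER);
    INSERT INTO "user" VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO role VALUES (10, 'admin');
    INSERT INTO user_role VALUES (1, 10);
""")

# The question's second strategy: inner joins, filter in WHERE.
INNER = """
    SELECT role.name
    FROM user_role
    INNER JOIN "user" ON user_role.user_id = "user".id
    INNER JOIN role ON user_role.role_id = role.id
    WHERE "user".id = :user_id
"""

# The LEFT JOIN version, driven from the user table.
LEFT = """
    SELECT u.name, r.name
    FROM "user" u
    LEFT JOIN user_role ur ON ur.user_id = u.id
    LEFT JOIN role r ON r.id = ur.role_id
    WHERE u.id = :user_id
"""

def run(sql: str, user_id: int) -> list[tuple]:
    return conn.execute(sql, {"user_id": user_id}).fetchall()

# INNER gives [] for both 2 and 3; LEFT gives [('bob', None)] for 2 and [] for 3.
for uid in (1, 2, 3):
    print(uid, run(INNER, uid), run(LEFT, uid))
```

With inner joins, an unknown user ID and a user without roles both produce an empty result. The LEFT JOIN version tells them apart because it starts FROM the user table, at the cost of returning a row whose role is NULL when the user has no roles.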