There is already a question on this topic Difference between SpatialJoin with “…where…” and “join…on…”. However, it is dedicated to PostGIS.
Since in the Virtual Layer query window, I cannot simply proceed with EXPLAIN ANALYZE
command before my SELECT
, therefore cannot see the difference between two queries and moreover it uses SQLite and Spatialite to operate.
So, what is the difference between those queries in terms of Virtual Layers and QGIS?
SELECT d.id, COUNT(p.id) PoInPol
FROM districts AS d, points AS p
WHERE st_within(p.geometry, d.geometry)
GROUP BY d.id
Versus
SELECT d.id, COUNT(p.id) PoInPol
FROM districts AS d
JOIN points AS p ON st_within(p.geometry, d.geometry)
GROUP BY d.id
Those queries simply give a number of points inside each feature of districts layer.
Answer
DB Query Planners are (usually) smart and will rewrite the query to make it the most efficient. BUT, there is still a difference between the two queries (especially if we ignore the planner intervention).
For the example in the question, it makes no difference. If you use LEFT
/RIGHT
/OUTER JOIN
s, it does. If you add extra conditions, it also makes a difference. And if you have several joins, it can have a great impact, both in terms of results and efficiency.
When the filtering condition is put inside the JOIN
, it is evaluated right away and only rows satisfying the condition are used in the next join.
When the filtering condition is put in the WHERE
clause, it is evaluated at the end, so all rows are considered.
SELECT a.id
FROM a,b,c
WHERE a.id = b.id AND b.id = c.id
AND a.val=1 AND b.val=2 AND c.val=3;
--> do a CROSS JOIN
between a,b,c. From all rows (so a size * b size * c size), keep the ones satisfying the condition.
SELECT a.id
FROM a
JOIN B ON a.id = b.id AND a.val = 1 AND b.val = 2
JOIN C ON b.id = c.id AND c.val = 3;
--> get all rows from A. Keep the rows having a.val=1
. Match rows in B by id
and keep only the rows having b.val=2
. Using this partial result set, match rows in C by id
and keep the rows having c.val=3
Using a LEFT JOIN
, the difference is in the result. Suppose we have an entry in table A with no match (by ID) in table B.
SELECT *
FROM A
LEFT JOIN B ON a.id = b.id
WHERE b.val =2;
--> the row that exists only in A is kept in the join. The WHERE
clause filters it out.
SELECT *
FROM A
LEFT JOIN B ON a.id = b.id AND b.val = 2;
--> There is no row in B matching the row ID and b.val
, so the right side of the join is NULL. Since there is a row on the left side of the join, the row is returned (A.*
is populated, B.*
is null)
No comments:
Post a Comment