I’ve stumbled upon this shortcoming of PostgreSQL optimizer: the restrictions in the LEFT JOIN’s ON expression are not used to access the outer table. The testcase is fairly basic:
test=# explain
test-# select *
test-# from t1 left join t2 on (t1.key1<100 AND t2.key1=t1.col);
QUERY PLAN
------------------------------------------------------------------------
Hash Left Join (cost=30832.00..73228.00 rows=1000000 width=16)
Hash Cond: (t1.col = t2.key1)
Join Filter: (t1.key1 < 100)
-> Seq Scan on t1 (cost=0.00..14425.00 rows=1000000 width=8)
-> Hash (cost=14425.00..14425.00 rows=1000000 width=8)
-> Seq Scan on t2 (cost=0.00..14425.00 rows=1000000 width=8)
(6 rows)
The optimizer prefers to use Seq Scan to read 1M rows from table t1, despite the condition “t1.key1<100” which matches only 100 rows, and has a suitable index. If I change LEFT JOIN into INNER, the optimizer will use an index:
test=# explain
test-# select *
test-# from t1 inner join t2 on (t1.key1<100 and t2.key1=t1.col);
QUERY PLAN
-----------------------------------------------------------------------------
Nested Loop (cost=5.59..1136.23 rows=95 width=16)
-> Bitmap Heap Scan on t1 (cost=5.16..341.25 rows=95 width=8)
Recheck Cond: (key1 < 100)
-> Bitmap Index Scan on t1_key1 (cost=0.00..5.14 rows=95 width=0)
Index Cond: (key1 < 100)
-> Index Scan using t2_key1 on t2 (cost=0.42..8.36 rows=1 width=8)
Index Cond: (key1 = t1.col)
The script to reproduce is here.