源码

我应该使用SQL JOIN还是IN子句?


我对最佳方法有疑问.当将数据视为大小可变时,我不确定哪种方法最好.

考虑以下3个表:

雇员

EMPLOYEE_ID,EMP_NAME

项目

PROJECT_ID,PROJ_NAME

EMP_PROJ(上面两个表中的许多表)

EMPLOYEE_ID,PROJECT_ID

问题:给定一个EmployeeID,找到与该Employee关联的所有项目的所有雇员.

我用两种方法尝试过.无论使用什么数据大小,两种方法的区别仅相差几毫秒.

SELECT EMP_NAME FROM EMPLOYEE
WHERE EMPLOYEE_ID IN (
    SELECT EMPLOYEE_ID FROM EMP_PROJ    
    WHERE PROJECT_ID IN (
        SELECT PROJECT_ID FROM EMP_PROJ p, EMPLOYEE e
        WHERE p.EMPLOYEE_ID = E.EMPLOYEE_ID 
        AND  E.EMPLOYEE_ID = 123)

select c.EMP_NAME FROM
(SELECT PROJECT_ID FROM EMP_PROJ 
  WHERE EMPLOYEE_ID = 123) a
JOIN 
EMP_PROJ b
ON a.PROJECT_ID = b.PROJECT_ID
JOIN 
EMPLOYEE c
ON b.EMPLOYEE_ID = c.EMPLOYEE_ID

到目前为止,我预计每个约有5000名员工和项目..但不知道存在多对多关系.
您会推荐哪种方法?
谢谢!

编辑:
方法1的执行计划

"Hash Join  (cost=86.55..106.11 rows=200 width=98)"
"  Hash Cond: (employee.employee_id = emp_proj.employee_id)"
"  ->  Seq Scan on employee  (cost=0.00..16.10 rows=610 width=102)"
"  ->  Hash  (cost=85.07..85.07 rows=118 width=4)"
"        ->  HashAggregate  (cost=83.89..85.07 rows=118 width=4)"
"              ->  Hash Semi Join  (cost=45.27..83.60 rows=118 width=4)"
"                    Hash Cond: (emp_proj.project_id = p.project_id)"
"                    ->  Seq Scan on emp_proj  (cost=0.00..31.40 rows=2140 width=8)"
"                    ->  Hash  (cost=45.13..45.13 rows=11 width=4)"
"                          ->  Nested Loop  (cost=0.00..45.13 rows=11 width=4)"
"                                ->  Index Scan using employee_pkey on employee e  (cost=0.00..8.27 rows=1 width=4)"
"                                      Index Cond: (employee_id = 123)"
"                                ->  Seq Scan on emp_proj p  (cost=0.00..36.75 rows=11 width=8)"
"                                      Filter: (p.employee_id = 123)"

方法2的执行计划:

"Nested Loop  (cost=60.61..112.29 rows=118 width=98)"
"  ->  Index Scan using employee_pkey on employee e  (cost=0.00..8.27 rows=1 width=4)"
"        Index Cond: (employee_id = 123)"
"  ->  Hash Join  (cost=60.61..102.84 rows=118 width=102)"
"        Hash Cond: (b.employee_id = c.employee_id)"
"        ->  Hash Join  (cost=36.89..77.49 rows=118 width=8)"
"              Hash Cond: (b.project_id = p.project_id)"
"              ->  Seq Scan on emp_proj b  (cost=0.00..31.40 rows=2140 width=8)"
"              ->  Hash  (cost=36.75..36.75 rows=11 width=8)"
"                    ->  Seq Scan on emp_proj p  (cost=0.00..36.75 rows=11 width=8)"
"                          Filter: (employee_id = 123)"
"        ->  Hash  (cost=16.10..16.10 rows=610 width=102)"
"              ->  Seq Scan on employee c  (cost=0.00..16.10 rows=610 width=102)"

这样看来,方法2的执行计划要好一些,因为方法1的“成本”为60,而方法1的成本为85.

怎么知道即使对于各种各样的许多组合,它也会成立?

(0)

本文由 投稿者 创作,文章地址:https://blog.isoyu.com/archives/woyinggaishiyongsql-joinhuanshiinziju.html
采用知识共享署名4.0 国际许可协议进行许可。除注明转载/出处外,均为本站原创或翻译,转载前请务必署名。最后编辑时间为:11 月 12, 2019 at 01:09 上午

热评文章