Semi-Join Subquery優化策略

Semi-Join Subquery(半連接子查詢)：對應IN或EXISTS子查詢，僅需要檢查"外表記錄"在"子查詢結果集"中是否存在匹配記錄，不需要計算"子查詢結果集"中記錄匹配次數，也不需要回傳"子查詢結果集"中匹配記錄內容

在MariaDB(MySQL)中，常用優化Semi-Join(半連接)的策略有：

First Match
Table Pullout
Semi-join Materialization
Loose Scan
Duplicate Weedout

First Match策略

當回圈"外部查詢結果集"的每條記錄去"子查詢中"確認"是否匹配"時，只需要找到第一條匹配記錄(First Match)既可跳出子查詢，

如下面查詢：

SELECT * FROM Country 
WHERE Country.code IN (
    SELECT City.Country 
    FROM City 
    WHERE City.Population > 1*1000*1000
)
AND Country.continent='Europe'

如果不使用First Match策略，當處理到Country表上滿足" Country.continent='Europe' "條件的德國(Deu)記錄時，會掃描City表上滿足" City.Population > 110001000 AND City.Country='DEU' "的所有記錄，再根據匹配記錄總數回傳"是否匹配"結果：

如果使用First Match策略，當處理到Country表上滿足" Country.continent='Europe' "條件的德國(Deu)記錄時，會掃描City表上滿足" City.Population > 110001000 AND City.Country='DEU' "的第一條記錄"Berlin"后，立即回傳"是否匹配"結果：

在MariaDB上使用First Match策略的查詢的執行計劃為：

MariaDB [world]> explain select * from Country where Country.code IN (select City.Country from City where City.Population > 1*1000*1000) and Country.continent='Europe';
+----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
| id | select_type | table   | type | possible_keys      | key       | key_len | ref                | rows | Extra                            |
+----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
|  1 | PRIMARY     | Country | ref  | PRIMARY,continent  | continent | 17      | const              |   60 | Using index condition            |
|  1 | PRIMARY     | City    | ref  | Population,Country | Country   | 3       | world.Country.Code |   18 | Using where; FirstMatch(Country) |
+----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
2 rows in set (0.00 sec)

MariaDB的執行計劃中會有明顯的FirstMatch標識，

在MySQL上使用First Match策略的查詢的執行計劃為：

MySQL [world]> explain select * from Country  where Country.code IN (select City.Country from City where City.Population > 1*1000*1000) and Country.continent='Europe';
+----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
| id | select_type        | table   | type           | possible_keys      | key       | key_len | ref   | rows | Extra                              |
+----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
|  1 | PRIMARY            | Country | ref            | continent          | continent | 17      | const |   60 | Using index condition; Using where |
|  2 | DEPENDENT SUBQUERY | City    | index_subquery | Population,Country | Country   | 3       | func  |   18 | Using where                        |
+----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
2 rows in set (0.01 sec)

MariaDB的執行計劃中僅顯示為依賴子查詢(DEPENDENT SUBQUERY)

First Match策略和將IN子查詢轉換為EXISTS依賴子查詢很相似，但兩者還是存在明顯差異，并非所有EXISTS操作都能使用First Match策略，如子查詢中使用GROUP BY相關的聚合函式時，需要先完成GROUP BY操作才能確認"是否匹配"，

Table Pullout策略

當子查詢的查詢串列項只有主鍵或唯一索引鍵時，能推算出"子查詢結果集"不存在重復記錄，因此可以將子查詢改為關聯查詢，即將子查詢中的表上提到關聯查詢，

對于查詢：

SELECT *
FROM City 
WHERE City.Country IN (
	SELECT Country.Code
	FROM Country 
	WHERE Country.Population < 100*1000
);

在MariaDB 5.2 和MySQL 5.6版本及之前版本上，執行計劃為：

MySQL [world]> explain select * from City where City.Country in (select Country.Code from Country where Country.Population < 100*1000);
+----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+
| id | select_type        | table   | type            | possible_keys      | key     | key_len | ref  | rows | Extra       |
+----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+
|  1 | PRIMARY            | City    | ALL             | NULL               | NULL    | NULL    | NULL | 4079 | Using where |
|  2 | DEPENDENT SUBQUERY | Country | unique_subquery | PRIMARY,Population | PRIMARY | 3       | func |    1 | Using where |
+----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+
2 rows in set (0.00 sec)

如果Country.Code是主鍵或唯一索引，則可以將SQL改寫為：

SELECT City.* 
FROM City
INNER JOIN Country 
ON City.Country=Country.Code
WHERE Country.Population < 100*1000;

改為關聯查詢后，可以根據兩張關聯表的統計資料來選擇驅動表和被驅動表，因此在MariaDB 5.3或MySQL 5.7版本，執行計劃為：

MariaDB [world]> explain select * from City where City.Country in (select Country.Code from Country where Country.Population < 100*1000);
+----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+
| id | select_type | table   | type  | possible_keys      | key        | key_len | ref                | rows | Extra                 |
+----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+
|  1 | PRIMARY     | Country | range | PRIMARY,Population | Population | 4       | NULL               |   37 | Using index condition |
|  1 | PRIMARY     | City    | ref   | Country            | Country    | 3       | world.Country.Code |   18 |                       |
+----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+
2 rows in set (0.00 sec)

Materialization策略

在使用Table Pullout策略時，需要能明確推算出"子查詢結果集"不存在重復記錄時才能將"子查詢"改為"關聯查詢"，如果將"子查詢結果集"通過臨時表去重固化后消除重復記錄，則可以將子查詢轉換為"關聯查詢"，即Materialization策略，

如對于查詢：

SELECT * FROM Country 
WHERE Country.code IN (
    SELECT City.Country 
    FROM City 
    WHERE City.Population > 1*1000*1000
)
AND Country.continent='Europe'

在轉換為"關聯查詢"后，按照"關聯查詢"中臨時表是否為"驅動表"可以將Semi-join Materialization策略細分為：

Materialization/scan 策略，將臨時表作為"驅動表"，遍歷臨時表中每條記錄去另外關聯表中查找匹配記錄，
Materialization/lookup 策略，將臨時表作為"被驅動表"，遍歷另外的關聯表在臨時表中查詢匹配記錄，

使用Materialization/scan 策略時，MariaDB 查詢計劃為：

MariaDB [world]> explain select * from Country where Country.code IN (select City.Country from City where  City.Population > 7*1000*1000);
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
| id | select_type  | table       | type   | possible_keys      | key        | key_len | ref                | rows | Extra                 |
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
|  1 | PRIMARY      | <subquery2> | ALL    | distinct_key       | NULL       | NULL    | NULL               |   15 |                       |
|  1 | PRIMARY      | Country     | eq_ref | PRIMARY            | PRIMARY    | 3       | world.City.Country |    1 |                       |
|  2 | MATERIALIZED | City        | range  | Population,Country | Population | 4       | NULL               |   15 | Using index condition |
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
3 rows in set (0.01 sec)

使用Materialization/lookup 策略時，MariaDB 查詢計劃為：

MariaDB [world]> explain select * from Country where Country.code IN (select City.Country from City where  City.Population > 1*1000*1000) ;
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
| id | select_type  | table       | type   | possible_keys      | key          | key_len | ref  | rows | Extra                 |
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
|  1 | PRIMARY      | Country     | ALL    | PRIMARY            | NULL         | NULL    | NULL |  239 |                       |
|  1 | PRIMARY      | <subquery2> | eq_ref | distinct_key       | distinct_key | 3       | func |    1 |                       |
|  2 | MATERIALIZED | City        | range  | Population,Country | Population   | 4       | NULL |  238 | Using index condition |
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
3 rows in set (0.00 sec)

Loose Scan策略

在Materialization/scan 策略時，需要先將"子查詢結果集"移除重復記錄并固化到臨時表，再作為驅動表進行關聯查詢，MySQL特性Index Loose Scan能在一次掃描中得跳過重復索引鍵得到"沒有重復記錄的臨時結果集"，Loose Scan策略基于Index Loose Scan特性保證關聯查詢不會出現"重復關聯問題"，

如對于查詢：

SELECT * FROM Country  
WHERE Country.code IN (
    SELECT country_code FROM Satellite
)

如果Satellite.country_code 存在索引，基于Index Loose Scan特性則能快速獲得"SELECT DISTINCT country_code FROM Satellite"的效果，如圖所示：

使用Loose Scan 策略時，MariaDB 查詢計劃為：

MariaDB [world]> explain select * from Country where Country.code in (select country_code from Satellite);
+----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
| id | select_type | table     | type   | possible_keys | key          | key_len | ref                          | rows | Extra                               |
+----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
|  1 | PRIMARY     | Satellite | index  | country_code  | country_code | 9       | NULL                         |  932 | Using where; Using index; LooseScan |
|  1 | PRIMARY     | Country   | eq_ref | PRIMARY       | PRIMARY      | 3       | world.Satellite.country_code |    1 | Using index condition               |
+----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+

Loose Scan 策略和Materialization/scan 策略區別：

Materialization/scan 策略：先將子查詢的查詢結果固化去重后，再作為驅動表與外部表進行關聯查詢，查詢使用到臨時表，
Loose Scan 策略：在對子查詢的表進行Index Loose Scan操作程序中，直接將遍歷到的記錄與與外部表進行關聯查詢，查詢未使用到臨時表，

Duplicate Weedout策略

當無法根據表結構資訊推算出"子查詢結果集"不存在重復記錄時，如果將子查詢改寫為關聯查詢，則會導致"外表記錄"被關聯匹配多次而產生重復記錄，可以通過將關聯結果集插入到"帶有唯一索引的臨時表"的方式來移除重復記錄，保證最終查詢結果的準確性，

對于查詢：

SELECT * 
FROM Country 
WHERE Country.code IN (
    SELECT City.Country
    FROM City 
    WHERE City.Population > 0.33 * Country.Population 
    AND City.Population > 1*1000*1000
);

可以改寫為：

CREATE tmp_Country LIKE Country;

INSERT IGNORE INTO tmp_Country
SELECT Country.* 
FROM Country
INNER JOIN City
ON Country.code = City.Country
WHERE City.Population > 0.33 * Country.Population 
AND City.Population > 1*1000*1000

SELECT * FROM tmp_Country;

如圖所示：

使用Duplicate Weedout 策略時，MariaDB 查詢計劃為：

explain select * from Country where Country.code IN (select City.Country from City where City.Population > 0.33 * Country.Population and City.Population > 1*1000*1000)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: City
         type: range
possible_keys: Population,Country
          key: Population
      key_len: 4
          ref: NULL
         rows: 238
        Extra: Using index condition; Start temporary
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: Country
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: world.City.Country
         rows: 1
        Extra: Using where; End temporary
2 rows in set (0.00 sec)

學習總結

First Match策略通過"找到第一條匹配記錄即回傳"的方式來跳過無效子查詢掃描，

除First Match策略外都是子查詢轉換為關聯查詢來優化提升查詢效率，按照不同查詢場景采用不同策略來"避免重復記錄"：

Table Pullout策略，通過唯一索引和主鍵索引邏輯來確認"子查詢結果集"中重復記錄，
Materialization策略，通過臨時表來移除"子查詢結果集"中重復記錄，
Loose Scan策略，通過Index Loose Scan特性來跳過"子查詢結果集"中重復記錄，
Duplicate Weedout策略，通過臨時表來將移除"關聯查詢結果集"中重復記錄，

參考資料

Semi-join Subquery Optimizations

轉載請註明出處，本文鏈接：https://www.uj5u.com/shujuku/551862.html

標籤：其他

上一篇：實驗小記之Linux上的Oracle11gR2單實體靜默安裝和建庫

下一篇：返回列表