開始前先介紹什麼是
Left outer join
跟
Inner join
Left outer join:
LEFT JOIN 可以用來建立左外部連接(上圖的 A + B 部分),查詢的 SQL 敘述句 LEFT JOIN 左側資料表 (table_1) 的所有記錄都會加入到查詢結果中,最終產生一張新的表。
outer join
會列出主要表格中每一筆資料,無論它的值在另一個表格中有沒有出現
=> 即使右側資料表 (table_2) 有沒有符合的值,左側資料表(table_1)每一筆還是會顯示出來。
LEFT JOIN 查詢用法 (Example)
這是一個客戶資料表 customers:
Address Phone
SELECT customers.Name, orders.Order_No # 挑出要顯示的欄位
FROM customers
LEFT JOIN orders # orders 資料表往左接上 customers 資料表
ON customers.C_Id=orders.C_Id; # 指定兩資料表連接的條件
其中用點號連接之 XXX.YYY 表示 XXX 資料表中的 YYY 欄位。
Order_No
使用前面建立的Product
(商品)跟ProductStyle
(商品樣式)Model
#app/models/product.rb
class Product < ApplicationRecord
has_many :styles, class_name: 'ProductStyle' # 關聯取名為styles
#app/models/product_style.rb
class ProductStyle < ApplicationRecord
belongs_to :product
Table
Model
產生以下 SQL 跟回傳值:
Product Load (0.4ms) SELECT "products".* FROM "products"
ProductStyle Load (0.2ms) SELECT "product_styles".* FROM "product_styles" WHERE "product_styles"."product_id" = $1 ORDER BY "product_styles"."id" ASC LIMIT $2 [["product_id", 3], ["LIMIT", 1]]
ProductStyle Load (0.2ms) SELECT "product_styles".* FROM "product_styles" WHERE "product_styles"."product_id" = $1 ORDER BY "product_styles"."id" ASC LIMIT $2 [["product_id", 4], ["LIMIT", 1]]
ProductStyle Load (0.1ms) SELECT "product_styles".* FROM "product_styles" WHERE "product_styles"."product_id" = $1 ORDER BY "product_styles"."id" ASC LIMIT $2 [["product_id", 6], ["LIMIT", 1]]
ProductStyle Load (0.1ms) SELECT "product_styles".* FROM "product_styles" WHERE "product_styles"."product_id" = $1 ORDER BY "product_styles"."id" ASC LIMIT $2 [["product_id", 7], ["LIMIT", 1]]
ProductStyle Load (0.1ms) SELECT "product_styles".* FROM "product_styles" WHERE "product_styles"."product_id" = $1 ORDER BY "product_styles"."id" ASC LIMIT $2 [["product_id", 8], ["LIMIT", 1]]
# => ["香蕉乾", "芒果乾", "商品款式2", "商品款式3", "商品款式4"]
後面款式都亂取名,懶得想名字了
N+1 Query這個效能殺手,造成呼叫 5 + 1次SQL(第一次呼叫products
資料表,每次迴圈呼叫一次product_styles
資料表,共5次)
呼叫SQL的成本是巨大的,該怎麼辦?
Rails透過ORM語法把關聯資料表做預加載,如此一來迴圈內呼叫關聯時,不用再呼叫SQL從DB取資料,改從cache取資料。使用預加載,迴圈越多次程式碼效率提升就越顯著。
加載關聯資料表
Rails 提供四種加載關聯資料表的方法
preload
eager_load
includes
joins
preload
preload
會生成兩條獨立
的SQL查詢,所以關聯資料表會一起被 query 出來,但不會建立資料表之間的關聯
products = Product.preload(:styles).order("products.id DESC")
產生以下 SQL:
Product Load (0.4ms) SELECT "products".* FROM "products" ORDER BY products.id DESC LIMIT $1 [["LIMIT", 11]]
ProductStyle Load (0.2ms) SELECT "product_styles".* FROM "product_styles" WHERE "product_styles"."product_id" IN ($1, $2, $3, $4, $5) [["product_id", 8], ["product_id", 7], ["product_id", 6], ["product_id", 4], ["product_id", 3]]
preload
會將products
有關的資料表統一加載出來,但不會建立資料表之間的關聯,如果要拿preload
生成的SQL語句來做關聯查詢(where)
或排序(order)
,會出錯:
Product.preload(:styles).where("product_styles.title like ?", 'test')
噴錯如下:
ActiveRecord::StatementInvalid (PG::UndefinedTable: ERROR: missing FROM-clause entry for table "product_styles")
不過preload
確實可以解決N+1 query
:
Product.preload(:styles).map do |product|
product.styles.first.title
只產生以下兩條 SQL 跟回傳值:
Product Load (0.4ms) SELECT "products".* FROM "products"
ProductStyle Load (0.4ms) SELECT "product_styles".* FROM "product_styles" WHERE "product_styles"."product_id" IN ($1, $2, $3, $4, $5) [["product_id", 3], ["product_id", 4], ["product_id", 6], ["product_id", 7], ["product_id", 8]]
# => ["香蕉乾", "芒果乾", "商品款式2", "商品款式3", "商品款式4"]
eager_load
eager_load
使用LEFT OUTER JOIN
進行單次查詢,並加載所有關聯數據(這樣資料表之間就有關聯了)
products = Product.eager_load(:styles).order("products.id DESC")
產生以下 SQL:
SQL (0.6ms) SELECT DISTINCT products.id AS alias_0, "products"."id" FROM "products" LEFT OUTER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id" ORDER BY products.id DESC LIMIT $1 [["LIMIT", 11]]
SQL (0.6ms) SELECT "products"."id" AS t0_r0, "products"."title" AS t0_r1, "products"."price" AS t0_r2, "products"."description" AS t0_r3, "products"."created_at" AS t0_r4, "products"."updated_at" AS t0_r5, "product_styles"."id" AS t1_r0, "product_styles"."title" AS t1_r1, "product_styles"."price" AS t1_r2, "product_styles"."sku" AS t1_r3, "product_styles"."product_id" AS t1_r4, "product_styles"."created_at" AS t1_r5, "product_styles"."updated_at" AS t1_r6 FROM "products" LEFT OUTER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id" WHERE "products"."id" IN ($1, $2, $3, $4, $5) ORDER BY products.id DESC [["id", 8], ["id", 7], ["id", 6], ["id", 4], ["id", 3]]
可以看到為了加載關聯,eager_load
用了更複雜的SQL語法,雖然速度比preload
慢,但是解決了preload
不能關聯查詢的問題
Product.eager_load(:styles).where("product_styles.title like ?", 'test')
產生以下 SQL:
SQL (0.5ms) SELECT DISTINCT "products"."id" FROM "products" LEFT OUTER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id" WHERE (product_styles.title like 'test') LIMIT $1 [["LIMIT", 11]]
SQL (0.4ms) SELECT "products"."id" AS t0_r0, "products"."title" AS t0_r1, "products"."price" AS t0_r2, "products"."description" AS t0_r3, "products"."created_at" AS t0_r4, "products"."updated_at" AS t0_r5, "product_styles"."id" AS t1_r0, "product_styles"."title" AS t1_r1, "product_styles"."price" AS t1_r2, "product_styles"."sku" AS t1_r3, "product_styles"."product_id" AS t1_r4, "product_styles"."created_at" AS t1_r5, "product_styles"."updated_at" AS t1_r6 FROM "products" LEFT OUTER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id" WHERE (product_styles.title like 'test') AND "products"."id" = $1 [["id", 4]]
eager_load 同時也可以解決N+1 query
Product.eager_load(:styles).map do |product|
product.styles.first.try(:title)
產生以下 SQL:
SQL (9.4ms) SELECT "products"."id" AS t0_r0, "products"."title" AS t0_r1, "products"."price" AS t0_r2, "products"."description" AS t0_r3, "products"."created_at" AS t0_r4, "products"."updated_at" AS t0_r5, "products"."order_id" AS t0_r6, "product_styles"."id" AS t1_r0, "product_styles"."title" AS t1_r1, "product_styles"."price" AS t1_r2, "product_styles"."sku" AS t1_r3, "product_styles"."product_id" AS t1_r4, "product_styles"."created_at" AS t1_r5, "product_styles"."updated_at" AS t1_r6 FROM "products" LEFT OUTER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id"
從SQL語法來看精準搜尋
先看沒有限制條件,單純做eager_load
Product.eager_load(:options)
SQL (0.7ms) SELECT DISTINCT "products"."id" FROM "products" LEFT OUTER JOIN "options" ON "options"."product_id" = "products"."id" LIMIT $1 [["LIMIT", 11]]
SQL (0.5ms) SELECT "products"."id" AS t0_r0, "products"."title" AS t0_r1, "products"."price" AS t0_r2, "products"."description" AS t0_r3, "products"."created_at" AS t0_r4, "products"."updated_at" AS t0_r5, "products"."store_id" AS t0_r6, "options"."id" AS t1_r0, "options"."product_id" AS t1_r1, "options"."title" AS t1_r2, "options"."option1" AS t1_r3, "options"."option2" AS t1_r4, "options"."created_at" AS t1_r5, "options"."updated_at" AS t1_r6 FROM "products" LEFT OUTER JOIN "options" ON "options"."product_id" = "products"."id" WHERE "products"."id" IN ($1, $2, $3, $4, $5) [["id", 4], ["id", 2], ["id", 7], ["id", 8], ["id", 3]]
第二條SQL會把所有product的id都列出來["id", 4], ["id", 2], ["id", 7], ["id", 8], ["id", 3]
再看有限制條件eager_load
Product.eager_load(:options).where(options: {title: '尺寸'})
SQL (42.9ms) SELECT DISTINCT "products"."id" FROM "products" LEFT OUTER JOIN "options" ON "options"."product_id" = "products"."id" WHERE "options"."title" = $1 LIMIT $2 [["title", "尺寸"], ["LIMIT", 11]]
SQL (0.6ms) SELECT "products"."id" AS t0_r0, "products"."title" AS t0_r1, "products"."price" AS t0_r2, "products"."description" AS t0_r3, "products"."created_at" AS t0_r4, "products"."updated_at" AS t0_r5, "products"."store_id" AS t0_r6, "options"."id" AS t1_r0, "options"."product_id" AS t1_r1, "options"."title" AS t1_r2, "options"."option1" AS t1_r3, "options"."option2" AS t1_r4, "options"."created_at" AS t1_r5, "options"."updated_at" AS t1_r6 FROM "products" LEFT OUTER JOIN "options" ON "options"."product_id" = "products"."id" WHERE "options"."title" = $1 AND "products"."id" IN ($2, $3) [["title", "尺寸"], ["id", 2], ["id", 3]]
可以看到第一條 SQL 有加上條件["title", "尺寸"]
,篩選出符合的 products (id: 2 跟 3)
所以第二條 SQL 在查詢時,只有搜尋符合條件的["id", 2], ["id", 3]
。
Rails 的 ORM語法雖然方便,但是如果能了解一些SQL,就能更容易找到適當的方法。
每次都要想何時用preload
?何時用eager_load
? 很麻煩
所以Rails 還有includes
這個大家最常用的方法
ps. eager_load在Rails 4是產生一條SQL,這裡用Rails 5 示範
includes
includes
預設效果是preload
,需要做關聯資料表時又會轉成eager_load
。
預設效果同preload
:
Product.includes(:styles)
產生以下 SQL:
Product Load (0.3ms) SELECT "products".* FROM "products" LIMIT $1 [["LIMIT", 11]]
ProductStyle Load (0.4ms) SELECT "product_styles".* FROM "product_styles" WHERE "product_styles"."product_id" IN ($1, $2, $3, $4, $5) [["product_id", 3], ["product_id", 4], ["product_id", 6], ["product_id", 7], ["product_id", 8]]
精準查詢關聯資料表,效果同eager_load
:
Product.includes(:styles).where(product_styles: {title: 'test'})
產生以下 SQL:
SQL (0.6ms) SELECT DISTINCT "products"."id" FROM "products" LEFT OUTER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id" WHERE "product_styles"."title" = $1 LIMIT $2 [["title", "test"], ["LIMIT", 11]]
SQL (0.4ms) SELECT "products"."id" AS t0_r0, "products"."title" AS t0_r1, "products"."price" AS t0_r2, "products"."description" AS t0_r3, "products"."created_at" AS t0_r4, "products"."updated_at" AS t0_r5, "product_styles"."id" AS t1_r0, "product_styles"."title" AS t1_r1, "product_styles"."price" AS t1_r2, "product_styles"."sku" AS t1_r3, "product_styles"."product_id" AS t1_r4, "product_styles"."created_at" AS t1_r5, "product_styles"."updated_at" AS t1_r6 FROM "products" LEFT OUTER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id" WHERE "product_styles"."title" = $1 AND "products"."id" = $2 [["title", "test"], ["id", 4]]
# => #<ActiveRecord::Relation [#<Product id: 4, title: "商品二", price: 1985.0, description: "要買要快", created_at: "2019-09-30 14:38:42", updated_at: "2019-09-30 14:38:42">]>
如果是模糊搜尋,要搭配references
才能達到eager_load
的效果
products = Product.includes(:styles).where("product_styles.title like ?", 'test').references(:styles)
產生以下 SQL 跟回傳值:
SQL (0.5ms) SELECT DISTINCT "products"."id" FROM "products" LEFT OUTER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id" WHERE (product_styles.title like 'test') LIMIT $1 [["LIMIT", 11]]
SQL (0.5ms) SELECT "products"."id" AS t0_r0, "products"."title" AS t0_r1, "products"."price" AS t0_r2, "products"."description" AS t0_r3, "products"."created_at" AS t0_r4, "products"."updated_at" AS t0_r5, "product_styles"."id" AS t1_r0, "product_styles"."title" AS t1_r1, "product_styles"."price" AS t1_r2, "product_styles"."sku" AS t1_r3, "product_styles"."product_id" AS t1_r4, "product_styles"."created_at" AS t1_r5, "product_styles"."updated_at" AS t1_r6 FROM "products" LEFT OUTER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id" WHERE (product_styles.title like 'test') AND "products"."id" = $1 [["id", 4]]
# => #<ActiveRecord::Relation [#<Product id: 4, title: "商品二", price: 1985.0, description: "要買要快", created_at: "2019-09-30 14:38:42", updated_at: "2019-09-30 14:38:42">]>
ps. 模糊搜尋的includes
不搭配references
會噴錯,各位看倌可以拿掉試試
所以includes
既可以解決N+1 query
的問題,搭配reference
又可以建立關聯查詢,提到N+1 query
推薦用includes
原因就在這裡。
joins
joins
主要用於建立model之間的關係,用INNER JOIN
來關聯數據, “不會”將關聯的資料做預加載
,所以對N+1 query
並無幫助,常用來建立關聯搜尋,
以上面的模糊搜尋為例:
Product.joins(:styles).where("product_styles.title like ?", 'test')
產生以下 SQL 跟回傳值:
Product Load (0.6ms) SELECT "products".* FROM "products" INNER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id" WHERE (product_styles.title like 'test') LIMIT $1 [["LIMIT", 11]]
# => #<ActiveRecord::Relation [#<Product id: 4, title: "商品二", price: 1985.0, description: "要買要快", created_at: "2019-09-30 14:38:42", updated_at: "2019-09-30 14:38:42">]>
只產生一條SQL。所以joins
可以用來建立資料表關聯
但如果想要joins
處理我們今天第一個問題,會更慘。因為可能產生重複資料,SQL查詢次數比沒加joins
還多 w(゚Д゚)w
原因如下:
Product.joins(:styles).size #=> 9
Product.joins(:styles).uniq.size #=> 5
原來Product.joins(:styles)
實際上是:
查詢全部含有product_id的product_style,並回傳該product_style所屬的product
如果有很多筆product_style屬於同一個product,就會回傳大量相同的product,要用.uniq來刪除重複的項目
一對一關係不會發生重複的情況
反過來就沒問題,因為 ProductStyle 對 Product 是多對一:
ProductStyle.joins(:product).size # => 9
ProductStyle.joins(:product).uniq.size # => 9
Rails提供關聯資料表加載,幫助我們解決N+1 query
這個隱形效能殺手,也可以建立資料表的關聯。
四種該如何使用,以下是簡單的判斷
只需要解決 N+1 query,用 preload 預加載
只需要精準或模糊搜尋,使用joins 建立關聯
需要建立關聯,又需要預加載,使用eager_load
新手不知道該怎麼用,用includes
似乎最方便
上面只是概略的描述,實際執行會因為資料表的行數及列數不同,導致效能也不同,還是要依照情況而定。