叢集索引與索引欄位之效能提問

JJ

各位好,小弟有個疑問想請問大家,我有一張表內有數百萬筆的資料,number_key,A_year,A_month等欄位都是不常修改的必要篩選欄位,
在該情況下我使用方法一建立資料表,但我有個疑問,若我使用方法二讓seq建立唯一叢集 (PRIMARY KEY CLUSTERED),且不建立任何INDEX,
在我比較兩個方式的執行計畫與實際進行篩選,卻發現相差異不大,
方法一 使用篩選條件(number_key,A_year,A_month),執行計畫為叢集索引搜尋
方法二 使用篩選條件(number_key,A_year,A_month),執行計畫為叢集索引掃描,
數百萬筆的資料下,方法一的 I/O成本(0.00625)與運算子成本(0.0065) 方法二的I/O成本(0.0031)與運算子成本(0.0032),
搜尋全部欄位時間也差不多,為何如此? 另PRIMARY KEY 建立NONCLUSTERED與CLUSTERED有何差異?




方法一:
CREATE TABLE [dbo].[test](
	[seq] [int] IDENTITY(1,1) NOT NULL,
	[number_key] [char](11) COLLATE Chinese_Taiwan_Stroke_CI_AS NOT NULL,
	[A_year] [nchar](3) COLLATE Chinese_Taiwan_Stroke_CI_AS NOT NULL,
	[A_month] [char](2) COLLATE Chinese_Taiwan_Stroke_CI_AS NOT NULL,
	[text] [varchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 CONSTRAINT [PK_test] PRIMARY KEY NONCLUSTERED 
(
	[seq] ASC,
	[number_key] ASC,
	[A_year] ASC,
	[A_month] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO


CREATE UNIQUE CLUSTERED INDEX [IX_test_UNo] ON [dbo].[test]
(
	[seq] ASC,
	[number_key] ASC,
	[A_year] ASC,
	[A_month] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO







方法二:

CREATE TABLE [dbo].[test](
	[seq] [int] IDENTITY(1,1) NOT NULL,
	[number_key] [char](11) COLLATE Chinese_Taiwan_Stroke_CI_AS NOT NULL,
	[A_year] [nchar](3) COLLATE Chinese_Taiwan_Stroke_CI_AS NOT NULL,
	[A_month] [char](2) COLLATE Chinese_Taiwan_Stroke_CI_AS NOT NULL,
	[text] [varchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 CONSTRAINT [PK_test] PRIMARY KEY CLUSTERED 
(
	[seq] ASC,
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO


pilipala
預設 Primary Key 就是 Clustered Index,沒有特別原因不會把兩者拆開,Primary Key 和 Unique 是條件約束
1. Primary Key 在 Table 內只允許存在一個,而 Unique 可以多個
2. Primary Key 不允許 null 存在,而 Unique 允許 null 存在

Clustered Index 建議單一欄位就好,方法二再加上 number_key、a_year、a_month 建立 nonclustered index 來使用就好,多欄位定義為 Clustered Index 的話,要注意只有第一個欄位資料有排序,篩選條件有用到第一個欄位才會觸發 Index Seek,以方法一來說,篩選條件一定要有 seq 欄位才會觸發 Index Seek

Table 有建立 Clustered index 稱為 B_Tree 架構,沒有建立為 Heap,兩者架構上就有差異,比較不適合放在一起比較效能,另外觀察效能,也可以開啟 set statistice, time on 來觀察 Table logical read 和 cpu time
JJ
pilipala您好! 感謝您的回覆,但我仍有些許疑問想請教您,
我照了您的建議改了方法2,而在方法1中,由於SEQ只用於確保資料不會重複,實際查詢並未使用,故將其移出叢集索引
遇到問題如下:

1. 方法2在篩選 [A_year] ,[A_month],[number_key]的時候,由於我寫入大量假資料進資料表,查詢時sql顯示有遺漏,為何會如此
是因為假資料造成的嗎?,當我只寫入兩萬多筆資料的時候就沒有出現遺漏索引了,可是相同資料再方法1並未出現遺漏索引。
2. 由於方法1是把篩選條件都綁成叢集索引故測試結果也顯示其搜尋成本與邏輯讀取都遠大於方法2,我想請問以我目前需求方法1
是否優於方法2
3. 無論是方法1還是方法2,甚至原版的方法2(只有叢集沒有非叢集index),我在實際查詢上雖然成本相差甚異,但實際執行時間卻是
差不多的,為何如此? 是因為資料經過SEQ的叢集排序後,本身查詢效能就已經得到大量提升,導致後面再進行優化,實際查詢也難以
覺察嗎?





方法1:
1 CREATE TABLE [dbo].[test]( 
2	    [seq] [int] IDENTITY(1,1) NOT NULL, 
3	    [number_key] [char](11) COLLATE Chinese_Taiwan_Stroke_CI_AS NOT NULL, 
4	    [A_year] [nchar](3) COLLATE Chinese_Taiwan_Stroke_CI_AS NOT NULL, 
5	    [A_month] [char](2) COLLATE Chinese_Taiwan_Stroke_CI_AS NOT NULL, 
6	    [text] [varchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, 
7	 CONSTRAINT [PK_test] PRIMARY KEY NONCLUSTERED  
8	( 
9	    [seq] ASC, 
10	    [number_key] ASC, 
11	    [A_year] ASC, 
12	    [A_month] ASC 
13	)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] 
14	) ON [PRIMARY] 
15	GO 
16	 
17	 
18	CREATE UNIQUE CLUSTERED INDEX [IX_test_UNo] ON [dbo].[test] 
19	( 
20	   
21	    [number_key] ASC, 
22	    [A_year] ASC, 
23	    [A_month] ASC 
24	)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] 
25	GO 



方法2:
CREATE TABLE [dbo].[test2]( 
2	    [seq] [int] IDENTITY(1,1) NOT NULL, 
3	    [number_key] [char](11) COLLATE Chinese_Taiwan_Stroke_CI_AS NOT NULL, 
4	    [A_year] [nchar](3) COLLATE Chinese_Taiwan_Stroke_CI_AS NOT NULL, 
5	    [A_month] [char](2) COLLATE Chinese_Taiwan_Stroke_CI_AS NOT NULL, 
6	    [text] [varchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, 
7	 CONSTRAINT [PK_test2] PRIMARY KEY CLUSTERED  
8	( 
9	    [seq] ASC 
10	)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] 
11	) ON [PRIMARY] 
12	GO 
13 CREATE NONCLUSTERED INDEX [IX_TEST2] ON [dbo].[TEST2]
14 (
15 	[A_year] ASC,
16	[A_month] ASC,
18	[number_key] ASC
19  )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
20  GO

最有價值解答

pilipala
1. 方法2在篩選 [A_year] ,[A_month],[number_key]的時候,由於我寫入大量假資料進資料表,查詢時sql顯示有遺漏,為何會如此是因為假資料造成的嗎?,當我只寫入兩萬多筆資料的時候就沒有出現遺漏索引了,可是相同資料再方法1並未出現遺漏索引。


猜測當時方法二還未建立 NonClustered index,所以才會有 missing index,方法一因為跑在 Clustered index 上,就不會有 missing index ,missing index 只會建議 NonClustered index 而已

2. 由於方法1是把篩選條件都綁成叢集索引故測試結果也顯示其搜尋成本與邏輯讀取都遠大於方法2,我想請問以我目前需求方法1,是否優於方法2


兩種 Table 設計方式不同,不該由一個 TSQL 來決定孰優孰劣,這有點奇怪。

方法一把 Primary Key 和 Clustered Index 拆開,最常討論情境是 Primary Key 是 GUID,可以參考該篇文章 https://blog.darkthread.net/blog/guid-as-pk-on-db

3. 無論是方法1還是方法2,甚至原版的方法2(只有叢集沒有非叢集index),我在實際查詢上雖然成本相差甚異,但實際執行時間卻是差不多的,為何如此? 是因為資料經過SEQ的叢集排序後,本身查詢效能就已經得到大量提升,導致後面再進行優化,實際查詢也難以覺察嗎?


方法一和方法二的 Index 欄位順序不一樣喔
方法一:number_key、A_year、A_month,篩選條件一定要有 number_key 欄位才有機會觸發 Index Seek
方法二:A_year、A_month、number_key,篩選條件一定要有 A_year 欄位才有機會觸發 Index Seek
可以檢查一下是否該差異造成沒有吃到預期 Index 而造成的差異
pilipala
Table 有建立 Clustered index 稱為 B_Tree 架構,沒有建立為 Heap,兩者架構上就有差異,比較不適合放在一起比較效能


1F 回覆這段回到錯亂,請自動忽略,Sorry
JJ
了解了! 非常感謝您
回到頂部