SQL Server Indexes – Pros and Cons (Part 2)

Types of indexes: We have basically two types of indexes in SQL Server.
- Clustered index: It’s a type of index where the data itself is arranged at the indexed column i.e. we have the actual data at the leaf nodes. For example, in a science definition pocket book, every word is arranged alphabetically and the definition of the word is given along with it. Here there is no need to look up for the referencing page, as in case of other example like in a general book with indexes at the end of the book, contains page number along with the term or word which is then followed to get the information about the term.
In our above example, if we create cluster index on the column emp_id, the table itself will be rearranged according to the sorting order of emp_id.
Syntax:
USE [OLAP] GO /****** Object: Index [IX_EmployeeDim] ******/ CREATE CLUSTERED INDEX [IX_EmployeeDim] ON [dbo].[EmployeeDim] ( [emp_id] ASC )
After executing the query, perform a select statement on the table [EmployeeDim], you will notice that all the rows are ordered by [emp_id].
SELECT * FROM dbo.EmployeeDim

Advantage:
- Seeking operations are much faster than normal indexing.
Disadvantage:
- It will rearrange the whole data at every operation of UPDATE, INSERT or DELETE on key column / cluster index column, in order to keep ordering of the data in that column.
- Non-Clustered index: Here leaf node contains index pages or reference to the actual data record instead of data pages (actual data). Refer the first example for details about the syntax. It will not affect the default order of the table rather a separate data structure is maintained for each index.
Advantage:
- Can have more than one non-clustered index on a table.
- Reordering of the table data is not necessary
Disadvantage:
- Need separate data structure. As a result consume more space
- Over indexing i.e. more number of indexes may cause performance degradation.
| Cluster Indexing | Non-Cluster Indexing |
| • Actual data at the leaf nodes. | • Reference to the data on the leaf nodes. |
| • Data for the index column is sorted physically. | • Data in the index structure is sorted. |
| • Updating data in the cluster index column affects the whole table. | • Updating data does not affect the whole table, only the index structure has to be rebuild for the index column. |
| • Only 1 Cluster Index is possible for one table. | • Up to 249 Non Clustered Indexes are possible for one table in SQL Server 2005 and 999 in SQL Server 2008. |
| • Clustered index are unique in nature for a given table. | • This may not be unique. |
| • Can be efficient on fixed or rarely changing tables. | • Is good for table which is modified frequently. |
| • Index order matches the physical storage order of the data. | • Index order does not match with the physical order of the data. |
| • Does not required additional disk space to store data in the index as it is physically reordering the main table. | • Separate data structure is maintained for each index, resulting additional space consumption. |
Other terms in Indexing:
Composite index / Covering index: When an index (either Clustered or non-clustered) is created on the basis of two or more columns from a table, it is called Composite index. These are very useful in case of covering query, also called as Covering index. For example, your requirement is to get the entire employees id in between 132 and 203. Here you may note that you got the data in the index itself. Suppose, you are having a non-cluster index on emp_id column, your query will be as below:
SELECT [emp_id] FROM dbo.EmployeeDim WHERE [emp_id] between 132 and 203
Here the query engine will seek the index made on emp_id column and when it got the demanding rows, it will simply return the value for those rows maintained in the index as the query only needs information about emp_id and not any other information about the employee i.e. columns requested in the output are contained in the index itself.
Here the index itself is capable to provide the result, and so called covering index. In case of covering index, the key value contains all the required information to process a query, and hence there is no need to follow the reference for the original data in the table.
It increases the performance of the query dramatically. This is the reason why cluster index are faster than the non-cluster index because cluster index covers all the columns within it.
One thing that has to be kept in mind while building composite index is the order of column. For example, you are asked to list all the hotels from Andhra Pradesh state that are in Hyderabad city only. I know you can easily get it by looking for Andhra Pradesh in the map and then to Hyderabad and then you will search inside Hyderabad for the hotels. Obviously, here you will not search the rest of the world or rest of the cities other than Hyderabad. In fact, even if you are asked to find all the hotels in Andhra Pradesh, then also it is not a big deal for you.
But what if you are asked to search all the hotels inside Hyderabad city (assume there may be cities available in the name of Hyderabad in other corner of the earth), here in this case you may face problem as you don’t know which Hyderabad you have to search and where this Hyderabad city is located. For this, obviously you will search every country, every state for the city Hyderabad and if you got some, you will list the hotels inside that city. Here through this example, the main point that I want to reflect is the order of columns in making composite index. If most of your query needs column1 in their result set, then include it in first order and accordingly the rest.
If you want result set on product basis, then put product column first and if you want the output on location basis, put location column first.
Syntax to create composite index:
USE [OLAP] GO /****** Object: Index [IX_EmployeeDim_Composite] ******/ CREATE NONCLUSTERED INDEX [IX_EmployeeDim_Composite] ON [dbo].[EmployeeDim] ( [Department] ASC, [emp_name] ASC )
- Heaps: If a table does not contain Cluster index or Non-Cluster index, then it is called a heap table. The data is stored in the table as it is inserted. The Query engine has to search each row in the table to get the desired row. It hurts performance to its maximum. Even if you have non-cluster index on a heap table, it does not yield good performance because physically the data is not sorted and it has to search the whole table to get the reference key. Only if you are using covering index where there is no need to refer the table data, it is useful.
Two ways to create Cluster Indexes in T-SQL:
- While creating Table (by CREATE TABLE): You can specify index constraints while creating the table with primary keys. A primary key by default creates Cluster Index in it.
USE [OLAP] GO /****** Object: Table [dbo].[EmployeeDim] ******/ CREATE TABLE [dbo].[EmployeeDim]( [emp_name] [VARCHAR](50) NULL, [emp_id] [INT] NOT NULL, [emp_DOB] [DATETIME] NULL, [Department] [VARCHAR](50) NULL, CONSTRAINT [PK_EmployeeDim] PRIMARY KEY CLUSTERED ( [emp_id] ASC ) )
Now run the query that we have used earlier to get the information about the employee id 132.

Note the execution plan, it seeks the Clustered Index.
- After creation of Table (by CREATE INDEX / Alter command): Even if you have table already created, you can have cluster index on it by Create Index statement or by ALTER TABLE.
By ALTER TABLE:
USE [OLAP] GO /****** Object: Index [PK_EmployeeDim] ******/ ALTER TABLE [dbo].[EmployeeDim] ADD CONSTRAINT [PK_EmployeeDim] PRIMARY KEY CLUSTERED ( [emp_id] ASC )
By Create Index:
USE [OLAP] GO /****** Object: Index [PK_EmployeeDim] ******/ CREATE CLUSTERED INDEX PK_EmployeeDim_Cluster ON dbo.EmployeeDim (emp_id);




Recent Comments