SQL Server Indexes – Pros and Cons (Part 1)
Indexing has become a critical buzzword in the arena of databases. Not only database users, but also all of us are using indexes in our daily life. Got puzzled? Let me give you one scenario. Have you ever noticed how did a librarian arranges and orders all the books in the library? And when you are asking for a particular book, he immediately gives it to you within some couple of minutes without searching the whole library. This is possible because he has already indexed each book and categorized them according to the author and the publisher and again ordered all the books falling under same author and publisher in an alphabetical order of the title. So when you ask for a particular book, he just refers to the shelf belonging to the author and publisher of the book and the book is with you in no time.
Let’s take another common example of maintaining your household components. What you generally do when you are in need of Feviquick glue. I can bet on that you will search the whole house like meJ. But the point here is that you can avoid this irritating search if you would have remembered or noted the place where you had kept the Feviquick glue. Not only this, but for all that is part of our daily activities, like books, clothes, visiting places, etc. things that we use regularly or frequently, we remember them i.e. we keep information about them in our memory (brain). This is what we called in database language as indexing so that next time when we use them, it becomes easier for us to access them. One thing you might have noted here is that I am insisting about frequent and regular things and not about rare or occasional things like you might not want to remember the places where you don’t think you would have any business to do or the strangers in the street to whom you are not expecting to meet again. But if you want to remember also, nobody will resist you as human brains have enough capacity to remember everything that comes throughout its life span. But in database world we have space constraints as well as processing constraints, so we must take wise decisions while indexing. From the above discussion, we concluded that an index is a structure or classification used to find information easier.
Let’s have a look on how SQL Server is maintaining indexes in the databases.
Definition: Indexes are relational data structure created for a table or a view in a database in order to make data easier to search and retrieve when the query is processed. The disk space required to store the index structure is usually less than that required by the table. And the main reason behind are the key values on which the indexing are usually done and not on all the columns.
Let’s have an example to elaborate it in more detail: Consider a table named dbo.EmployeeDim. The details about dbo.EmployeeDim are given below:
Table Name: dbo.EmployeeDim
USE [OLAP] GO /****** Object: Table [dbo].[EmployeeDim] ******/ CREATE TABLE [dbo].[EmployeeDim]( [emp_name] [varchar](50) NULL, [emp_id] [int] NULL, [emp_DOB] [datetime] NULL, [Department] [varchar](50) NULL ) ON [PRIMARY]
Statements to insert data:
Here in this example, we will use information provided by the employee table. You might have noted here that none of the columns are sorted i.e. the information about the employees are not in an ordered fashion. Although this is a small example with limited number of rows, but guess a table like this having tones of records and u have been told to find the DOB (date of birth) and department for the employee id ‘229’ how u will get it then and imagine how much time it will take to scan each row to get the information about the employee id ‘229’. Before going for a deep discussion, let’s have some idea about the terms used in indexing.
- Table Scan: Whenever we are firing a query on a database, the SQL server query optimizer is trying to fetch the best possible way to run the query for better performance. In this process, when the Query Optimizer determines that there are no useful indexes available to produce the output result set, then it will go for table scan. In table scan, it has to scan the whole table row by row in order to get the requested rows. This is very slow and is recommended to avoid them. There are rarely some cases where table scan are faster than index scan i.e. where the table size is less.
- Index Scan: Auxiliary data structure used to accelerate access to data within the database. Here, it will not crawl on the whole table rather it will directly fetch those rows and pages that qualify the requesting query. For this a separate data structure is maintained to keep information about the key values of the table (described below).
Note: In some references, you may find that Table scan is called as Index scan and Index scan is named as Index seek.
The main property of index is that it is in sorted form. For example, as in case of dictionary, all the words are in sorted form, so when you are looking for any word, you are directly jumping to the corresponding page related to the word. In the same way the database engine is directly seeking the demanded rows if the corresponding column is indexed.
In our above example, suppose we want to find the employee name, his DOB and department information for employee id ‘132’. The below query will do your job.
SELECT [emp_name] ,[emp_DOB] ,[Department] FROM [OLAP].[dbo].[EmployeeDim] WHERE [emp_id] = 132
Let’s have a look on the background process that has carried on for this simple query. As of now, there is no index in our table named [EmployeeDim], so the database engine will search each record, row by row, to get the desired result. Below is shown how the database engine performs the operation.
The Execution Plan for the above query shows that 100 % of the total cost is spent for Table Scan.
Now let’s create one index on the column named [emp_id] and see how it is working. Follow the below procedures to create an index on a table:
Step1: Open SQL Server Management Server and navigate to [EmployeeDim] table as below:
Step2: Select the column named [emp_id] and click on “Manage Indexes and Keys” on the toolbar as below:
Step3: On “Indexes/Keys” dialog box, click on Add button and click on the ellipse button shown beside the Columns property.
Step4: Then on “Index Columns” dialog box, select the column ‘emp_id’ in Column Name and click OK button. After that press Close button and save the table.
The above index can be easily created by executing one simple T-SQL query (This is why coders are the best in programming worldJ). The t-sql command to create index is given below:
USE [OLAP] GO /****** Object: Index [IX_EmployeeDim] ******/ CREATE NONCLUSTERED INDEX [IX_EmployeeDim] ON [dbo].[EmployeeDim] ( [emp_id] ASC )
After creation of indexes it is always good to have a look on the newly created index, you can do this by two ways:
- Expand Indexes folder under the table EmployeeDim, you will find one index was created on the name of IX_EmployeeDim.
- Or else you can run the below query to check all the available indexes under a table.
EXEC sp_helpindex 'dbo.EmployeeDim'
Now let’s execute the same query to find out the information about the employee having employee id 132. This time we may see that SQL Server engine is performing its operation in a different way and is shown as below:
A separate data structure is created for the column emp_id in ascending order and has the reference to the original row with the SL_No column (SL_No is taken to describe the example pictorially; you can assume it as the row_id maintained internally for each row by the database engine, it is usually not visible to the user).
Here the SQL server database engine will directly jump to the row 132 in the index structure by using search algorithm, usually it performs B-tree search for faster extraction of data. As soon as it got the row, it will seek the corresponding record in the main table by the reference id (row_id) and will display the rest of the information from the table about the employee having emp_id ‘132’.
As in this example, we have taken a small table, so performance difference is negligible, but in case of big tables with records more than lakhs, indexes plays a vital role in fetching data promptly.