Here comes one more challenging yet interesting topic to tide over. The requirement goes something like this: You have some sources, let it be some sales data or some Call center data coming from different sources. Data can be of different media. But you are able to load it in your staging tables i.e. one staging database you are maintaining like StageDB for storing these incremental data.
So, here we have both master i.e. Dimensional data and Detail i.e. Fact data in our stage database. But before processing further and loading it into our Mart or Data warehouse, we need to check if there is any redundancy at the row level for each of these tables in the staging database. Here comes the main problem, what if you don’t know how many tables are there in the staging database and how many columns are there for each table in the staging database but still you have to keep only unique records for each table.
Read more…
SQL Server 2005 has some unique features to deal with the Transaction system in the database world. It has some unique sets to take care of every possibility of transactions or types of transaction. Technically, it will give us discrete ways to isolate the transactions from occurrence of deadlocks or crashes.
Before going deeper to the Isolation level that SQL Server provides to distinguish types of transaction, let’s have a look on the definition of the TRANSACTION. What does transaction means in real world and in a database scenario?
Read more…
In my previous article named “Programming Foreach Loop Container – Enumerating Excel Files”, we discussed about enumerating Excel files in a folder and load data from each excel file to the destination by using Foreach File enumerator.
But there may be cases where we may have data in the form of sheets in a single excel file. For example, the total sales of study materials over different countries.
To load all the sheets from a single excel file in Foreach Loop Container, we have to use “Foreach ADO.NET Schema Rowset Enumerator”. This special enumerator is used to enumerate through schema information about a data source i.e. the schema information supported by the OLEDB provider. For example, to get list of tables, schemas, column information, constraints, etc from a database.
Read more…
Sometimes we were not aware of the contents of any object and we have to do certain tasks as many times as the no. of contents of that object. For example, a folder contains some text files with employee information data and we have to load all the text files data to our SQL server employee table. Here, we don’t know about the count of the source but we have to roll over every file. For this type of situation, SSIS have a unique container called “Foreach Loop Container” that will enumerate with each text file in the folder.
It can be used in some other cases too like enumerating each sheet of a excel file, or parsing each row of a table, enumerating each child node of a given tree, etc. In SQL server 2000 DTS package, it is bit difficult to loop over any task but fortunately SSIS has a special container to accomplish such tasks.
Read more…
While building an ETL package, sometimes we come across to repeat same types of tasks many times i.e. like to do certain task depending upon the number of students in the class, or any other condition, etc. So instead of developing same containers or tasks so many times, it’s always better to iterate it with certain parameters.
To resolve the above situation, SQL Server Integration Service (SSIS) comes with one special container called “For Loop Container“. This container has the same logic that a “For loop” keyword have in leading programming languages. It will iterate through all the tasks inside it till the condition satisfies. It acts as a repeating control flow. For each iteration of the loop, the For Loop container evaluates an expression that is specified and repeats its workflow until the expression evaluates to False.
Read more…
While building an ETL package, it’s very important to control the flow of work as per the requirement. And this can be done by using a SSIS component called Sequence container. The sequence container is a part of Control Flow and a sub part of the global container i.e. Control Flow container. Here we can set the flow of our package by precedence constraints.
The Sequence Container is basically used for grouping same type of tasks and to control them under a separate group. It can group the whole contents of the package into multiple separate control flows. The sequence container can be nested to any level to give more clarity to the process flow.
Here in this article, we will see the steps to control the tasks using Sequence container as well as how to do the same by programming Integration Services.
Read more…
SQL Server provides us with a powerful Data Integration service called SSIS. But to make it more robust and effective, we need to monitor some of the vital measures so as to increase its performance, troubleshooting errors and to keep track of the data flow. The best way provided by SSIS to accomplish this is SSIS Logging. SSIS will log entries in the log file or table on the basis of the events that occur during the execution of the package.
Here in this section, we will tell you about different features of SSIS logging and the implementation of the same by programming Integration Services.
Read more…
Lookup Transformation is used to lookup data by joining data in input columns with columns of a reference table. The reference dataset can be a cache file, an existing table or view, a new table, or the result of an SQL statement. The lookup is used to access additional information in a related table that is based on values in common columns.
For Example:
Let us have one employee table and one transaction table. The employee table contains information about all the employees. And the transaction table keeps record of the type of transaction done by any employee.
Read more…
There may be some circumstances in a sanctuary where it becomes necessary to keep a leash of foxes and a pack of wolves separate so that they must not fight with each other and make things a mess. Similarly, in the database jungle we do the same things by using some conditional statements in order to keep things distinct. SSIS provides a nice transformation to do the same, i.e. Conditional Split Transformation.
By using this conditional split transformation, the incoming data can be split into multiple outputs as per the conditions specified. It has a default output too that handles all the rows that do not match to any conditions.
Read more…
Lots of things really matter when building an efficient and robust database and its corresponding package for extracting and loading of data. And this matter more when the database size is very big (or huge).
It’s high time when we need some method to optimize the extraction and loading process in the package. This means increasing the performance of SSIS package and which in turn makes your package to be able to retrieve and update the data they want very quickly.
The key component for faster extraction is to put indexes in the target table but it will give you less performance when you load data on it. So when to put indexes and where to put indexes is very important. Insert indexes to the joining columns before starting extraction, especially to those that have data types other than integer like varchar, nvarchar, datetime, etc.
Secondly, the data type chosen for a particular column also plays a significant role in the performance of your SSIS package. Choose data type and size of the columns wisely.
Read more…
Recent Comments