Home > SSIS programming > SSIS Programming Basic

SSIS Programming Basic


Minimized costs, increased effectiveness, higher flexibility and easy management have become keys to success in the IT industry today. This page aims at achieving the above by introducing a new approach to solve a set of problems faced in ETL process.

The IT industry is ever expanding, and elements of IT can be found in almost every sector, every company. Whether you are playing on a website like (www.casino.com/) or using advanced computerized utilities on a farm, knowledge of IT and programming is going to be very useful. Computer skills are necessary all over the world these days, from basic knowledge to complex procedures. Processing a set of operations with different source and target names, but same structure and same set of transformations is a common scenario in development areas. So approaching an automated methodology for the same using some programming languages like C# .Net or VB .Net with little or no modifications is better. This can reduce the total work load up to 80 – 90%.

This page deals with the approach of creating the ETL Package by using C# .Net and VB .Net language. As SQL Server 2005 is a product of Microsoft Corporation, we got the whole set of APIs in Microsoft Visual Studio. Using these APIs and some standard procedures and scripts in T-SQL, we can ease the ETL Process to increase the productivity.




SQL Server Integration Services (SSIS) has an architecture that separates task management (control flow) and data movement and transformation (data flow). Programming Integration Services can be used very efficiently and effectively to extend and automate the purpose of SSIS i.e. the ETL process. We have two engines for both types of operations.

  • Run-Time Engine: The run-time engine implements the control flow and package management infrastructure i.e. for logging, precedence constraints, package configuration, event handlers, and variables. By programming the run-time engine, developers can automate the creation, configuration, logging, precedence constraints and execution of packages and create custom tasks and other extensions.
  • Data Flow Engine: The data flow engine is specialized for extracting, transforming, and loading data. The data flow task contains additional objects called data flow components like Data Flow Source components, Data Flow Transformation components and Data Flow Destination components. Programming can lead to automation in the creation of Data Flow components and as well creation of custom components.
    Both the run-time engine and the data flow engine are written in native code. As a result they are available through a fully managed object model.

Commonly used Assemblies for programming Integration Services

Assembly

Description

Microsoft.SqlServer.ManagedDTS.dll

Contains the managed run-time engine.

Microsoft.SqlServer.RuntimeWrapper.dll

Contains the primary interop assembly (PIA), or wrapper, for the native run-time engine.

Microsoft.SqlServer.PipelineHost.dll

Contains the managed data flow engine.

Microsoft.SqlServer.PipelineWrapper.dll

Contains the primary interop assembly (PIA), or wrapper, for the native data flow engine.

(http://msdn.microsoft.com/en-us/library/ms403344.aspx)

Below are the basic steps for Programming Integration Services:

ssisprog1

Below you will see different approaches taking care of different tasks and transformations commonly used in building SSIS package.

EXAMPLE:

Here in this example, the data from source database named OLTP will be loaded to the destination database OLAP. In the OLTP database, the table named Employee_Dim will be transferred to the OLAP database. And during this transformation, the package log will be tracked in PackageTransaction table in the OLAP database.

Details about PackageTransaction table:

Column_name

Type

TransactionID

int

PackageName

varchar

RunDate

datetime

RowCount

int

Status

bit

Details about Employee_Dim table:

Column_name

Type

employee_Sl_No

int

emp_name

varchar

emp_id

int

emp_DOB

datetime

We have the below data in Employee_Dim table:

employee_Sl_No

emp_name

emp_id

emp_DOB

1

Arun

101

1984-03-16 00:00:00.000

2

Akash

102

1982-01-31 00:00:00.000

3

Ram

103

1986-03-03 00:00:00.000

4

Rasi

104

1986-04-25 00:00:00.000

The package structure will be like this:

ssispro1

The variables used for the package are:

ssispro2

Before the loading of data starts, this particular run of the package in PackageTransaction table will be logged. Here the column named TransactionID in PackageTransaction table is an IDENTITY Column, so its value will be generated automatically.

The procedure used to do this is:

CREATE PROCEDURE InitTransaction

AS

BEGIN

insert into dbo.PackageTransaction

( PackageName,

RunDate,

Status

)

values

( ’sample package’,

getdate(),

0

)

SELECT

CAST(Scope_Identity() AS INT) TransactionID

END

GO

The above procedure will return the Transaction ID to the package, so that it can be used while ending the package execution. The Status column indicates the successful completion of the package, 0 means not completed and 1 means successfully completed.

Similarly, we need another procedure to log the completion of the package in PackageTransaction table.

CREATE PROCEDURE [dbo].[EndTransaction]

@TransactionID [int],

@RowCount [int]

AS

BEGIN

UPDATE dbo.PackageTransaction

SET

[RowCount] = @RowCount,

Status = 1

WHERE TransactionID = @TransactionID

END

After executing the package, we will get the below result in the PackageTransaction table.

TransactionID

PackageName

RunDate

RowCount

1

sample package

2009-05-01 08:55:52.467

4

Let’s do the same by Integration programming. For this we have to create a package first, and then have to add the tasks and the precedence constraints accordingly, and then have to add the package variable. In the dataflow task, we have to add the data flow source and data flow destination components.




The code below shows the above package programmatically and the code itself is self-descriptive. The code for SQL server 2005 and SQL Server 2008 are a bit different; both the approaches are given below.

Add the below References before developing the code.

Microsoft.SqlServer.Dts.Design
Microsoft.SqlServer.DTSPipelineWrap
Microsoft.SQLServer.DTSRuntimeWrap
Microsoft.SQLServer.ManagedDTS

Programming Integration Service in SQL Server 2005

Expand image C# .Net Expand
Expand image VB .Net Expand

Programming Integration Service in SQL Server 2008

Expand image C# .Net Expand
Expand image VB .Net Expand

Like the above we can use any transformation and task in building a package programmatically.

To make your work more easy and effortless, SQL Lion team comes up with a unique tool named “SSIS Component Explorer” that will help you in retrieving the Creation Name, Component Class ID, etc about any component like Control Flow Tasks, Data Flow Transformations, Connections Managers, Log Providers, etc in SSIS package. It also provides code snippets for each component.
Download
the free version from here.
Categories: SSIS programming
  1. August 31st, 2010 at 19:44 | #1

    Hi. I’m trying to test your code, but the debugger raise an exception on line
    PackageStartIndicatorTaskdoc.LoadXml(@”");

    The error is:
    System.Xml.XmlException was unhandled
    Root element is missing.

    Bye

  1. No trackbacks yet.