SSIS vs Text File Importing – II
In my previous post named “<<SSIS vs Text File Importing –I>>”, we discussed about the problem statement for the strange behavior of SSIS over importing of Text Files. In this post, I will disclose the reason behind it. Let’s see how Excel and SSIS treats text files while importing data from it.
For Excel: While importing text files, Excel parses lines or Row Delimiter first and then come to Column Delimiter to parse for the cell values.
It will first traverse for the Row Delimiter and find out where the record / line ends, and for that line it will parse for the available columns and if it didn’t get any, it will replace with NULL values. So even if some of the columns don’t have values in it, it will be properly aligned.
For SSIS: But SSIS loads text file little differently, it parses Column Delimiters first and then parses Row Delimiter.
With good set of Data: It will parse all column values till the count of the columns come to end. The column information is gathered from column metadata which got set while configuring the text file info. After fetching all the columns, it will look for the ROW Delimiter.
With bad set of Data: Here it will parse all columns first and then rows. Have a look on the flow of control.
In the above picture, we can see the flow of data. Till second row everything is fine, but at third row, after second column there is no data, but SSIS will still look for the column value even if it found a ROW Delimiter ahead. It will treat the row delimiter as a part of column data and start inserting value from the 4th row till all the columns got filled for the third row. When it reaches the last column, it will start searching for Row delimiter even if it found some column delimiters ahead, it will treat all those as part of last column value for the 3rd row.
It will continue doing that till it found the row delimiter, which is actually the row delimiter for the fourth record. Eventually, the fourth record got merged in the third record. The same thing happens for 5th and 6th record. The 6th record got merged into 5th one.
Solution: As we can see from above that while inserting the text file named “CarInfowithMissingData.txt”, it got loaded properly even if there are NULL values for some columns. But the data in the text file was aligned properly with correct set of tabs. So, the key point to this is if somehow we can reformat the text files with proper number of tabs (for NULL values), before loading it into SQL server database through SSIS, then the problem can be solved.
One more point to be noted here that Excel application can open the file correctly even if the tabs are not aligned properly. Hence, the workaround is to take the help of excel to reformat the text files before ETL operation in SSIS. Just open the text file in excel and save it again. It will create the missing tabs for all NULL value columns.
This can be done manually, but what if the numbers of text files are more. So we can go for a Script Task like below to accomplish this job.
Script Task Code in VB .Net Expand
Follow the below link to see step by step loading of text files with missing values and missing tabs.
<<< SSIS vs Text File Importing –III >>




Good article!
Thanks.
Greate
Now i see how it done.
Thanks.
Great Article
I had a similar problem. I set the “Retain null values form the source as null values in the data flow” in the Flat File Source Editors Conneciton Manager property and that solved my problem. I was getting blanks and 0 for some of the rows for some of the columns. I got the data converted from a .xls file to csv format (Comma DElimited, not the MS-DOS version).
SQL 2008 R2
VS 2008
64-bit OS.