Truncation is what happens when you try to fit a string of one length into a field that holds a string of a shorter length. In most cases the last characters of your string are cut off so it will fit in the field. Of course, in the case of SSIS your package will grind to a halt with an error if truncation is not explicitly handled. We see truncation very often in fields such as first name, last name, address, and so forth when moving data from one platform to another. While there are many ways to handle truncation, I’m going to discuss some of the more common ones I’ve seen, along with their pluses and minuses.
Approach One: Use Substring(), or an equivalent function, in your SQL statement to pull out only as many characters as the destination can handle. In this example, we are using the Mid() function since we are pulling from an Access 2000 database; an expression such as Mid(FirstName, 1, 10) returns only the first 10 characters. The FirstName and LastName fields in the Access database are of length 20, but our destination has those same fields at length 10.
The pros of this approach are that it is quick and easy and keeps your overall data flow clean. You don’t have to worry about possible truncation errors at different steps of your data flow. All of the “mess” is contained in that one nice and neat little SQL statement.
The con of this approach is that you will lose data. You are forcibly truncating strings, after all. If someone has a long last name, that name is going to come over with the last few letters missing.
Approach Two: Use a Data Conversion transformation to shorten your strings en route to the destination. You can either ignore the truncation by configuring the Error Output to do so, or you can redirect rows whose strings would be truncated to another table.
The pro of this approach is that it gives you the opportunity to redirect any rows that would be truncated to another table or flat file for later analysis. This may be very important for data where you can’t afford to lose anything.
The con of this approach is that it makes your data flow a bit messier. You will now have the “copy of firstname” and “copy of lastname” columns, as shown in the screenshot below, alongside the original firstname and lastname columns. You can remove unwanted columns by adding a Sort transformation and not letting them pass through, but again, it’s a bit messy.
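To make the redirect idea concrete outside of the SSIS designer, here is a minimal C# sketch of the decision involved: rows that fit go on to the destination, and rows that would be truncated are set aside untouched. This is not SSIS code. The class name TruncationRouter, the Route method, and the sample names are all made up for illustration, and the width of 10 simply mirrors the destination fields described earlier.

```csharp
using System;
using System.Collections.Generic;

public static class TruncationRouter
{
    // Assumed destination width, mirroring the length-10 fields in this post.
    public const int MaxLength = 10;

    // Sorts each value into "fits" or "redirected" without altering it.
    public static void Route(IEnumerable<string> values,
                             List<string> fits,
                             List<string> redirected)
    {
        foreach (string value in values)
        {
            if (value != null && value.Length > MaxLength)
            {
                redirected.Add(value); // would be truncated: keep whole for later analysis
            }
            else
            {
                fits.Add(value);       // short enough (or null): load as-is
            }
        }
    }

    public static void Main()
    {
        var fits = new List<string>();
        var redirected = new List<string>();
        Route(new[] { "Smith", "Featherstonehaugh" }, fits, redirected);

        Console.WriteLine("Loaded: " + string.Join(", ", fits));           // prints "Loaded: Smith"
        Console.WriteLine("Redirected: " + string.Join(", ", redirected)); // prints "Redirected: Featherstonehaugh"
    }
}
```

The point of the sketch is that nothing is lost: the long value lands in the second list in full, which is what redirecting to another table or flat file buys you.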
Approach Three: Attempt to use a script component to shorten the data. Consider the following code:
public override void Input0_ProcessInputRow(Input0Buffer Row)
Notice that we are attempting to replace certain words commonly found in addresses with their abbreviations. This approach is commonly combined with approach two: as rows get redirected, you can review them to find more words to abbreviate, thereby cutting down on errors over time.
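As a rough illustration of the kind of replacement logic described here, the following is a minimal, self-contained C# sketch. It is not the original script component code. The AddressAbbreviator class, the Abbreviate method, and the particular word list are assumptions chosen for the example, and it uses whole-word regular expression replacements rather than a chain of ifs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AddressAbbreviator
{
    // Replaces a few whole words with shorter forms (case-sensitive).
    // The word list is hypothetical; grow it as redirected rows reveal more candidates.
    public static string Abbreviate(string address)
    {
        if (string.IsNullOrEmpty(address))
        {
            return address; // nothing to shorten
        }

        // \b anchors each match to a whole word, so "Streetcar" is left alone.
        address = Regex.Replace(address, @"\bStreet\b", "St");
        address = Regex.Replace(address, @"\bAvenue\b", "Ave");
        address = Regex.Replace(address, @"\bBoulevard\b", "Blvd");
        address = Regex.Replace(address, @"\bApartment\b", "Apt");
        return address;
    }

    public static void Main()
    {
        // prints "123 North Main St Apt 4"
        Console.WriteLine(Abbreviate("123 North Main Street Apartment 4"));
    }
}
```

Inside a script component, a helper like this would presumably be called from Input0_ProcessInputRow against whichever address column your input exposes; the column name depends on your own package, so it is left out here.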
The pro of this approach is that it can cut down on truncation in a more acceptable way: an abbreviated address is still readable, whereas one with its last characters chopped off may not be.
The cons of this approach are that it will not work for all data and that it can be hard to maintain, as you are constantly adding cases to your switch statement, or more ifs in the case of this example.
Truncation is one of those things that you are guaranteed to deal with when working with data. Some vendors are certainly more generous with space in their database than others. Why do some vendors make some string fields so small? Who knows. But at least we do have a few options on how to deal with it. Maybe not perfect options, but options nonetheless.
If anyone has any cool approaches to how they handled truncation, I look forward to reading about them in the comments section or via links to your own blogs.