Handling Truncation in SSIS

Truncation is what happens when you try to fit a string of one length into a field that holds a string of a shorter length.  In most cases the last characters of your string are cut off so it will fit in the field.  Of course, in the case of SSIS your package will error to a grinding halt if truncation is not explicitly handled.  We see truncation very often in fields such as first name, last name, address, and so forth when moving data from one platform to another.  While there are many ways to handle truncation, I’m going to discuss some of the more common I’ve seen here along with their plusses and minuses.

Approach One:  Use Substring() ,or equivalent function, in your SQL statement to pull out only as many characters as you can handle in the destination.  In the example below, we are using the Mid() function since we are pulling from an Access 2000 database.  The FirstName and LastName fields in the Access database are of length 20 but our destination has those same fields of length 10. 

The pros of this approach is that it is quick and easy and keeps your overall dataflow clean.  You don’t have to worry about getting possible truncation errors during different steps of your dataflow.  All of the “mess” is contained in that one nice and neat little SQL statement. 

The cons of the approach are that you will lose data.  You are forcibly truncating strings, after all.  If someone has a long last name, that name is going to come over with the last few letters missing. 

image

Approach Two:  Use a Data Conversion Task to shorten your strings en route to the destination.  You can either ignore the truncation by configuring so in the Error Output, or you can redirect strings that would be truncated to another table. 

The pros of this approach is it gives you the opportunity to redirect any rows that will be truncated to another table or flat-file for later analysis.  This may be very important for some data where you can’t lose anything.

The con of this approach is it makes your data flow a bit messier.  You will have now the “copy of firstname” and “copy of lastname” columns as shown in the screenshot below and the original firstname and lastname columns.  You can remove unwanted columns by adding a Sort task and not letting them pass through, but again, it’s a bit messy.

image

image

Approach Three:  Attempt to use a script component to shorten the data.  Consider the  following code:

public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (Row.AddressOne.Contains("Avenue"))
        {
            Row.AddressOne.Replace("Avenue", "Ave");
        }
        if (Row.AddressTwo.Contains("Suite"))
        {
            Row.AddressTwo.Replace("Suite", "Ste");
        }
    }

Notice that we are attempting to replace certain words commonly found in addresses with their abbreviations.  This approach is commonly used with approach two so as rows get redirected, more words can be found to be abbreviated thereby cutting down on error.

The pros of this approach is that it can certainly cut down on truncation in a more acceptable way.

The cons of this approach is that it will not work for all data and it can be hard to maintain as you are constantly adding cases to your switch statement or more “if’s” in the case of this example.

Truncation is one of those things that you are guaranteed to deal with when working with data.  Some vendors are certainly more generous with space in their database than others.  Why do some vendors make some string fields so small?  Who knows.  But at least we do have a few options on how to deal with it.  Maybe not perfect options, but options nonetheless.

If anyone has any cool approaches to how they handled truncation, I look forward to reading about them in the comments section or via links to your own blogs.

JamesNT

This entry was posted in SQL Server, SSIS. Bookmark the permalink.

3 Responses to Handling Truncation in SSIS

  1. Corendiel says:

    You should add some performance indicator in those two solutions. Substring vs ignoring trunc error.
    The result will be the same but performance might be different.

  2. JamesNT says:

    Excellent point. I may consider that for a future post.

    JamesNT

  3. Gregg says:

    Is it just me, or do the truncation settings work differently (or not work at all) for number data types?

    I have a CSV file with the number 1.127 in it.
    In the source file connection manager, I set its DataType to DT_DECIMAL and its DataScale to 2.
    In the source editor’s Error Output tab, I set its Truncation failure mode to “Fail component”.

    So I would expect that trying to load 1.127 into a field with DataScale = 2 would fail.

    Yet when I load the file, it does not fail, and it loads the value 1.12 (truncates the 3rd decimal place) into my database.
    I don’t even get any warnings – it just truncates it silently.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>