ADF Mapping Data Flows: Create rules to modify column names

The Derived Column transformation in ADF Data Flows is a multi-use transformation. While it is generally used for writing expressions for data transformation, you can also use it for data type casting and you can even modify metadata with it.

In this example, I’m going to demonstrate how to modify column names in the Movies database CSV demo. Note that this technique will duplicate the columns, maintaining the original column names. But you can remove the old fields later in your flow using a Select transformation or Sink-side Mapping.

The sample data for Movies can be found here.

What I want to do is to make two general pattern rules for changing column names:

  1. Any integer field, I want to add a prefix of “int_”
  2. Any string field, let’s add “_trimmed” to the end of the column name

The incoming source data file with Movies info looks like this:

dynsource

Notice that all fields in the projection arrive as strings because this is a text file. I don’t like the space in Rotton Tomato and it is also misspelled. I’ll fix that later. For now, let’s manually cast these to the proper data types. Note that you could also use “Detect Data Type” to auto-detect the types. This is a simple demo with a small file, so I’ll do the work of setting the types in a Derived Column transform that I’ll call “Casting”:

dyncasting

Now that I have all of my source data in the right data type format, I can clean things up a bit. I’ll clean up the spelling in my “FixNames” Select transformation. Use Select for aliasing and removing columns from your data flow.

dynfixnames

The “RenameColumns” Derived Column is where the magic happens. I’m using a column pattern to look for the data types integer and string to apply the business rules that I described at the beginning of this post:

dynrename

My rule states that whenever a column is of type string, trim it and call it field name + ‘_trimmed’. Likewise, for integer columns, we’ll prefix them with ‘int_’, but not modify the data at all.

Derived Column patterns will generate new copies of the original fields. So I added a Select transform after it to remove the original version so that now in my Sink, all I need to do is “auto-map” and I’ll have my new column names.

dynremove

The end result of our data flow looks like this:

dynamiccols1.png

 

Advertisements

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s