The Derived Column transformation in ADF Data Flows is a multi-use transformation. While it is generally used for writing expressions for data transformation, you can also use it for data type casting and you can even modify metadata with it.
In this example, I’m going to demonstrate how to modify column names in the Movies database CSV demo. Note that this technique will duplicate the columns, maintaining the original column names. But you can remove the old fields later in your flow using a Select transformation or Sink-side Mapping.
What I want to do is to make two general pattern rules for changing column names:
- Any integer field, I want to add a prefix of “int_”
- Any string field, let’s add “_trimmed” to the end of the column name
The incoming source data file with Movies info looks like this:
Notice that all fields in the projection arrive as strings because this is a text file. I don’t like the space in Rotton Tomato and it is also misspelled. I’ll fix that later. For now, let’s manually cast these to the proper data types. Note that you could also use “Detect Data Type” to auto-detect the types. This is a simple demo with a small file, so I’ll do the work of setting the types in a Derived Column transform that I’ll call “Casting”:
Now that I have all of my source data in the right data type format, I can clean things up a bit. I’ll clean up the spelling in my “FixNames” Select transformation. Use Select for aliasing and removing columns from your data flow.
The “RenameColumns” Derived Column is where the magic happens. I’m using a column pattern to look for the data types integer and string to apply the business rules that I described at the beginning of this post:
My rule states that whenever a column is of type string, trim it and call it field name + ‘_trimmed’. Likewise, for integer columns, we’ll prefix them with ‘int_’, but not modify the data at all.
Derived Column patterns will generate new copies of the original fields. So I added a Select transform after it to remove the original version so that now in my Sink, all I need to do is “auto-map” and I’ll have my new column names.
The end result of our data flow looks like this: