Dynamic File Names in ADF with Mapping Data Flows

If you are using Azure Data Factory (ADF) to process files in Azure and want to generate new output files named after values in your data, you can do this with capabilities built into ADF’s Mapping Data Flows.

The key is to use a Delimited Text dataset in your Sink transformation (Parquet will work as well, depending on the data format you choose) that specifies only a folder location. Do not set a file name for the output. We’ll set it dynamically inside the data flow.

The flow will be Source transformation > Filter (we’ll keep only certain rows for this sample) > Derived Column (this is where we’ll set the target file name) > Sink.

In my sample, I’m keeping only movies from 1940 with a rating of 6.
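
To make the Filter step concrete, here is a rough Python sketch of the same row selection outside of ADF. It is not data flow expression syntax, and the column names `year` and `rating`, as well as the sample rows, are assumptions of mine for illustration only.

```python
# Hypothetical sample rows; the real movies source has its own schema.
movies = [
    {"title": "Movie A", "year": 1940, "rating": 6},
    {"title": "Movie B", "year": 1940, "rating": 8},
    {"title": "Movie C", "year": 1955, "rating": 6},
]


def keep_row(row):
    """Mirror the Filter condition: movies from 1940 with a rating of 6."""
    return row["year"] == 1940 and row["rating"] == 6


# Only rows that satisfy the condition continue down the flow.
filtered = [row for row in movies if keep_row(row)]
```

Rows that fail the condition are simply dropped, so everything downstream (the Derived Column and the Sink) only ever sees the filtered set.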

Use a Derived Column to build the string value that you’d like to use as your target file name. In my case, I’m calling the column simply “filename”, and I’m setting it to the string literal ‘movies-out-’ with today’s date appended. I then add ‘.csv’ because I chose Delimited Text with a comma delimiter in my sink dataset.
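
The same string-building logic, sketched in Python rather than in the data flow expression language. The helper name `build_filename` is hypothetical, and the ISO date format (`YYYY-MM-DD`) is my assumption; use whatever date format you want in your own file names.

```python
from datetime import date


def build_filename(today: date) -> str:
    """Concatenate a fixed prefix, a date, and the extension.

    The ISO date format is an assumption made for this sketch.
    """
    return "movies-out-" + today.isoformat() + ".csv"


# Every row in the flow gets this same value in its "filename" column.
filename = build_filename(date.today())
```

Because the value depends only on the run date and not on anything in the row, every row ends up with an identical “filename” value.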

The Sink has an ADF system attribute called “Column with file name” that lets you set the output file name from a value in your data.

Under Settings in the Sink transformation, choose “As data in column” and then pick the “filename” field that we created earlier with the Derived Column transformation. Because every row carries the same value in that column, the result is a single output file named “movies-out-{date}.csv”.
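
If it helps to picture what “As data in column” does, here is a small Python sketch of the idea: group rows by the value in the file name column and write one CSV per distinct value. This is only my mental model, not how ADF implements it. The function name `write_by_filename_column`, the sample rows, and the choice to leave the file name column out of the written data are all assumptions for illustration.

```python
import csv
import os
import tempfile
from collections import defaultdict


def write_by_filename_column(rows, folder, column="filename"):
    """Write one CSV per distinct value found in `column`.

    Assumes all rows share the same set of columns. The file name
    column itself is left out of the written data (an assumption).
    """
    groups = defaultdict(list)
    for row in rows:
        data = {key: value for key, value in row.items() if key != column}
        groups[row[column]].append(data)

    written = []
    for name, group in groups.items():
        path = os.path.join(folder, name)
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        written.append(path)
    return written


# Hypothetical rows that all carry the same file name value.
rows = [
    {"title": "Movie A", "year": 1940, "filename": "movies-out-2019-05-01.csv"},
    {"title": "Movie D", "year": 1940, "filename": "movies-out-2019-05-01.csv"},
]

with tempfile.TemporaryDirectory() as folder:
    paths = write_by_filename_column(rows, folder)
    names = sorted(os.path.basename(path) for path in paths)
    with open(paths[0], newline="") as handle:
        header = handle.readline().strip()
```

With a single distinct value you get a single file. If the column held several different values, this sketch would write one file per value.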

Add this data flow to a pipeline and run it with Debug. The output file gets dropped into the folder defined in my dataset.

The result is a dynamically named file that contains only the filtered rows we asked for in the ADF Data Flow.

Here is a direct link to the JSON for this data flow: https://github.com/kromerm/marksadfrepo/blob/gaversion/dataflow/DynamicFileName.json

3 comments

  1. I was trying to output to a single file so I could control the file name, which meant I could only use a single partition and had to go back to Cosmos DB for each new file name. This saved me tons of execution time and resources against Cosmos DB. Thanks for this!

  2. Great article, thanks for that!

    I am really wondering though: is there any documentation of this anywhere? I feel I am spending a huge amount of time googling the stuff around the web and hoping that somebody mentions it somewhere…

    Btw: it is great that the “filename” Derived Column is not present in the result files…. But it is not obvious why? Because it is used as filename column? It is just weird…

    Thanks anyway, it helped me a lot!
