IMPORTANT NOTE: We ended-up fixing this as a bug in ADF and so this pattern should no longer be necessary. Just include headers in your Sink dataset and then ADF will now output the header when zero rows are produced from your data flow. I’m leaving this blog post here in the hopes that it will help you to understand how to build different logical constructs in ADF data flows.
Today, we don’t have an option in data flows in ADF to include headers when your transformations result in zero rows. But you can build the logic to handle this scenario. So, until we add a checkbox feature to include headers, you can use this technique below to achieve this.
A completed pipeline template for this demo can be downloaded here. To add it to your data factory, open the ADF UI and click +New Pipeline Template > Import from Template.
The primary idea is to duplicate your data source and write just the headers if no rows result from your transformations.
In the above example, I have my primary file of movie ratings data on top with a text delimited dataset that includes headers. The bottom Source is a copy of that same dataset, but does not include headers. This way, we can read the entire file as data rows, including treating the header as data.
I use the Surrogate Key transformation to incrementally add a row number to each row in the dataset. This means that row 1 will be the header in the bottom stream. I set the surrogate key column to “rownum”. Now add a Filter transformation that will include only that header row with this formula: rownum == 1.
So now our top stream has the full source data and we now have a bottom stream whose source is just the header labels.
The logic to implement is this: If NumberOfRows < 1 then write the header row only to a text file, else write full transformed dataset with header row as normal.
The top row has a Filter transformation called “Filter1” that I put there just so that you can test this scenario with and without rows. The filter expression uses an integer parameter that you can set to 1 if you wish to test with row data and 0 if you wish to test the zero row header-only scenario. You can remove that filter from your implementation.
Here’s what you need to implement to complete the logic:
- Aggregate. The “rowCount” is an Aggregate with no grouping and just count() for the formula creating a new column called rowcount.
- Conditional Split. Top split is “rowcount == 0” and I called that stream “zeroRows”. When there are rows found, the default path is taken.
- Union. Because we aggregated the data to get a row count, we need to include the original data and the headers back to the primary data stream. The top split unions with the header source and the bottom split unions with the original data.
- Filter. The data is combined with the aggregate row count, so filter out the row count extraneous metadata. Do this with this filter on the top split: “rownum == 1” to pass only the header and “isNull(rowcount)” to pass only the data on the bottom split.
- Sink. The last part is to set the field mappings. You can use a rule here to include all incoming fields, ignoring just the row count and row number metadata. This formula will work on both streams and removes the special columns:
[…] Mark Kromer meets a challenge: […]
[…] ADF: Include Headers in Zero Row Data Flows [UPDATED] It has been a bug, but now you can learn to build mapping data flow in ADF. […]
In cases where am created a business validation error csv file, the sink is creating a file with only the headers. In this case, is it possible to prevent adf from creating a file when there are 0 records for that particular error records sink.
I believe we should have a check box at sink level to prevent the creation of empty file with the headers when the result set is zero rows.
yes we do … I wrote this before we released that feature 🙂