NOTE: This blog post relates to the ADF V2 service.
When performing data integration, a very common step is to remove a file, a table row, or a K/V pair after reading, transforming, and loading the data. Azure Data Factory is Azure’s native ETL data integration service for orchestrating these operations. Below are the steps you can take to achieve this as part of your data pipelines in ADF.
Delete Azure Blob Storage file
- Set up a Logic App in Azure to call the Azure Blob Service REST API DeleteBlob.
- In the Logic Apps Designer, add an HTTP Request trigger followed by an Azure Storage Delete Blob action.
- You will need one parameter in the Request Body which we’ll use as the “BlobName” to delete. In a future step, we’ll use the ADF pipeline parameter feature to pass in the name of the blob to delete here.
- Perform a test run of your Logic App here in the Logic App Designer to make sure that it works properly by entering a value for the BlobName as a test.
- Once you are happy that it is working properly and deleting the targeted blob file, save the Logic App.
- Copy the HTTP POST URL shown at the top of the HTTP Request trigger in the Logic Apps Designer (the URL is generated once the Logic App is saved).
- Let’s now move into the ADF UI. From the pipeline view, click on an open area in the design surface to show the Pipeline Parameters properties at the bottom.
- We’ll create 2 parameters: PartitionKey and BlobName, both of type String. In my testing, I used default values. When you run your pipeline in Debug mode, you will be prompted to enter values for those parameters, so you can always simply enter the strings at that time.
- The values need to be the full JSON K/V pair that we’ll send to the Logic App HTTP Request, for example:
  {"blobname":"mycontainer/emp.txt"} (that is my container/filename; use your own value there)
- We’ll use the PartitionKey parameter later for the Table row delete below.
- Place a Web Activity on the design surface.
- Under “Settings”, set the URL to the URL you copied from Logic Apps and Method to POST. For Body, enter the BlobName parameter as the expression @pipeline().parameters.BlobName, or use the Add Dynamic Content option to insert it.
- Click Validate in the ADF UI to ensure your configuration is valid, then hit “Debug”. Enter the parameter string values when prompted for your Blob filename, then click “Finish”.
- You should see the results at the bottom under Pipeline Output.
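The Logic App’s Delete Blob action wraps the Delete Blob operation of the Blob Service REST API mentioned at the start of this section. As a sketch of what that call looks like, the Python below builds (but does not send) the request for a BlobName value such as mycontainer/emp.txt. The account name, the build_delete_blob_request helper, and SAS-token authorization are assumptions for illustration, not part of the setup described above.

```python
import urllib.parse
import urllib.request

# Hypothetical storage account name; replace with your own.
ACCOUNT = "mystorageaccount"


def build_delete_blob_request(blob_name: str, sas_token: str) -> urllib.request.Request:
    """Build (but do not send) a Delete Blob REST request.

    blob_name has the same "container/path" shape as the BlobName value sent
    to the Logic App, e.g. "mycontainer/emp.txt". sas_token is assumed to be a
    SAS query string (without the leading '?') that grants delete permission.
    """
    container, _, blob_path = blob_name.partition("/")
    if not container or not blob_path:
        raise ValueError(f"expected 'container/blob', got {blob_name!r}")
    # quote() keeps '/' by default, so virtual folder separators survive.
    url = (
        f"https://{ACCOUNT}.blob.core.windows.net/"
        f"{urllib.parse.quote(container)}/{urllib.parse.quote(blob_path)}"
        f"?{sas_token}"
    )
    # Assumption: with a SAS, the service version comes from the token's sv
    # field, so no x-ms-version header is set here.
    return urllib.request.Request(url, method="DELETE")
```

Passing the request to urllib.request.urlopen would send it; Delete Blob replies 202 Accepted on success.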
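For reference, the blob-delete pipeline built in the steps above corresponds roughly to pipeline JSON like the following, as shown in the ADF UI’s code view. The Python sketch generates it; the pipeline and activity names, the placeholder URL, and the explicit Content-Type header are assumptions, not settings taken from this post.

```python
import json

# Placeholder: paste the HTTP POST URL copied from the Logic App trigger.
LOGIC_APP_URL = "https://<your-logic-app-trigger-url>"

pipeline = {
    "name": "DeleteBlobPipeline",  # hypothetical pipeline name
    "properties": {
        "parameters": {
            # Each value is itself a JSON document, stored as a plain String.
            "BlobName": {
                "type": "String",
                "defaultValue": json.dumps({"blobname": "mycontainer/emp.txt"}),
            },
            "PartitionKey": {
                "type": "String",
                "defaultValue": json.dumps({"PartID": "20180309T0000", "RowKey": "data"}),
            },
        },
        "activities": [
            {
                "name": "DeleteBlob",  # hypothetical activity name
                "type": "WebActivity",
                "typeProperties": {
                    "url": LOGIC_APP_URL,
                    "method": "POST",
                    # Assumed header so the Logic App parses the body as JSON.
                    "headers": {"Content-Type": "application/json"},
                    # Resolved at run time to the BlobName parameter value.
                    "body": "@pipeline().parameters.BlobName",
                },
            }
        ],
    },
}

print(json.dumps(pipeline, indent=2))
```

Because the parameter already holds the complete JSON body, the Web Activity can forward it unchanged instead of assembling JSON from several pieces.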
Delete Azure Table Storage Entities
- Set up another Logic App in Azure, this time to call the Azure Table Service REST API DeleteEntity.
- In the Logic Apps Designer, add an HTTP Request trigger followed by an Azure Table Service Delete Entity action.
- This time, we’ll need 2 parameters in the Request Body: PartID and RowKey. You could also parameterize the Table Name if you wish.
- Perform a test run of your Logic App here in the Logic App Designer to make sure that it works properly by entering a value for each of the keys that correspond to an entity in your Table.
- Once you are happy that it is working properly and deleting the targeted entities, save the Logic App.
- Copy the HTTP POST URL shown at the top of the HTTP Request trigger in the Logic Apps Designer (the URL is generated once the Logic App is saved).
- Back in the ADF UI, from the pipeline view, click on an open area in the design surface to show the Pipeline Parameters properties at the bottom.
- Now we’ll use the PartitionKey ADF parameter you created earlier. You can enter its value as the Default Value here, or wait until you run the pipeline in Debug mode, when you will be prompted for it.
- The value needs to be the full JSON K/V pair that we’ll send to the Logic App HTTP Request, for example:
  { "PartID": "20180309T0000", "RowKey": "data" } (that is a pointer to my test Table row)
- Place a Web Activity on the design surface.
- Under “Settings”, set the URL to the URL you copied from Logic Apps and Method to POST. For Body, enter the PartitionKey parameter as the expression @pipeline().parameters.PartitionKey, or use the Add Dynamic Content option to insert it.
- Click Validate in the ADF UI to ensure your configuration is valid, then hit “Debug”. Enter the parameter string value when prompted for your Table row’s PartitionKey and RowKey, then click “Finish”.
- You should see the results at the bottom under Pipeline Output.
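To test the Table Logic App without running the pipeline, you could send the same POST that the Web Activity sends. A minimal Python sketch, assuming the trigger URL copied earlier and a body equal to the PartitionKey parameter value; build_logic_app_call is a hypothetical helper that also checks for the two fields the Logic App reads (with expressions such as triggerBody()?['PartID'] and triggerBody()?['RowKey']).

```python
import json
import urllib.request


def build_logic_app_call(trigger_url: str, body: str) -> urllib.request.Request:
    """Build (but do not send) the POST the Web Activity makes to the Logic App."""
    payload = json.loads(body)  # fail early if the parameter value is not JSON
    missing = {"PartID", "RowKey"} - set(payload)
    if missing:
        raise KeyError(f"missing keys: {sorted(missing)}")
    return urllib.request.Request(
        trigger_url,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Passing the result to urllib.request.urlopen sends it; if the Logic App has no Response action, the Request trigger normally replies 202 Accepted.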
Can you please tell me how to pass the blob name from a pipeline parameter that’s coming from a trigger?
https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers
[…] the web, I’ve found several alternatives, of which using Logic Apps seemed the most convenient workaround. However, I was not satisfied, as this solution introduces […]
Hi, I followed the same procedure that you described, but I am still unable to delete the blob.
My requirement is:
Blob/container/folder1/all files….
After a particular activity completes, I need to delete all the files and the folder, but not the container.
How can I achieve this?
[…] the files. To do this, we decided to use a Logic App, following this strategy, applied to a File Store (see Storage below). As an alternative, we tried […]
I’m sorry for the noob question, but I don’t get task 5.
“5. Perform a test run of your Logic App here in the Logic App Designer to make sure that it works properly by entering a value for the BlobName as a test.”
How do you do that?
Or, more precisely, how do you give the parameter a value or a default value?
You can just use the new Delete activity instead now 🙂 https://docs.microsoft.com/en-us/azure/data-factory/delete-activity
[…] But before February 2019, there was no Delete activity. We had to write an Azure Function or use a Logic App called by a Web Activity in order to delete a file. I imagine every person who started working with Data Factory had to go […]