Azure Data Factory: Delete from Azure Blob Storage and Table Storage

NOTE: This blog post relates to the ADF V2 service

When performing data integration, a very common step is to remove a file, a row, or a key/value (K/V) pair after reading, transforming, and loading data. Azure Data Factory (ADF) is the Azure-native ETL and data integration service for orchestrating these operations. Below are the steps you can take to do this as part of your data pipelines in ADF.

Delete Azure Blob Storage file

  1. Set up a Logic App in Azure to call the Azure Blob Service REST API DeleteBlob.
  2. In the Logic Apps Designer, add an HTTP Request trigger followed by an Azure Storage Delete Blob action.
  3. [Screenshot la001: the Logic Apps Designer with the HTTP Request trigger and the Delete Blob action]
  4. You will need one parameter in the Request Body which we’ll use as the “BlobName” to delete. In a future step, we’ll use the ADF pipeline parameter feature to pass in the name of the blob to delete here.
  5. Perform a test run of your Logic App in the Logic Apps Designer, entering a test value for BlobName, to make sure that it works properly.
  6. Once you are happy that it is working properly and deleting the targeted blob file, save the Logic App.
  7. Copy the HTTP POST URL from the top of the HTTP Request Trigger seen in the screenshot above under Step #3.
  8. Let’s now move into the ADF UI. From the pipeline view, click on an open area in the design surface to show the Pipeline Parameters properties at the bottom.
  9. [Screenshot lab002: the Pipeline Parameters properties in the ADF UI]
  10. We’ll create 2 parameters: PartitionKey and BlobName, both of type String. In my testing, as you can see from the screenshot above in step #9, I used default values. When you enter Debug mode for your pipeline, you will be prompted to enter values for those parameters, so you can always simply enter the strings at that time.
  11. The values need to be the full JSON K/V pair that we’ll send to the Logic App HTTP Request:
    • {"blobname":"mycontainer/emp.txt"} <- That is my container / filename. Use your value there.
    • We’ll use the PartitionKey param later for the Table row delete below.
  12. Place a Web Activity on the design surface.
  13. Set the properties under “Settings” similar to what I have below … The URL is the URL you copied from Logic Apps. Method = POST. For Body, enter the BlobName parameter similar to what I have or use the Add Dynamic Content option to enter the parameter.
  14. [Screenshot la003: the Web Activity “Settings” with the URL, Method and Body]
  15. Click Validate in the ADF UI to ensure your configuration is valid, then hit “Debug”. Enter the parameter string values when prompted for your Blob filename, then click “Finish”.
  16. You should see the results at the bottom under Pipeline Output.
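
If you want to exercise the Logic App endpoint outside ADF before wiring up the Web Activity, the request it sends can be approximated with a few lines of Python. This is only a sketch: the URL below is a hypothetical placeholder for the HTTP POST URL you copied in step 7, the helper names are made up for illustration, and it assumes your Request trigger accepts a JSON body with the same "blobname" key shown in step 11.

```python
import json
import urllib.request

# Hypothetical placeholder: paste the HTTP POST URL copied from your Logic App trigger.
LOGIC_APP_URL = "https://example.invalid/logic-app-trigger-url"


def build_delete_blob_request(logic_app_url: str, blob_path: str) -> urllib.request.Request:
    """Build a POST request carrying the same JSON K/V pair as the pipeline parameter."""
    # Assumption: the trigger expects the key "blobname", as in step 11.
    body = json.dumps({"blobname": blob_path}).encode("utf-8")
    return urllib.request.Request(
        logic_app_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code. This triggers a real delete."""
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    req = build_delete_blob_request(LOGIC_APP_URL, "mycontainer/emp.txt")
    # Inspect what would be sent; nothing goes over the network here.
    print(req.get_method(), req.data.decode("utf-8"))
    # Call send(req) only once LOGIC_APP_URL points at your own Logic App.
```

Building the request and sending it are kept separate so you can inspect the body first; calling send() will delete the blob if the Logic App is working.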

Delete Azure Table Storage Entities

  1. Set up another Logic App in Azure, this time to call the Azure Table Service REST API DeleteEntity.
  2. In the Logic Apps Designer, add an HTTP Request trigger followed by an Azure Table Service Delete Entity action.
  3. [Screenshot la004: the Logic Apps Designer with the HTTP Request trigger and the Delete Entity action]
  4. This time, we’ll need 2 parameters in the Request Body: PartID and RowKey. You could also parameterize the Table Name if you wish.
  5. Perform a test run of your Logic App in the Logic Apps Designer, entering values for both keys that correspond to an entity in your Table, to make sure that it works properly.
  6. Once you are happy that it is working properly and deleting the targeted entities, save the Logic App.
  7. Copy the HTTP POST URL from the top of the HTTP Request Trigger seen in the screenshot above under Step #3.
  8. Back in the ADF UI, from the pipeline view, click on an open area in the design surface to show the Pipeline Parameters properties at the bottom.
  9. [Screenshot lab002: the Pipeline Parameters properties in the ADF UI]
  10. Now we’ll use that PartitionKey ADF parameter you created earlier. You can enter the value here as the Default Value, or simply enter the string when you are prompted on entering Debug mode for your pipeline.
  11. The values need to be the full JSON K/V pair that we’ll send to the Logic App HTTP Request:
    • { "PartID": "20180309T0000", "RowKey": "data" } <- That is a pointer to my test Table row.
  12. Place a Web Activity on the design surface.
  13. Set the properties under “Settings” similar to what I have below … The URL is the URL you copied from Logic Apps. Method = POST. For Body, enter the PartitionKey parameter similar to what I have or use the Add Dynamic Content option to enter the parameter.
  14. [Screenshot lab004: the Web Activity “Settings” with the URL, Method and Body]
  15. Click Validate in the ADF UI to ensure your configuration is valid, then hit “Debug”. Enter the parameter string value when prompted (the JSON containing your Table row’s PartID and RowKey), then click “Finish”.
  16. You should see the results at the bottom under Pipeline Output.
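
Because the parameter value has to be the full JSON K/V pair, a stray quote or a misspelled key is an easy way to end up with a failed run. A quick local check before pasting the string into ADF might look like the sketch below. The function name check_payload is hypothetical, and the required keys are taken from the example in step 11; adjust them to whatever your own Logic App's Request Body expects.

```python
import json


def check_payload(raw: str, required_keys: set) -> dict:
    """Parse a parameter value and confirm it is a JSON object with the expected keys."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on malformed JSON.
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    missing = required_keys - set(payload)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return payload


if __name__ == "__main__":
    # The test row pointer from step 11.
    value = '{ "PartID": "20180309T0000", "RowKey": "data" }'
    print(check_payload(value, {"PartID", "RowKey"}))
```

The same check works for the blob parameter by passing {"blobname"} as the required keys.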