
Copy Azure Blobs with Data Factory Event Trigger

By Mike Veazie
Data Factory, ETL, Azure, Blob Storage


NOTE: This article applies to version 2 of Azure Data Factory. The integration described here depends on Azure Event Grid, so make sure your subscription is registered with the Event Grid resource provider. For more info, see Resource providers and types.

Overview

This article builds on the concepts described in Copy Activity in Azure Data Factory. We will be using an event trigger to copy blobs between Azure Storage accounts.

There are scenarios where you need to copy blobs between storage accounts, whether for another layer of redundancy or simply to move data to a lower-cost storage tier. In this article, we will show you how to use Data Factory event triggers to copy blobs immediately after they are created.

Prerequisites

1.) An Azure Data Factory (version 2) instance
2.) Azure Storage accounts for the source and destination, each with a container named backups
3.) The Event Grid resource provider registered on your Azure subscription

This is what our resource group looks like:
NOTE: The storage accounts are in the same location to prevent egress charges

Resource Group

Create Linked Storage Accounts

NOTE: Datasets and linked services are described in depth at Datasets and linked services in Azure Data Factory.

In this demo we will use two storage accounts named dfcopyeventsource and dfcopyeventdestination.

Create linked services for both storage accounts in Data Factory under the Connections tab. For simplicity, we will use the storage account key for authentication. We will add a _ls suffix to the names so we can tell they are linked services later.

Create linked service menu

Select "Use account key" for authentication method

Create linked storage

Both storage accounts are now linked.

Create linked storage
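For reference, here is roughly what the JSON behind one of these linked services looks like (you can view the real definition via the Code button in the authoring UI). The connection string below is a placeholder, and depending on your Data Factory version the generated type may be AzureStorage rather than AzureBlobStorage.

    {
        "name": "dfcopyeventsource_ls",
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {
                "connectionString": "DefaultEndpointsProtocol=https;AccountName=dfcopyeventsource;AccountKey=<account-key>;EndpointSuffix=core.windows.net"
            }
        }
    }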

Create Datasets

Now we will create a Dataset for each storage account, using _ds as a suffix. For simplicity, use binary copy. Be sure to set the folder path to backups.

Create Dataset Storage

Create Dataset Storage Name
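The generated Dataset JSON looks roughly like the sketch below (shown for the source; the destination Dataset is the same apart from its name and linked service). This assumes the classic AzureBlob dataset type with a binary format; newer factories may generate a Binary dataset type instead.

    {
        "name": "dfcopyeventsource_ds",
        "properties": {
            "linkedServiceName": {
                "referenceName": "dfcopyeventsource_ls",
                "type": "LinkedServiceReference"
            },
            "type": "AzureBlob",
            "typeProperties": {
                "folderPath": "backups",
                "format": {
                    "type": "BinaryFormat"
                }
            }
        }
    }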

Publish the Datasets

Publish Datasets

Create Pipeline

Now we will create a Pipeline and add the Copy Data activity. Our pipeline is called copy_blob_pipeline.

Create Pipeline

Drag the Copy Data activity onto the canvas.
1.) Name it Copy Data Activity
2.) Select source Dataset
3.) Select destination Dataset (sink)

Copy Data Source

Copy Data Sink
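At this point the pipeline JSON is roughly the following sketch. BlobSource and BlobSink are the source and sink types Data Factory generates for a blob-to-blob binary copy; the rest of the structure is assumed to match what the UI produces.

    {
        "name": "copy_blob_pipeline",
        "properties": {
            "activities": [
                {
                    "name": "Copy Data Activity",
                    "type": "Copy",
                    "inputs": [
                        { "referenceName": "dfcopyeventsource_ds", "type": "DatasetReference" }
                    ],
                    "outputs": [
                        { "referenceName": "dfcopyeventdestination_ds", "type": "DatasetReference" }
                    ],
                    "typeProperties": {
                        "source": { "type": "BlobSource" },
                        "sink": { "type": "BlobSink" }
                    }
                }
            ]
        }
    }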

Testing Pipeline

Publish all resources and your Pipeline should be ready to test. We will add the event trigger after confirming the Copy Data activity works as expected.

Take the following steps to test the Pipeline:

1.) Upload a test file to the backups container in the source storage account
2.) In Data Factory, click the Debug button
3.) Verify the pipeline finishes without error and the data was copied to the destination

Before

Debug Results

After

Adding Event Trigger

We will be adding a Blob Created event trigger to the copy_blob_pipeline. It will watch the backups container in the source storage account for new blobs, and then copy only those new blobs to the destination storage account. To accomplish this, we need to parameterize our Pipeline and source Dataset. The event trigger will inject information about the blob into our parameters at runtime.

Parameterize Pipeline

Edit the copy_blob_pipeline and add two parameters:
1.) sourceFolder
2.) sourceFile

Pipeline Parameters
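In the pipeline JSON, this simply adds a parameters block; both parameters are plain strings with no default values. The activities array from the previous section is omitted here for brevity.

    {
        "name": "copy_blob_pipeline",
        "properties": {
            "parameters": {
                "sourceFolder": { "type": "String" },
                "sourceFile": { "type": "String" }
            }
        }
    }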

Add Event Trigger with Parameters

We'll create the event trigger and configure it to inject the new blob's folder path and file name into the sourceFolder and sourceFile pipeline parameters we just created.

Trigger

Trigger Parameters
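The trigger definition ends up looking roughly like the sketch below: a BlobEventsTrigger scoped to the source storage account fires on Microsoft.Storage.BlobCreated events under the backups container and maps the new blob's folder path and file name onto the pipeline parameters with @triggerBody(). The trigger name, subscription ID, and resource group below are placeholders for this demo.

    {
        "name": "blob_created_trigger",
        "properties": {
            "type": "BlobEventsTrigger",
            "typeProperties": {
                "scope": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/dfcopyeventsource",
                "events": [ "Microsoft.Storage.BlobCreated" ],
                "blobPathBeginsWith": "/backups/blobs/"
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": "copy_blob_pipeline",
                        "type": "PipelineReference"
                    },
                    "parameters": {
                        "sourceFolder": "@triggerBody().folderPath",
                        "sourceFile": "@triggerBody().fileName"
                    }
                }
            ]
        }
    }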

Parameterize Source Dataset

The final step is to parameterize the source Dataset dfcopyeventsource_ds. On the Connection tab, add @pipeline().parameters.sourceFolder and @pipeline().parameters.sourceFile to their respective input boxes.

Pipeline Parameters
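After this change, the source Dataset JSON looks roughly like the sketch below: the static backups path is replaced by expressions that resolve to the pipeline parameters at runtime (again assuming the classic AzureBlob dataset type).

    {
        "name": "dfcopyeventsource_ds",
        "properties": {
            "linkedServiceName": {
                "referenceName": "dfcopyeventsource_ls",
                "type": "LinkedServiceReference"
            },
            "type": "AzureBlob",
            "typeProperties": {
                "folderPath": {
                    "value": "@pipeline().parameters.sourceFolder",
                    "type": "Expression"
                },
                "fileName": {
                    "value": "@pipeline().parameters.sourceFile",
                    "type": "Expression"
                },
                "format": {
                    "type": "BinaryFormat"
                }
            }
        }
    }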

Testing Trigger

Click Publish All and the event trigger will be live, watching the source storage account. Upload a file to the source and then check the Monitor tab to validate that the pipeline executed without issues. If everything is configured correctly, the blob will be copied to the destination storage account.

Test Trigger

Test Trigger


Mike Veazie

Mike is self-motivated, goal-oriented, and has a passion for learning. He loves being outside his comfort zone and being involved in challenging projects.

