NOTE: This article applies to version 2 of Data Factory. The integration described in this article depends on Azure Event Grid. Make sure that your subscription is registered with the Event Grid resource provider. For more info, see Resource providers and types.
There are scenarios where you need to copy blobs between storage accounts. It could be for another layer of redundancy, or simply to move data to a lower-tier storage account for cost optimization. In this article, we will show you how to copy blobs immediately after they are created using Data Factory event triggers.
1.) Azure Data Factory version
2.) Azure storage accounts for source and destination. Each storage has a container called
3.) Event Grid enabled as a resource provider on your Azure subscription
This is what our resource group looks like:
NOTE: The storage accounts are in the same location to prevent egress charges
Create Linked Storage Accounts
NOTE: Datasets and linked services are described in depth at Datasets and linked services in Azure Data Factory.
In this demo we will use two storage accounts named
Create linked services for both storage accounts in Data Factory under the Connections tab. For simplicity, we will use the storage account key for authentication. We will add an _ls suffix to their names so we know they are linked services later.
Select "Use account key" for authentication method
Both storage accounts are now linked.
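For readers who prefer to see what the UI produces, here is a minimal sketch of the JSON definition behind an account-key linked service, built as a Python dictionary. The account names and key placeholders are hypothetical, and the helper blob_linked_service is ours for illustration; only the AzureBlobStorage type and the connectionString property follow the Data Factory linked service format.

```python
import json


def blob_linked_service(name: str, account_name: str, account_key: str) -> dict:
    """Build a Blob Storage linked service definition that uses an account key."""
    connection_string = (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account_name};"
        f"AccountKey={account_key};"
        "EndpointSuffix=core.windows.net"
    )
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {"connectionString": connection_string},
        },
    }


# Hypothetical account names and key placeholders; substitute your own values.
source_ls = blob_linked_service(
    "examplesource_ls", "examplesource", "<source-account-key>"
)
destination_ls = blob_linked_service(
    "exampledestination_ls", "exampledestination", "<destination-account-key>"
)

print(json.dumps(source_ls, indent=2))
```

The _ls suffix on each name follows the naming convention used in this article.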
Now we will create a Dataset for each storage account with
_ds as a suffix. For simplicity, use binary copy. Be sure to add the path
Publish the Datasets
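The dataset step can be sketched the same way. The snippet below builds a Binary dataset definition that points at a container through a linked service. The dataset name dfcopyeventsource_ds and the backups container come from this article; the linked service name and the destination names are assumptions made for illustration.

```python
import json
from typing import Optional


def binary_dataset(
    name: str,
    linked_service: str,
    container: str,
    folder_path: Optional[str] = None,
) -> dict:
    """Build a Binary dataset definition located in a Blob Storage container."""
    location = {"type": "AzureBlobStorageLocation", "container": container}
    if folder_path:
        # Only include folderPath when a sub-folder is actually wanted.
        location["folderPath"] = folder_path
    return {
        "name": name,
        "properties": {
            "type": "Binary",
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {"location": location},
        },
    }


# Linked service names and the destination dataset name are hypothetical.
source_ds = binary_dataset("dfcopyeventsource_ds", "examplesource_ls", "backups")
destination_ds = binary_dataset(
    "exampledestination_ds", "exampledestination_ls", "backups"
)

print(json.dumps(source_ds, indent=2))
```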
Now we will create a Pipeline and add the Copy Data activity. Our pipeline is called
Copy Data activity onto the canvas.
1.) Name it Copy Data Activity
2.) Select source Dataset
3.) Select destination Dataset (sink)
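The three steps above correspond roughly to the pipeline definition sketched below. It assumes both datasets are Binary datasets on Blob Storage, and the destination dataset name is hypothetical. Treat it as an approximation of what the designer generates rather than an exact export.

```python
import json


def dataset_ref(name: str) -> dict:
    """Reference a dataset by name from inside an activity."""
    return {"referenceName": name, "type": "DatasetReference"}


def copy_pipeline(name: str, source_dataset: str, sink_dataset: str) -> dict:
    """Build a pipeline with a single binary Copy activity."""
    copy_activity = {
        "name": "Copy Data Activity",
        "type": "Copy",
        "inputs": [dataset_ref(source_dataset)],
        "outputs": [dataset_ref(sink_dataset)],
        "typeProperties": {
            "source": {
                "type": "BinarySource",
                "storeSettings": {
                    "type": "AzureBlobStorageReadSettings",
                    "recursive": True,
                },
            },
            "sink": {
                "type": "BinarySink",
                "storeSettings": {"type": "AzureBlobStorageWriteSettings"},
            },
        },
    }
    return {"name": name, "properties": {"activities": [copy_activity]}}


# "exampledestination_ds" is a hypothetical name for the sink dataset.
pipeline = copy_pipeline(
    "copy_blob_pipeline", "dfcopyeventsource_ds", "exampledestination_ds"
)

print(json.dumps(pipeline, indent=2))
```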
Publish all resources and your Pipeline should be ready to test. We will add event triggers after confirming the copy data activity works as expected.
Take the following steps to test the Pipeline
1.) Upload test data to source storage account
2.) In Data Factory, click the
3.) Verify the pipeline finishes without error and the data was moved to the destination
Adding Event Trigger
We will add a Blob Created event trigger to copy_blob_pipeline. It will watch the backups container in the source storage account for new blobs, and then copy only the new blobs to the destination storage account. To accomplish this, we need to parameterize our Pipeline and source Dataset. The event trigger will inject information about the blob into our parameters at runtime.
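To make the injected values concrete, the sketch below splits the subject of a Blob Created event into a folder path and a file name, which is approximately the information the trigger exposes at runtime. This is an illustration and not Data Factory's own code; it assumes the subject has the form /blobServices/default/containers/<container>/blobs/<path> and that the folder path includes the container name. The function name and the sample blob path are hypothetical.

```python
from typing import Tuple

SUBJECT_PREFIX = "/blobServices/default/containers/"


def split_blob_subject(subject: str) -> Tuple[str, str]:
    """Split an event subject into (folder path incl. container, file name)."""
    if not subject.startswith(SUBJECT_PREFIX):
        raise ValueError(f"unexpected subject: {subject!r}")
    container, separator, blob_path = subject[len(SUBJECT_PREFIX):].partition(
        "/blobs/"
    )
    if not separator or not blob_path:
        raise ValueError(f"subject has no blob path: {subject!r}")
    # Everything before the last slash is the folder; the rest is the file.
    folder, _, file_name = blob_path.rpartition("/")
    folder_path = f"{container}/{folder}" if folder else container
    return folder_path, file_name


# Hypothetical blob uploaded to the backups container.
folder_path, file_name = split_blob_subject(
    "/blobServices/default/containers/backups/blobs/2020/db.bak"
)
print(folder_path, file_name)
```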
copy_blob_pipeline and add two parameters:
Add Event Trigger with Parameters
We'll create the event trigger and configure it to inject the new blob
sourceFile into the pipeline parameters we just created.
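As a rough guide to what this trigger looks like outside the UI, the sketch below builds a Blob Created trigger definition that passes the file name into the pipeline's sourceFile parameter. The trigger name, the storage account resource ID placeholders, and the second parameter name sourceFolder are assumptions for illustration; the @triggerBody() expressions are the values the trigger makes available at runtime.

```python
import json


def blob_created_trigger(
    name: str,
    storage_account_id: str,
    container: str,
    pipeline_name: str,
    parameters: dict,
) -> dict:
    """Build a Blob Created event trigger that starts one pipeline."""
    return {
        "name": name,
        "properties": {
            "type": "BlobEventsTrigger",
            "typeProperties": {
                # Only blobs in this container fire the trigger.
                "blobPathBeginsWith": f"/{container}/blobs/",
                "ignoreEmptyBlobs": True,
                "scope": storage_account_id,
                "events": ["Microsoft.Storage.BlobCreated"],
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": pipeline_name,
                        "type": "PipelineReference",
                    },
                    "parameters": parameters,
                }
            ],
        },
    }


# The resource ID below is a placeholder; fill in your own subscription,
# resource group, and source storage account.
trigger = blob_created_trigger(
    name="example_blob_created_trigger",
    storage_account_id=(
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.Storage/storageAccounts/<source-account>"
    ),
    container="backups",
    pipeline_name="copy_blob_pipeline",
    parameters={
        "sourceFolder": "@triggerBody().folderPath",  # assumed parameter name
        "sourceFile": "@triggerBody().fileName",
    },
)

print(json.dumps(trigger, indent=2))
```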
Parameterize Source Dataset
The final step is to parameterize the source Dataset
dfcopyeventsource_ds. On the Connection tab, add
@pipeline().parameters.sourceFile to their respective input boxes.
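In JSON terms, a parameterized field is stored as an expression object instead of a plain string. The sketch below mirrors the step described here for the file name only; the linked service name is hypothetical, and the exact shape your factory generates may differ depending on how the dataset was authored.

```python
import json


def expression(value: str) -> dict:
    """Wrap a Data Factory expression so it is evaluated at runtime."""
    return {"value": value, "type": "Expression"}


# Source dataset whose file name is resolved from the pipeline parameter.
# "examplesource_ls" is a hypothetical linked service name.
parameterized_source_ds = {
    "name": "dfcopyeventsource_ds",
    "properties": {
        "type": "Binary",
        "linkedServiceName": {
            "referenceName": "examplesource_ls",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "backups",
                "fileName": expression("@pipeline().parameters.sourceFile"),
            }
        },
    },
}

print(json.dumps(parameterized_source_ds, indent=2))
```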
Publish All and the Event Trigger will be live and watching the source storage account. Upload a file to the source and then view the monitoring tab to validate the pipeline executed without issues. If all is configured correctly, the blob will be copied to the destination storage account.