Azure Data Factory Advanced Features
Once you are comfortable building basic ADF pipelines, a set of advanced features lets you handle complex, real-world scenarios: dynamic pipelines that adapt to the data, metadata-driven frameworks that load many sources from a single configuration table, and robust error handling that recovers gracefully from failures.
Metadata-Driven Pipeline Framework
Imagine a company that needs to load data from 150 different source tables into ADLS Gen2 every night. Building 150 individual pipelines is impractical. A metadata-driven framework solves this by storing pipeline configuration in a control table and using one generic pipeline that processes all 150 tables.
How It Works
You create a control table in Azure SQL Database that stores configuration for each source:
-- Control table
CREATE TABLE pipeline_config (
config_id INT PRIMARY KEY,
source_type VARCHAR(50), -- 'SQL', 'CSV', 'API'
source_name VARCHAR(100), -- table or file name
destination_path VARCHAR(200), -- target folder in ADLS
load_type VARCHAR(20), -- 'Full' or 'Incremental'
watermark_column VARCHAR(100), -- column for incremental loads
last_watermark DATETIME,
is_active BIT DEFAULT 1
);
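To make the role of each column concrete, here is a minimal Python sketch of how a generic child pipeline might turn one control-table row into a source query. The function name build_source_query and the sample row are hypothetical; in ADF itself this logic would normally live in a dynamic expression on the Copy activity's source query rather than in Python.

```python
from datetime import datetime


def build_source_query(config: dict) -> str:
    """Return the SELECT a generic child pipeline could run for one SQL row
    of the pipeline_config table (hypothetical helper, for illustration)."""
    if config["source_type"] != "SQL":
        # File and API sources are read by path or URL, not by a query.
        raise ValueError(f"No SQL query for source_type {config['source_type']!r}")

    table = config["source_name"]
    if config["load_type"] == "Full":
        return f"SELECT * FROM {table}"

    # Incremental: only rows changed since the last stored watermark.
    column = config["watermark_column"]
    since = config["last_watermark"].strftime("%Y-%m-%d %H:%M:%S")
    return f"SELECT * FROM {table} WHERE {column} > '{since}'"


row = {
    "config_id": 1,
    "source_type": "SQL",
    "source_name": "dbo.Orders",
    "destination_path": "bronze/orders/",
    "load_type": "Incremental",
    "watermark_column": "ModifiedDate",
    "last_watermark": datetime(2024, 1, 15, 0, 0, 0),
    "is_active": 1,
}
print(build_source_query(row))
# SELECT * FROM dbo.Orders WHERE ModifiedDate > '2024-01-15 00:00:00'
```

Building SQL by string formatting is only acceptable here because the table and column names come from a trusted control table, not from user input.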
The parent pipeline:
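- Uses a Lookup Activity to read all active rows from pipeline_config (with "First row only" cleared; otherwise the Lookup returns just one row)
- Passes the Lookup's output.value array to a ForEach Activity
- Each iteration calls a generic child pipeline through an Execute Pipeline activity, passing the current config row (@item()) as parameters
- The child pipeline uses those parameters to dynamically set the source, destination, and load type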
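The steps above correspond to three activity types in the parent pipeline's JSON definition. The sketch below builds that definition as a Python dictionary and prints it as JSON. The activity names (Lookup_Config, ForEach_Source, Run_Child), the dataset ControlTableDataset, and the child pipeline Child_Generic_Load are hypothetical, and the exact JSON should be compared with a pipeline exported from your own factory before relying on it.

```python
import json

# Hypothetical child-pipeline parameter names, mirroring the control table columns.
CHILD_PARAMS = ["config_id", "source_type", "source_name", "destination_path",
                "load_type", "watermark_column", "last_watermark"]

parent_pipeline = {
    "name": "Parent_Metadata_Load",
    "properties": {
        "activities": [
            {
                # Step 1: read every active row from the control table.
                "name": "Lookup_Config",
                "type": "Lookup",
                "typeProperties": {
                    "source": {
                        "type": "AzureSqlSource",
                        "sqlReaderQuery": "SELECT * FROM pipeline_config WHERE is_active = 1",
                    },
                    "dataset": {"referenceName": "ControlTableDataset",
                                "type": "DatasetReference"},
                    "firstRowOnly": False,  # return all rows, not just the first
                },
            },
            {
                # Step 2: iterate over the Lookup result set once it succeeds.
                "name": "ForEach_Source",
                "type": "ForEach",
                "dependsOn": [{"activity": "Lookup_Config",
                               "dependencyConditions": ["Succeeded"]}],
                "typeProperties": {
                    "items": {"value": "@activity('Lookup_Config').output.value",
                              "type": "Expression"},
                    "isSequential": False,
                    "activities": [
                        {
                            # Step 3: call the generic child pipeline once per row.
                            "name": "Run_Child",
                            "type": "ExecutePipeline",
                            "typeProperties": {
                                "pipeline": {"referenceName": "Child_Generic_Load",
                                             "type": "PipelineReference"},
                                "waitOnCompletion": True,
                                # Map each column of the current row to a child parameter.
                                "parameters": {
                                    p: {"value": f"@item().{p}", "type": "Expression"}
                                    for p in CHILD_PARAMS
                                },
                            },
                        }
                    ],
                },
            },
        ]
    },
}

print(json.dumps(parent_pipeline, indent=2))
```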
Adding a new source table now requires only one INSERT into the control table — no new pipeline needed.
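The "one INSERT" claim can be tried out locally by modelling the control table. The sketch below uses Python's built-in sqlite3 module as a stand-in for Azure SQL Database, so column types are simplified and DATETIME values are stored as text; the sample rows are invented. The function active_sources runs the kind of query the parent pipeline's Lookup activity would issue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pipeline_config (
        config_id        INTEGER PRIMARY KEY,
        source_type      TEXT,
        source_name      TEXT,
        destination_path TEXT,
        load_type        TEXT,
        watermark_column TEXT,
        last_watermark   TEXT,
        is_active        INTEGER DEFAULT 1
    )
""")

# Two sources already configured (sample data).
conn.executemany(
    "INSERT INTO pipeline_config VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "SQL", "dbo.Orders", "bronze/orders/", "Incremental",
         "ModifiedDate", "2024-01-15 00:00:00", 1),
        (2, "CSV", "customers.csv", "bronze/customers/", "Full", None, None, 1),
    ],
)


def active_sources(connection):
    """The lookup query: names of all active sources, in config order."""
    rows = connection.execute(
        "SELECT source_name FROM pipeline_config WHERE is_active = 1 ORDER BY config_id"
    )
    return [name for (name,) in rows]


print(active_sources(conn))  # ['dbo.Orders', 'customers.csv']

# Onboarding a new source: one INSERT, no new pipeline (is_active defaults to 1).
conn.execute(
    "INSERT INTO pipeline_config (config_id, source_type, source_name, "
    "destination_path, load_type) VALUES (3, 'SQL', 'dbo.Products', "
    "'bronze/products/', 'Full')"
)
print(active_sources(conn))  # ['dbo.Orders', 'customers.csv', 'dbo.Products']
```

Setting is_active to 0 is the matching way to pause a source without deleting its configuration.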
Dynamic Expressions and the Expression Builder
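ADF expressions let pipeline properties be computed at runtime rather than hardcoded. An expression starts with @ and is written in the ADF expression language, which is built from nested function calls (concat, formatDateTime, if) and references such as pipeline().parameters or activity('name').output.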
Common Dynamic Expressions
// Today's date formatted as a folder path
@formatDateTime(utcNow(), 'yyyy/MM/dd')
// Yesterday's date — for incremental loads
@formatDateTime(addDays(utcNow(), -1), 'yyyy-MM-dd')
// Dynamic file path using pipeline parameters
@concat('bronze/sales/', pipeline().parameters.region, '/',
formatDateTime(utcNow(), 'yyyy/MM/dd'), '/sales.csv')
// Get the output of a previous activity
@activity('Lookup_Watermark').output.firstRow.last_watermark
// Conditional expression
@if(equals(pipeline().parameters.load_type, 'Full'), 'Overwrite', 'Append')
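To check what these expressions produce, the sketch below reproduces a few of them in plain Python. The helper to_strftime is hypothetical and only translates the three .NET-style tokens used above (yyyy, MM, dd); it is not a general implementation of formatDateTime. A fixed timestamp stands in for utcNow() and a fixed string for pipeline().parameters.region, so the results are repeatable.

```python
from datetime import datetime, timedelta, timezone


def to_strftime(adf_format: str) -> str:
    """Translate only the yyyy/MM/dd tokens used above into strftime codes."""
    return (adf_format.replace("yyyy", "%Y")
                      .replace("MM", "%m")
                      .replace("dd", "%d"))


def format_date_time(ts: datetime, adf_format: str) -> str:
    # Rough stand-in for formatDateTime() covering these tokens only.
    return ts.strftime(to_strftime(adf_format))


now = datetime(2024, 1, 16, 2, 30, tzinfo=timezone.utc)  # stands in for utcNow()
region = "emea"  # stands in for pipeline().parameters.region

folder = format_date_time(now, "yyyy/MM/dd")
yesterday = format_date_time(now + timedelta(days=-1), "yyyy-MM-dd")  # addDays(..., -1)
path = "bronze/sales/" + region + "/" + format_date_time(now, "yyyy/MM/dd") + "/sales.csv"
load_type = "Full"
write_mode = "Overwrite" if load_type == "Full" else "Append"

print(folder)      # 2024/01/16
print(yesterday)   # 2024-01-15
print(path)        # bronze/sales/emea/2024/01/16/sales.csv
print(write_mode)  # Overwrite
```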
Custom Activities
ADF's built-in activities cover most scenarios. When you need to run custom code that does not fit existing activity types, the Custom Activity runs your .NET or Python code on an Azure Batch pool. Azure Batch is a compute service that runs scripts on a pool of virtual machines. Use Custom Activities for complex data processing that requires specific libraries or executables not available in standard ADF activities.
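For a Python script, the Custom Activity's command would be something like python process.py. When the activity runs, ADF places an activity.json file (alongside linkedServices.json and datasets.json) in the script's working directory, and any extendedProperties defined on the activity can be read from it. The sketch below assumes that layout; the script name process.py and the source_path property are hypothetical, and it assumes Python is installed on the Batch pool nodes.

```python
import json
from pathlib import Path


def read_extended_properties(workdir: str = ".") -> dict:
    """Return the extendedProperties defined on the Custom Activity.

    Assumes ADF wrote activity.json into the working directory with the values
    under typeProperties.extendedProperties. Returns an empty dict when the file
    is absent, so the script can also be run locally outside ADF.
    """
    path = Path(workdir, "activity.json")
    if not path.is_file():
        return {}
    activity = json.loads(path.read_text())
    return activity.get("typeProperties", {}).get("extendedProperties", {})


def main() -> None:
    props = read_extended_properties()
    source_path = props.get("source_path", "<not set>")  # hypothetical property
    print(f"Processing {source_path}")
    # Custom processing with the libraries installed on the pool goes here.


if __name__ == "__main__":
    main()
```

Anything the script prints is captured in the activity's stdout log, which is usually the first place to look when a Custom Activity misbehaves.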
Data Factory Mapping Data Flows — Advanced Patterns
Schema Drift
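Source data does not always arrive with a consistent schema. A vendor might add new columns to their CSV files without warning. Schema Drift lets a Data Flow handle new, unexpected columns. With Allow schema drift enabled on the source, columns that are not part of the defined schema still flow through the transformations to the sink, provided the sink uses auto-mapping. Without it, those columns are left out, and a source that also has Validate schema turned on fails when the incoming schema no longer matches.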
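The distinction can be made concrete with a plain-Python analogy: a fixed schema keeps only the columns it was designed with (or, in strict mode, rejects the row), while a drift-tolerant mapping passes new columns through. This illustrates the idea only and is not ADF code; the column names and functions are hypothetical.

```python
DEFINED_COLUMNS = ["order_id", "amount"]  # the schema the flow was designed against


def map_fixed(row: dict, validate_schema: bool = False) -> dict:
    """No schema drift: only the defined columns survive.

    With validate_schema=True, an unexpected column is treated as an error,
    loosely like a source with schema validation turned on.
    """
    unknown = [col for col in row if col not in DEFINED_COLUMNS]
    if validate_schema and unknown:
        raise ValueError(f"Unexpected columns: {unknown}")
    return {col: row.get(col) for col in DEFINED_COLUMNS}


def map_with_drift(row: dict) -> dict:
    """Schema drift allowed: defined columns first, then any drifted ones."""
    mapped = {col: row.get(col) for col in DEFINED_COLUMNS}
    drifted = {col: val for col, val in row.items() if col not in DEFINED_COLUMNS}
    return {**mapped, **drifted}


# The vendor has added a loyalty_tier column without warning.
vendor_row = {"order_id": 7, "amount": 19.5, "loyalty_tier": "gold"}
print(map_fixed(vendor_row))       # {'order_id': 7, 'amount': 19.5}
print(map_with_drift(vendor_row))  # {'order_id': 7, 'amount': 19.5, 'loyalty_tier': 'gold'}
```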
Parameterized Data Flows
Data Flows support parameters just like pipelines. A parameterized Data Flow can process different source files and write to different destinations based on the parameters passed to it at runtime. This makes one Data Flow reusable across multiple pipeline runs.
Flowlets
A Flowlet is a reusable set of Data Flow transformations that you build once and reference from multiple Data Flows, equivalent to a function or subroutine in programming. If the same transformation logic appears in several Data Flows (cleaning phone numbers, standardizing addresses), extract it into a Flowlet and reference it instead of duplicating the logic.
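Since a Flowlet plays the role of a function, the same idea in Python looks like this: one cleaning routine, written once, reused by two otherwise unrelated flows. The function names and the phone format are hypothetical, and real Flowlets are built in the Data Flow designer rather than in code.

```python
import re


def clean_phone(raw: str) -> str:
    """The reusable 'flowlet': keep digits only, then format 10-digit numbers."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return digits  # other lengths are left for a downstream quality check


def customer_flow(rows):
    return [{**r, "phone": clean_phone(r["phone"])} for r in rows]


def supplier_flow(rows):
    return [{**r, "contact_phone": clean_phone(r["contact_phone"])} for r in rows]


print(customer_flow([{"name": "Ana", "phone": "555.123.4567"}]))
# [{'name': 'Ana', 'phone': '(555) 123-4567'}]
print(supplier_flow([{"vendor": "Acme", "contact_phone": "555 987 6543"}]))
# [{'vendor': 'Acme', 'contact_phone': '(555) 987-6543'}]
```

A fix to clean_phone reaches both flows at once, which is exactly the benefit of moving shared logic into a Flowlet.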
Pipeline Templates
ADF provides a Template Gallery with pre-built pipeline templates for common scenarios. You can also save your own pipelines as custom templates (this requires a factory connected to a Git repository) so that other developers can reuse them. Templates speed up development of common patterns such as incremental load from SQL, copying from Amazon S3 to ADLS, or loading from a REST API.
Managed Virtual Network and Private Endpoints in ADF
ADF's Managed Virtual Network places the Azure Integration Runtime inside a virtual network that Microsoft manages for you. Managed Private Endpoints connect from this VNet to your data sources (ADLS Gen2, Azure SQL, Synapse) over the Microsoft backbone network, so the traffic does not traverse the public internet.
Managed VNet is enabled with a single setting when you create the Azure IR, and each managed private endpoint must then be approved on the target resource before it can be used. This is the recommended approach for any production environment where data security and network isolation are required.
ADF REST API and SDK — Programmatic Control
ADF exposes a full REST API and Python SDK. You can trigger pipeline runs, monitor status, create pipelines, and update parameters programmatically. This is useful when integrating ADF with external orchestration systems or building custom monitoring dashboards.
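# Trigger an ADF pipeline run using the Python SDK
# Requires the azure-identity and azure-mgmt-datafactory packages
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# DefaultAzureCredential tries environment variables, managed identity,
# the Azure CLI login, and other credential sources in turn
credential = DefaultAzureCredential()
# Subscription that contains the factory (this env var name is an example choice)
subscription_id = os.environ["AZURE_SUBSCRIPTION_ID"]
client = DataFactoryManagementClient(credential, subscription_id)

# Start a run, supplying a value for the pipeline's load_date parameter
run_response = client.pipelines.create_run(
    resource_group_name="my-rg",
    factory_name="my-adf",
    pipeline_name="Daily_Sales_Load",
    parameters={"load_date": "2024-01-16"},
)
print(f"Pipeline run ID: {run_response.run_id}")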
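Triggering a run is usually followed by waiting for it to finish. Below is a minimal polling sketch, assuming the client and run_response objects created above: client.pipeline_runs.get(resource_group, factory, run_id) returns a run whose status is a value such as Queued, InProgress, Succeeded, Failed, Canceling, or Cancelled. The helper name wait_for_run and the default interval and timeout are choices made for this example.

```python
import time

# Statuses after which a pipeline run no longer changes.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Cancelled"}


def wait_for_run(client, resource_group, factory, run_id,
                 poll_seconds=30, timeout_seconds=3600):
    """Poll an ADF pipeline run until it reaches a terminal status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        run = client.pipeline_runs.get(resource_group, factory, run_id)
        if run.status in TERMINAL_STATUSES:
            return run.status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run {run_id} still {run.status} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

Usage: status = wait_for_run(client, "my-rg", "my-adf", run_response.run_id). Passing the client in as an argument keeps the helper easy to test with a stand-in object.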
Key Points
- Use a metadata-driven framework to manage large numbers of source tables through a single generic pipeline controlled by a configuration table
- Master ADF dynamic expressions — they turn hardcoded pipelines into flexible, reusable frameworks
- Enable Schema Drift on Data Flows that read from sources with evolving schemas
- Extract repeated transformation logic into Flowlets to avoid duplication across Data Flows
- Enable the Managed VNet on the Azure IR for private, secure connectivity in production environments
- Use the ADF Python SDK to integrate pipeline triggering into external orchestration systems
