Azure Data Factory (ADF) Advanced Features

Once you are comfortable building basic ADF pipelines, a set of advanced features lets you handle complex, real-world scenarios — dynamic pipelines that adapt to the data, metadata-driven frameworks that build themselves, and robust error handling that recovers gracefully from failures.

Metadata-Driven Pipeline Framework

Imagine a company that needs to load data from 150 different source tables into ADLS Gen2 every night. Building 150 individual pipelines is impractical. A metadata-driven framework solves this by storing pipeline configuration in a control table and using one generic pipeline that processes all 150 tables.

How It Works

You create a control table in Azure SQL Database that stores configuration for each source:

-- Control table
CREATE TABLE pipeline_config (
    config_id        INT PRIMARY KEY,
    source_type      VARCHAR(50),       -- 'SQL', 'CSV', 'API'
    source_name      VARCHAR(100),      -- table or file name
    destination_path VARCHAR(200),      -- target folder in ADLS
    load_type        VARCHAR(20),       -- 'Full' or 'Incremental'
    watermark_column VARCHAR(100),      -- column for incremental loads
    last_watermark   DATETIME,
    is_active        BIT DEFAULT 1
);

The framework runs in four steps:

The parent pipeline uses a Lookup Activity, with First row only cleared, to read all active rows from pipeline_config.
A ForEach Activity iterates over the Lookup's output.value array.
Each iteration calls a generic child pipeline through an Execute Pipeline activity, passing the fields of the current row (@item()) as parameters.
The child pipeline uses those parameters to set its source, destination, and load type dynamically.

Adding a new source table now requires only one INSERT into the control table — no new pipeline needed.
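The Lookup, ForEach, and child pipeline flow can be prototyped locally before it is built in ADF. The sketch below is a stand-in, not ADF itself: it assumes an in-memory SQLite database in place of the Azure SQL control table (SQLite accepts the same column definitions) and plain Python in place of the activities. The table names, paths, and the open_control_db, build_source_query, and plan_runs helpers are hypothetical; in ADF the child pipeline would assemble the query with an expression such as @concat.

```python
import sqlite3

# Same columns as the Azure SQL control table; SQLite accepts these type names.
DDL = """
CREATE TABLE pipeline_config (
    config_id        INT PRIMARY KEY,
    source_type      VARCHAR(50),
    source_name      VARCHAR(100),
    destination_path VARCHAR(200),
    load_type        VARCHAR(20),
    watermark_column VARCHAR(100),
    last_watermark   DATETIME,
    is_active        BIT DEFAULT 1
);
"""


def open_control_db(rows):
    """Create an in-memory control table and load the given config rows."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # access columns by name
    conn.executescript(DDL)
    conn.executemany(
        "INSERT INTO pipeline_config VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    return conn


def build_source_query(cfg):
    """What the generic child pipeline derives from one config row."""
    query = f"SELECT * FROM {cfg['source_name']}"
    if cfg["load_type"] == "Incremental":
        # Only rows changed since the last successful load
        query += f" WHERE {cfg['watermark_column']} > '{cfg['last_watermark']}'"
    return query


def plan_runs(conn):
    """Lookup (read active rows), then ForEach (one child run per row)."""
    rows = conn.execute(
        "SELECT * FROM pipeline_config WHERE is_active = 1 ORDER BY config_id"
    ).fetchall()
    return [
        {"source_query": build_source_query(r), "sink_path": r["destination_path"]}
        for r in rows
    ]


if __name__ == "__main__":
    conn = open_control_db([
        (1, "SQL", "dbo.Customers", "bronze/customers/", "Full", None, None, 1),
        (2, "SQL", "dbo.Orders", "bronze/orders/", "Incremental",
         "modified_at", "2024-01-15 00:00:00", 1),
        (3, "SQL", "dbo.Legacy", "bronze/legacy/", "Full", None, None, 0),
    ])
    for run in plan_runs(conn):
        print(run)
```

The inactive row (config_id 3) is skipped, mirroring the is_active filter in the Lookup query. Building SQL by string interpolation is tolerable here only because the control table is trusted configuration, not user input.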

Dynamic Expressions and the Expression Builder

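ADF expressions let you compute pipeline properties at runtime instead of hardcoding them. An expression starts with @ and is built from functions such as concat(), formatDateTime(), and utcNow(), combined with references to parameters, variables, and activity outputs. You can type expressions directly or compose them in the Add dynamic content pane (the expression builder) in ADF Studio.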

Common Dynamic Expressions

// Today's date formatted as a folder path
@formatDateTime(utcNow(), 'yyyy/MM/dd')

// Yesterday's date — for incremental loads
@formatDateTime(addDays(utcNow(), -1), 'yyyy-MM-dd')

// Dynamic file path using pipeline parameters
@concat('bronze/sales/', pipeline().parameters.region, '/', 
        formatDateTime(utcNow(), 'yyyy/MM/dd'), '/sales.csv')

// Get the output of a previous activity
@activity('Lookup_Watermark').output.firstRow.last_watermark

// Conditional expression
@if(equals(pipeline().parameters.load_type, 'Full'), 'Overwrite', 'Append')
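To sanity-check the paths these expressions produce, it can help to reproduce them locally. The sketch below is a plain-Python emulation, not ADF's evaluator: it assumes the .NET-style format strings map to strftime codes ('yyyy' to %Y, 'MM' to %m, 'dd' to %d) and uses a fixed timestamp in place of utcNow() so the results are predictable. The "emea" region value is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def folder_path(now):
    # @formatDateTime(utcNow(), 'yyyy/MM/dd')
    return now.strftime("%Y/%m/%d")


def yesterday(now):
    # @formatDateTime(addDays(utcNow(), -1), 'yyyy-MM-dd')
    return (now - timedelta(days=1)).strftime("%Y-%m-%d")


def sales_file_path(now, region):
    # @concat('bronze/sales/', pipeline().parameters.region, '/',
    #         formatDateTime(utcNow(), 'yyyy/MM/dd'), '/sales.csv')
    return "bronze/sales/" + region + "/" + folder_path(now) + "/sales.csv"


def write_mode(load_type):
    # @if(equals(pipeline().parameters.load_type, 'Full'), 'Overwrite', 'Append')
    return "Overwrite" if load_type == "Full" else "Append"


if __name__ == "__main__":
    now = datetime(2024, 1, 16, 2, 30, tzinfo=timezone.utc)  # stand-in for utcNow()
    print(folder_path(now))              # 2024/01/16
    print(yesterday(now))                # 2024-01-15
    print(sales_file_path(now, "emea"))  # bronze/sales/emea/2024/01/16/sales.csv
    print(write_mode("Incremental"))     # Append
```

Format strings are case-sensitive: 'MM' is the month while 'mm' is minutes, so a path written as 'yyyy/mm/dd' silently produces the wrong folder.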

Custom Activities

ADF's built-in activities cover most scenarios. When you need custom code that does not fit an existing activity type, the Custom Activity runs a command you specify (for example, a Python script or a .NET executable) on an Azure Batch pool. Azure Batch is a service that runs jobs on a managed pool of virtual machines; you create the Batch account and pool yourself and connect them to ADF through an Azure Batch linked service. Use Custom Activities for processing that needs specific libraries or executables not available in the standard ADF activities.
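A script run by a Custom Activity typically reads its settings from the files ADF places in the Batch task's working directory. The sketch below assumes the layout described in the ADF Custom Activity documentation, where activity.json holds the activity definition and any values you set under typeProperties.extendedProperties; verify the structure against a real run. The input_path property name is hypothetical.

```python
import json
from pathlib import Path


def read_extended_properties(work_dir="."):
    """Return extendedProperties from activity.json in the task's working directory.

    Assumes ADF wrote activity.json there, with custom settings under
    typeProperties.extendedProperties.
    """
    activity = json.loads((Path(work_dir) / "activity.json").read_text())
    return activity.get("typeProperties", {}).get("extendedProperties", {})


def main(work_dir="."):
    props = read_extended_properties(work_dir)
    # 'input_path' is a hypothetical property defined on the Custom Activity
    input_path = props.get("input_path", "<not set>")
    print(f"Processing {input_path}")
    return input_path


if __name__ == "__main__":
    if Path("activity.json").exists():
        main()
    else:
        print("activity.json not found; run this inside an ADF Custom Activity task")
```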

Data Factory Mapping Data Flows — Advanced Patterns

Schema Drift

Source data does not always arrive with a consistent schema. A vendor might add new columns to their CSV files without warning. Schema Drift allows a Data Flow to handle new, unexpected columns without failing. When Allow schema drift is enabled on the source, the Data Flow passes unknown columns through automatically; enable it on the sink as well if those extra columns should be written to the destination.
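The effect of Schema Drift can be pictured with a few lines of plain Python. This is a conceptual illustration, not Data Flow script: the designed schema (order_id, amount), the promo_code column, and the apply_schema helper are hypothetical, and in this sketch a row processed without drift simply loses the unexpected column.

```python
# Hypothetical columns the Data Flow was designed against
KNOWN_COLUMNS = ["order_id", "amount"]


def apply_schema(row, allow_drift):
    """Project a source row onto the designed schema.

    In this sketch, allow_drift=False keeps only the known columns, while
    allow_drift=True also passes any unexpected columns through unchanged.
    """
    out = {col: row.get(col) for col in KNOWN_COLUMNS}
    if allow_drift:
        out.update({k: v for k, v in row.items() if k not in KNOWN_COLUMNS})
    return out


if __name__ == "__main__":
    # The vendor has added a promo_code column without warning
    row = {"order_id": 1, "amount": 9.5, "promo_code": "X1"}
    print(apply_schema(row, allow_drift=False))
    print(apply_schema(row, allow_drift=True))
```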

Parameterized Data Flows

Data Flows support parameters just like pipelines. A parameterized Data Flow can process different source files and write to different destinations based on the parameters passed to it at runtime. This makes one Data Flow reusable across multiple pipeline runs.

Flowlets

A Flowlet is a reusable sub-graph within a Data Flow — equivalent to a function or subroutine in programming. If the same transformation logic appears in multiple Data Flows (cleaning phone numbers, standardizing addresses), extract it into a Flowlet and reference it instead of duplicating the logic.

Pipeline Templates

ADF provides a Template Gallery with pre-built pipeline templates for common scenarios. You can also publish your own custom templates to a private gallery shared across your organization. Templates dramatically speed up pipeline development for common patterns like incremental load from SQL, copy from S3 to ADLS, or load from REST API.

Managed Virtual Network and Private Endpoints in ADF

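ADF's Managed Virtual Network places the Azure Integration Runtime inside a virtual network that Microsoft manages on your behalf. Managed private endpoints, created in ADF, connect this VNet to data stores such as ADLS Gen2, Azure SQL Database, and Azure Synapse Analytics over the Microsoft backbone network, so the traffic does not cross the public internet. The owner of each target resource must approve its managed private endpoint before ADF can use it.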

Enabling the Managed VNet is a simple checkbox when creating the Azure IR. This is the recommended approach for any production environment where data security and network isolation are required.

ADF REST API and SDK — Programmatic Control

ADF exposes a full REST API and Python SDK. You can trigger pipeline runs, monitor status, create pipelines, and update parameters programmatically. This is useful when integrating ADF with external orchestration systems or building custom monitoring dashboards.

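# Trigger an ADF pipeline run using the Python SDK
# (pip install azure-identity azure-mgmt-datafactory)
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Assumption: the subscription ID is supplied through an environment variable
subscription_id = os.environ["AZURE_SUBSCRIPTION_ID"]

# DefaultAzureCredential tries environment variables, managed identity, Azure CLI login, etc.
credential = DefaultAzureCredential()
client = DataFactoryManagementClient(credential, subscription_id)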

run_response = client.pipelines.create_run(
    resource_group_name="my-rg",
    factory_name="my-adf",
    pipeline_name="Daily_Sales_Load",
    parameters={"load_date": "2024-01-16"}
)
print(f"Pipeline run ID: {run_response.run_id}")
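create_run returns as soon as the run is queued, so an external orchestrator usually polls until the run finishes. The sketch below assumes the client created above and calls client.pipeline_runs.get(), which returns a PipelineRun whose status takes values such as Queued, InProgress, Succeeded, Failed, Canceling, or Cancelled. The wait_for_run helper, its polling interval, and its timeout are illustrative choices, not part of the SDK.

```python
import time

# Run states after which the status no longer changes
TERMINAL_STATES = {"Succeeded", "Failed", "Cancelled"}


def wait_for_run(client, resource_group, factory, run_id,
                 poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Poll a pipeline run until it reaches a terminal state or times out.

    Only client.pipeline_runs.get() is used, so a stub object can stand in
    for DataFactoryManagementClient in local tests.
    """
    waited = 0
    while True:
        run = client.pipeline_runs.get(resource_group, factory, run_id)
        if run.status in TERMINAL_STATES:
            return run.status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Run {run_id} still {run.status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

For example, status = wait_for_run(client, "my-rg", "my-adf", run_response.run_id) blocks until the run ends and returns its final status. Passing the client and the sleep function in as arguments keeps the helper testable without a live factory.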

Key Points

Use a metadata-driven framework to manage large numbers of source tables through a single generic pipeline controlled by a configuration table
Master ADF dynamic expressions — they turn hardcoded pipelines into flexible, reusable frameworks
Enable Schema Drift on Data Flows that read from sources with evolving schemas
Extract repeated transformation logic into Flowlets to avoid duplication across Data Flows
Enable the Managed VNet on the Azure IR for private, secure connectivity in production environments
Use the ADF Python SDK to integrate pipeline triggering into external orchestration systems
