ADE Azure Data Security and Governance
Data security and governance are not optional extras — they are core responsibilities of every Azure data engineer. A poorly secured data platform can expose customer information, violate regulations like GDPR, and cause serious legal and financial consequences. This topic covers how to build data platforms that are secure by design.
The Four Pillars of Data Security in Azure
Azure data security works across four layers. Securing only one layer leaves gaps that attackers or accidental misconfigurations can exploit.
- Identity: Who is allowed to access the system
- Network: Which network paths can reach the system
- Data: How the data itself is protected at rest and in transit
- Audit: What activity is recorded and monitored
Identity Security — Controlling Who Gets In
Azure Active Directory (Microsoft Entra ID)
All Azure identity management runs through Microsoft Entra ID (formerly Azure Active Directory). Every user, service, and application that accesses Azure resources authenticates through Entra ID. Think of it as the security desk at the entrance of a building — every person must show their ID badge before entering.
Managed Identity — Passwordless Service Authentication
When an Azure Data Factory pipeline needs to read from ADLS Gen2, it must authenticate. The wrong way is to put a storage account key or connection string inside the pipeline — keys can be stolen or accidentally exposed in version control.
The right way is a Managed Identity. Azure automatically creates an identity for the ADF instance and manages its credentials internally. You grant that identity the necessary role on the storage account. No passwords. No keys. Nothing to rotate or accidentally expose.
Always use Managed Identity for service-to-service authentication in Azure. It is the most secure and lowest-maintenance option available.
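As a rough illustration of what passwordless access can look like from Python code running on an Azure resource that has a managed identity, here is a minimal sketch. It assumes the azure-identity and azure-storage-file-datalake packages are installed and that the identity has been granted a data role (for example, Storage Blob Data Reader) on the account. The account name mydatalake and both function names are placeholders, not part of any real setup.

```python
def adls_account_url(account_name: str) -> str:
    """Build the ADLS Gen2 (dfs) endpoint URL for a storage account."""
    return f"https://{account_name}.dfs.core.windows.net"


def make_adls_client(account_name: str):
    """Return a Data Lake client that authenticates without any stored secret.

    The Azure SDK is imported here rather than at module level so this sketch
    can be loaded on a machine where the packages are not installed.
    """
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    # DefaultAzureCredential picks up the managed identity when the code runs
    # on an Azure resource that has one; no key or password appears in code.
    credential = DefaultAzureCredential()
    return DataLakeServiceClient(
        account_url=adls_account_url(account_name),
        credential=credential,
    )


# "mydatalake" is a hypothetical account name used only for illustration.
print(adls_account_url("mydatalake"))
```

Calling make_adls_client("mydatalake") only succeeds where a credential is actually available, such as on an Azure VM or in a service with a managed identity assigned.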
Multi-Factor Authentication (MFA)
Human users accessing the Azure portal or Databricks workspace should be required to use MFA — a second proof of identity beyond a password. Enforcing MFA through Entra ID Conditional Access policies protects against stolen passwords.
Network Security — Controlling Which Paths Lead In
Virtual Networks and Private Endpoints
By default, Azure services are accessible over the public internet. Anyone who knows your storage account URL and has credentials can reach it from anywhere in the world. This is convenient but increases the attack surface.
A Virtual Network (VNet) is a private, isolated network in Azure. You place your services inside the VNet so they communicate privately without touching the public internet.
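A Private Endpoint gives an Azure service a private IP address inside your VNet. Traffic between your ADF, Databricks, and ADLS Gen2 then travels only through your private network. Creating the Private Endpoint does not close the public endpoint by itself; you must also disable public network access on the service (or restrict it with firewall rules) to remove public exposure completely.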
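One way to sanity-check a Private Endpoint setup is to confirm that, from inside the VNet, the service hostname resolves to a private IP address. The sketch below uses only the Python standard library. The helper names are made up for illustration, and whether a hostname resolves privately also depends on your private DNS configuration, which this sketch does not cover.

```python
import ipaddress
import socket


def is_private_address(ip: str) -> bool:
    """True if the address falls in a private (non-internet-routable) range."""
    return ipaddress.ip_address(ip).is_private


def resolves_privately(hostname: str) -> bool:
    """Resolve a hostname and report whether every returned address is private."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    addresses = {info[4][0] for info in infos}
    return all(is_private_address(addr) for addr in addresses)


# Pure checks that need no network access:
print(is_private_address("10.0.1.4"))
print(is_private_address("20.60.1.1"))
```

Run from a VM inside the VNet, a call such as resolves_privately("mydatalake.dfs.core.windows.net") (hypothetical account name) would be expected to return True once the Private Endpoint and DNS are set up correctly.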
This is equivalent to running all internal communication through a company's internal phone system rather than calling each person's personal mobile number — the conversation stays inside the organization.
Service Endpoints
Service Endpoints are a simpler, lower-cost alternative to Private Endpoints. They route traffic from your VNet to Azure services over the Azure backbone network rather than the public internet. They do not give the service a private IP; the service keeps its public endpoint, but you can configure it to accept traffic only from selected subnets in your VNet.
Azure Firewall and Network Security Groups
Network Security Groups (NSGs) filter inbound and outbound traffic to Azure resources based on rules — allow or deny specific IP ranges and ports. Azure Firewall provides more advanced filtering including application-level rules and threat intelligence-based blocking.
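NSG rules are evaluated in priority order, lowest number first, and the first matching rule decides the outcome. The toy model below shows that evaluation logic in plain Python. The Rule class and is_allowed function are invented for this sketch; real NSGs also have default rules, protocols, port ranges, and service tags, all of which are omitted here.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int       # lower number = evaluated first
    source_cidr: str    # e.g. "10.0.1.0/24"
    port: int           # destination port
    allow: bool         # True = allow, False = deny


def is_allowed(rules: list[Rule], source_ip: str, port: int) -> bool:
    """Apply rules in priority order; the first match decides. No match = deny."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.port == port and ip in ipaddress.ip_network(rule.source_cidr):
            return rule.allow
    return False


# Allow SQL traffic (port 1433) from one subnet, deny it from everywhere else.
rules = [
    Rule(priority=200, source_cidr="0.0.0.0/0", port=1433, allow=False),
    Rule(priority=100, source_cidr="10.0.1.0/24", port=1433, allow=True),
]

print(is_allowed(rules, "10.0.1.7", 1433))
print(is_allowed(rules, "203.0.113.9", 1433))
```

The allow rule wins for 10.0.1.7 because its priority number is lower, even though the broader deny rule also matches that address.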
Data Encryption
Encryption at Rest
All data stored in Azure — ADLS Gen2, Azure SQL Database, Synapse, Cosmos DB — is encrypted at rest by default using 256-bit AES encryption. Microsoft manages the encryption keys automatically. For organizations with strict compliance requirements, you can bring your own encryption keys managed in Azure Key Vault.
Encryption in Transit
Data moving between Azure services, or between your applications and Azure, should be encrypted using TLS (Transport Layer Security). New storage accounts have the "Secure transfer required" setting enabled by default, which rejects unencrypted HTTP requests. Always use HTTPS endpoints and the abfss:// (secure) protocol rather than abfs:// when connecting to ADLS Gen2.
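A small guard in pipeline code can help keep insecure URIs from slipping in. The sketch below builds an abfss:// URI in the standard container@account.dfs.core.windows.net form and rejects any URI whose scheme is not TLS-protected. The function names and the example container and account are assumptions made for illustration.

```python
from urllib.parse import urlparse


def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build a TLS-protected ADLS Gen2 URI for the given container and path."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def require_secure(uri: str) -> str:
    """Return the URI unchanged, or raise if it does not use a TLS scheme."""
    scheme = urlparse(uri).scheme
    if scheme not in ("abfss", "https"):
        raise ValueError(f"insecure scheme {scheme!r}; use abfss:// or https://")
    return uri


# "raw" and "mydatalake" are placeholder names.
print(require_secure(abfss_uri("raw", "mydatalake", "sales/2024")))
```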
Azure Key Vault — Managing Secrets Safely
Connection strings, API keys, storage account keys, and passwords should never be stored inside pipeline code, notebooks, or configuration files. Azure Key Vault is a secure vault for all these sensitive values.
Your ADF pipeline or Databricks notebook fetches the secret from Key Vault at runtime. The secret value never appears in your code. If someone reads your notebook code, they see only the Key Vault reference — not the actual secret.
# In Databricks: retrieve a secret from Azure Key Vault
storage_account_key = dbutils.secrets.get(scope="my-keyvault-scope", key="storage-account-key")
Key Vault also manages certificates and encryption keys. Access to Key Vault itself is controlled through Azure RBAC (or vault access policies), so only authorized services and users can retrieve secrets.
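To show how a retrieved secret might be used without ever being written into code, the sketch below builds the Spark setting that authenticates to ADLS Gen2 with an account key. The secret lookup is passed in as a function so the same code can run with dbutils.secrets.get inside Databricks or with a stub elsewhere. The helper name storage_key_conf and the account name mydatalake are assumptions for illustration.

```python
from typing import Callable


def storage_key_conf(
    account: str,
    get_secret: Callable[[str, str], str],
    scope: str,
    key: str,
) -> dict[str, str]:
    """Build the Spark setting that authenticates to ADLS Gen2 with an account key.

    get_secret(scope, key) is injected so the secret value is fetched at
    runtime and never appears as a literal in the code.
    """
    conf_name = f"fs.azure.account.key.{account}.dfs.core.windows.net"
    return {conf_name: get_secret(scope, key)}


# In a Databricks notebook (where dbutils and spark already exist):
#   conf = storage_key_conf(
#       "mydatalake",
#       lambda s, k: dbutils.secrets.get(scope=s, key=k),
#       "my-keyvault-scope",
#       "storage-account-key",
#   )
#   for name, value in conf.items():
#       spark.conf.set(name, value)

# Outside Databricks, a stub stands in for the secret store.
demo = storage_key_conf(
    "mydatalake", lambda s, k: "dummy-value", "my-keyvault-scope", "storage-account-key"
)
print(list(demo))  # prints only the setting name, never the secret value
```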
Microsoft Purview — Data Governance at Scale
As data platforms grow, tracking what data exists, where it lives, who owns it, and whether it contains sensitive information becomes a major challenge. Microsoft Purview is Azure's enterprise data governance tool.
Data Catalog
Purview scans the data sources you register with it (ADLS Gen2, Synapse, SQL Database, and others) and automatically discovers the data assets they contain. It builds a searchable catalog. An analyst searching for "customer email" finds every table and file in the organization that contains customer email data, along with where it lives and who owns it.
Data Lineage
Purview tracks how data flows through your platform — from the source system, through ADF pipelines, through Databricks transformations, into Synapse tables. This lineage map answers the question: "This dashboard number is wrong — where did the data come from and which pipeline touched it?" Tracing a data quality problem from the final report back to the source takes minutes instead of days.
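Conceptually, lineage is a graph of assets and the assets they were built from, and tracing a problem means walking that graph backwards. The toy example below is not the Purview API; it is a plain-Python model with made-up asset names that shows the idea of tracing a dashboard back to its sources.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets it was built from.
UPSTREAM = {
    "sales_dashboard": ["synapse.fact_sales"],
    "synapse.fact_sales": ["databricks.clean_sales"],
    "databricks.clean_sales": ["adf.copy_orders", "adf.copy_customers"],
    "adf.copy_orders": ["source.orders_db"],
    "adf.copy_customers": ["source.crm"],
}


def trace_upstream(asset: str, upstream: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk from an asset back to everything it depends on."""
    seen = {asset}
    order = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for parent in upstream.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


for step in trace_upstream("sales_dashboard", UPSTREAM):
    print(step)
```

The result lists the nearest dependencies first, which matches how an engineer would usually investigate: check the table behind the dashboard, then the transformation, then the ingestion pipelines and source systems.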
Sensitivity Labels and Classification
Purview automatically scans data for sensitive information such as names, phone numbers, email addresses, credit card numbers, and national ID numbers. It applies classifications and sensitivity labels so your organization knows which datasets require extra protection and can restrict access to them accordingly.
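To give a feel for what automated classification involves, here is a deliberately simplified sketch that flags email addresses and likely payment card numbers in a text value. Purview uses its own built-in classifiers; the patterns, the label names, and the classify function below are assumptions made for illustration only.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment cards; filters out arbitrary digit runs."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(value: str) -> set[str]:
    """Return the set of sensitivity labels detected in a text value."""
    labels = set()
    if EMAIL.search(value):
        labels.add("email")
    if any(luhn_valid(m.group()) for m in CARD.finditer(value)):
        labels.add("credit_card")
    return labels


print(classify("contact: jane.doe@example.com"))
print(classify("card 4111 1111 1111 1111"))
```

The Luhn check matters because a pattern match alone would flag any long run of digits, such as an order number, as a card number.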
Row-Level Security and Column-Level Security
Sometimes different users need to see different subsets of the same table. A regional sales manager should see only their region's data, not the entire company's sales.
Row-Level Security (RLS) in Azure SQL
-- Predicate function: a row is visible when its region matches the current
-- database user name, or when the current user is 'admin'
CREATE FUNCTION dbo.fn_securitypredicate(@region AS VARCHAR(50))
    RETURNS TABLE
    WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS fn_result
    WHERE @region = USER_NAME() OR USER_NAME() = 'admin';
GO

-- Security policy that applies the predicate as a row filter on dbo.fact_sales
CREATE SECURITY POLICY SalesFilter
    ADD FILTER PREDICATE dbo.fn_securitypredicate(region)
    ON dbo.fact_sales
    WITH (STATE = ON);
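The effect of the filter predicate can be pictured as a per-row check applied to every read. The Python sketch below mirrors the SQL logic on an in-memory list, purely as an illustration. The sample rows and function names are invented, and string comparison here is case-sensitive, whereas in SQL it depends on the database collation.

```python
def can_see_row(row_region: str, user_name: str) -> bool:
    """Mirror of the SQL predicate: the row is visible when its region matches
    the database user name, or when the user is 'admin'."""
    return row_region == user_name or user_name == "admin"


def visible_rows(rows: list[dict], user_name: str) -> list[dict]:
    """Apply the predicate to every row, as a filter predicate does on reads."""
    return [r for r in rows if can_see_row(r["region"], user_name)]


# Hypothetical contents of dbo.fact_sales.
fact_sales = [
    {"region": "north", "amount": 120},
    {"region": "south", "amount": 80},
]

print(visible_rows(fact_sales, "north"))
print(len(visible_rows(fact_sales, "admin")))
```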
Dynamic Data Masking
Dynamic Data Masking hides sensitive column values from users who do not have permission to see the full data. A customer service representative sees a credit card number as XXXX-XXXX-XXXX-1234. A billing manager sees the full number. The underlying data is unchanged — only the display is masked based on the user's role.
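The behaviour described above can be sketched as a function that decides, at read time, whether to return the stored value or a masked version of it. In Azure SQL the mask is defined on the column and bypassed for users granted the UNMASK permission; the Python below is only a toy model of that idea, with a made-up mask_card function and sample value.

```python
def mask_card(card_number: str, can_unmask: bool) -> str:
    """Return the full number for privileged users, otherwise only the last four digits."""
    if can_unmask:
        return card_number
    digits = [c for c in card_number if c.isdigit()]
    return "XXXX-XXXX-XXXX-" + "".join(digits[-4:])


stored = "4111-1111-1111-1234"   # the stored value is never modified

print(mask_card(stored, can_unmask=False))  # XXXX-XXXX-XXXX-1234
print(mask_card(stored, can_unmask=True))   # 4111-1111-1111-1234
```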
Compliance and Regulatory Considerations
Data engineers often work with data subject to regulations:
- GDPR (Europe): Personal data of EU residents must be stored, processed, and deleted according to strict rules. Data subjects can request deletion of their data.
- HIPAA (US Healthcare): Health information must be protected with specific technical safeguards.
- PCI DSS (Payment Cards): Payment card data must be encrypted, access-controlled, and audited.
Azure maintains a broad set of compliance certifications and attestations covering many major regulations and standards. That compliance applies at the platform level; the data engineer is still responsible for configuring the platform correctly to maintain compliance within their specific solution.
Key Points
- Secure data at all four layers — identity, network, data encryption, and audit logging
- Always use Managed Identity for service-to-service authentication — never hardcode keys or passwords
- Store all secrets in Azure Key Vault — reference them from pipelines and notebooks, never embed them in code
- Use Private Endpoints to eliminate public internet exposure of data services in production environments
- Deploy Microsoft Purview to maintain a data catalog, track lineage, and classify sensitive data
- Apply Row-Level Security and Dynamic Data Masking to enforce data access rules within shared tables
