ADE Connecting Data to Power BI

The ultimate purpose of a data engineering platform is to deliver clean, reliable data to business users who need it. Power BI is the most widely used business intelligence tool in the Microsoft ecosystem. Understanding how to connect your Azure data platform to Power BI — and how to serve data efficiently for reporting — is a key part of the data engineer's job.

The Data Engineer's Role in the Reporting Stack

A common misconception is that building Power BI dashboards is entirely the responsibility of a BI developer or analyst. The data engineer is responsible for the layer below the dashboard — the data model, the data quality, and the performance of the underlying tables that Power BI reads from.

When an executive says "the dashboard is slow," it is almost always a data layer problem, not a Power BI problem. The data engineer owns that fix.

Direct Query vs Import Mode in Power BI

Power BI has two fundamental modes for connecting to data sources. Choosing the right mode impacts performance, data freshness, and infrastructure cost.

Import Mode

Power BI copies all the data from the source into its own in-memory storage. Reports run against this local copy — which makes them extremely fast. The trade-off is that the data is only as fresh as the last refresh. You schedule refreshes (hourly, daily) to keep the data current.

Import mode is the right choice for most analytics scenarios where data freshness of a few hours is acceptable. It delivers the best report performance because everything runs in memory.

Direct Query Mode

Power BI sends a live query to the source system every time a user interacts with a report. The data is always current, but every click generates a query to your Azure SQL Database, Synapse, or ADLS Gen2. This can create heavy load on the source system and slow report performance.

Use Direct Query only when data must be real-time current — like a live operations dashboard — and when the source system is built to handle the query load. Azure Synapse Dedicated SQL Pool and Azure SQL Database handle Direct Query well when properly indexed.

Power BI Connecting to Azure Services

Connecting to Azure SQL Database

Power BI has a native Azure SQL Database connector. Enter the server name, database name, and credentials (or use organizational account with Entra ID SSO). Choose Import or Direct Query mode.

Best practice: create dedicated views or stored procedures for Power BI. Do not let Power BI query raw tables directly. Views give you control over exactly what data Power BI sees and allow you to optimize specifically for reporting queries.

Connecting to Azure Synapse Analytics

Power BI has a native Synapse connector. The Azure Synapse Analytics workspace link in Power BI service lets you publish Power BI datasets directly from Synapse Studio. This tight integration means analysts can explore data in Synapse and publish reports without leaving the Synapse environment.

Connecting to ADLS Gen2 via Serverless SQL

Power BI can query Parquet and Delta files in ADLS Gen2 by connecting to Synapse Serverless SQL Pool endpoints. Create external tables or views in Serverless SQL Pool and point Power BI to them. This approach serves data directly from the data lake without a warehouse — useful for scenarios where loading a dedicated warehouse is not justified.

Connecting to Azure Databricks

Power BI connects to Databricks via the Databricks SQL Warehouse (formerly SQL Analytics) endpoint or through a cluster endpoint. Databricks SQL Warehouses are optimized for BI query patterns and support Direct Query efficiently. Create Delta tables in Unity Catalog and expose them through Databricks SQL to Power BI.

Designing the Serving Layer for Power BI

The gold layer of your data lake is not always in the ideal shape for Power BI to consume directly. A semantic model — sometimes called a Power BI Dataset or a data mart — translates the gold layer into a structure optimized for self-service analytics.

The Star Schema for Power BI

Power BI's DAX (Data Analysis Expressions) calculation engine works best with a Star Schema. Fact tables hold numeric measures. Dimension tables hold descriptive attributes. Relationships connect them.

If your gold layer uses a denormalized wide table (all columns in one table), consider splitting it into a proper Star Schema for the Power BI semantic model. This makes DAX calculations simpler, improves filter performance, and allows Power BI to use its Vertipaq engine optimally.

Pre-Aggregating for Performance

If a Power BI report always shows data at the monthly level, there is no benefit in loading daily-level rows into the semantic model. Build gold layer tables that are pre-aggregated to the grain the reports need. Fewer rows means faster Import refreshes and faster DAX calculations.

-- Gold layer — pre-aggregated monthly sales for Power BI
CREATE TABLE gold.monthly_sales AS
SELECT
    YEAR(order_date)    AS year,
    MONTH(order_date)   AS month,
    region,
    product_category,
    SUM(amount)         AS total_revenue,
    COUNT(order_id)     AS order_count,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM silver.sales
GROUP BY YEAR(order_date), MONTH(order_date), region, product_category

Incremental Refresh in Power BI

For large datasets, a full refresh every day is slow and expensive. Power BI's Incremental Refresh feature refreshes only the new and recently changed data rather than the full history. You configure a date range window — for example, refresh only the last 3 days of data while keeping historical data unchanged.

The data engineer must ensure the source table has a reliable updated timestamp column and that the data is partitioned appropriately for incremental refresh to work efficiently.

Power BI Dataflows — Reusable Data Preparation

Power BI Dataflows are cloud-based ETL transformations built inside the Power BI service. Multiple Power BI reports can share a single Dataflow, ensuring consistent business logic across the organization.

For data engineers, Dataflows are relevant because they can connect directly to ADLS Gen2 using Dataflow Gen2 (part of Microsoft Fabric). This positions them as a bridge between the data lake and self-service analytics — letting analysts prepare data without needing SQL or Python skills.

Key Points

Use Import mode for most Power BI reports — it delivers the fastest report performance
Use Direct Query only for real-time dashboards where data freshness of minutes is genuinely required
Expose data to Power BI through views or pre-defined SQL endpoints — never let Power BI query raw tables directly
Design the gold layer in Star Schema format — it aligns perfectly with Power BI's DAX engine and delivers the best analytical performance
Pre-aggregate gold tables to the grain the reports actually need — eliminate unnecessary row-level detail that reports never use
Configure Incremental Refresh on large Power BI datasets to avoid full daily refreshes of historical data

Previous lesson

Back to course

Next lesson