Azure Functions Performance and Scaling
One of the strongest features of Azure Functions is its ability to scale automatically. When traffic increases, Azure adds more instances of the function app to handle the load. When traffic drops, instances are removed to save cost. Understanding how this scaling works helps you design functions that perform well under any load.
How Azure Functions Scales
```
┌──────────────────────────────────────────────────────────────┐
│                     AUTO-SCALING DIAGRAM                     │
│                                                              │
│  Low Traffic (2 requests/min)                                │
│  ┌──────────┐                                                │
│  │Instance 1│  ← 1 instance handles all requests             │
│  └──────────┘                                                │
│                                                              │
│  Medium Traffic (200 requests/min)                           │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐                      │
│  │Instance 1│ │Instance 2│ │Instance 3│  ← Azure adds more   │
│  └──────────┘ └──────────┘ └──────────┘                      │
│                                                              │
│  High Traffic (5000 requests/min)                            │
│  ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ... (up to 200 instances)    │
│  └──┘ └──┘ └──┘ └──┘ └──┘ └──┘                              │
│                                                              │
│  Azure scales OUT (more instances), not UP (bigger server)   │
└──────────────────────────────────────────────────────────────┘
```
Scaling Behavior by Hosting Plan
| Plan | Scale Trigger | Min Instances | Max Instances | Scale Speed |
|---|---|---|---|---|
| Consumption | Number of events | 0 | 200 | Fast (seconds) |
| Premium | Number of events + pre-warmed | 1 (pre-warmed) | 100 | Instant (no cold start) |
| Dedicated | CPU/Memory metrics | Manual setting | Manual setting | Slower (minutes) |
Cold Start Problem
On the Consumption plan, when no requests arrive for a while, Azure removes all instances to save resources. The next request finds zero instances running. Azure must create a new instance, load the function runtime, and then process the request. This delay is called a cold start.
```
┌──────────────────────────────────────────────────────────────┐
│                          COLD START                          │
│                                                              │
│  No traffic for 5+ minutes                                   │
│            │                                                 │
│            ▼                                                 │
│  Azure removes instances (scale to zero)                     │
│            │                                                 │
│            ▼                                                 │
│  New request arrives                                         │
│            │                                                 │
│            ▼                                                 │
│  Azure creates a new instance (~1–3 seconds delay)           │
│  Loads runtime + dependencies                                │
│  Processes request                                           │
│                                                              │
│  Cold Start Delay: 1–3 seconds (JavaScript)                  │
│                    5–15 seconds (C# .NET)                    │
└──────────────────────────────────────────────────────────────┘
```
Cold Start Reduction Strategies
| Strategy | How It Helps | Cost |
|---|---|---|
| Premium Plan | Keeps 1+ pre-warmed instances always ready | Higher cost, no cold start |
| Keep dependencies small | Faster runtime load time | Free — code optimization |
| Use JavaScript instead of C# | Faster startup time | Free — language choice |
| Timer-based keep-alive ping | Pings function every few minutes to prevent scale-to-zero | Minimal — extra function executions |
Performance Best Practices
1. Initialize Expensive Objects Outside the Handler
```javascript
// BAD: Database client created on every invocation
module.exports = async function (context, req) {
  const dbClient = new DatabaseClient(process.env.DB_CONNECTION); // ← Created every time
  await dbClient.connect();
  const data = await dbClient.query("SELECT * FROM orders");
  context.res = { body: data };
};
```

```javascript
// GOOD: Database client created once and reused across warm invocations
const dbClient = new DatabaseClient(process.env.DB_CONNECTION); // ← Created once
let isConnected = false;

module.exports = async function (context, req) {
  if (!isConnected) {
    await dbClient.connect();
    isConnected = true;
  }
  const data = await dbClient.query("SELECT * FROM orders");
  context.res = { body: data };
};
```
On a warm instance (already running), the code outside the handler function runs only once when the instance starts. Reusing connections saves hundreds of milliseconds per request.
2. Avoid Blocking the Event Loop
```javascript
// BAD: Synchronous file read blocks the entire function
const fs = require("fs");

module.exports = async function (context, req) {
  const data = fs.readFileSync("./data.json"); // ← BLOCKS everything
  context.res = { body: JSON.parse(data) };
};
```

```javascript
// GOOD: Async file read does not block
const fs = require("fs").promises;

module.exports = async function (context, req) {
  const data = await fs.readFile("./data.json"); // ← Non-blocking
  context.res = { body: JSON.parse(data) };
};
```
3. Run Independent Operations in Parallel
```javascript
// BAD: Sequential — total time = time1 + time2 + time3
module.exports = async function (context, req) {
  const user = await getUser(req.query.userId);     // 200ms
  const orders = await getOrders(req.query.userId); // 300ms
  const profile = await getProfile(req.query.userId); // 150ms
  // Total: ~650ms
};
```

```javascript
// GOOD: Parallel — total time = max(time1, time2, time3)
module.exports = async function (context, req) {
  const [user, orders, profile] = await Promise.all([
    getUser(req.query.userId),    // ┐
    getOrders(req.query.userId),  // ├── All run at the same time
    getProfile(req.query.userId)  // ┘
  ]);
  // Total: ~300ms (only the slowest one)
};
```
4. Set Appropriate Timeouts
On the Consumption plan, the default function timeout is 5 minutes and the maximum is 10 minutes. On the Premium plan, the default is 30 minutes, and the timeout can be raised to 60 minutes or made unbounded (Durable Functions orchestrations are not limited by this timeout).
Set the timeout in `host.json`:

```json
{
  "version": "2.0",
  "functionTimeout": "00:05:00"
}
```
5. Keep the Deployment Package Small
```shell
# Only install production dependencies, not dev tools
npm install --production

# List what's taking the most space
du -sh node_modules/* | sort -hr | head -20
```

Add entries to `.funcignore` to exclude them from deployment:

```
.git
.vscode
*.test.js
__tests__
README.md
```
Concurrency and Batching for Queue Triggers
Queue triggers process multiple messages concurrently. The batchSize setting in host.json controls how many messages one instance fetches at a time; because a new batch is fetched once the count of in-flight messages drops to newBatchThreshold, a single instance can work on up to batchSize + newBatchThreshold messages at once (24 with the settings below).
```json
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 16,                 // Process 16 messages at a time
      "newBatchThreshold": 8,          // Fetch a new batch when 8 remain
      "maxDequeueCount": 5,            // Retry 5 times before the poison queue
      "visibilityTimeout": "00:00:30"  // Hide message for 30s during processing
    }
  }
}
```
Scaling Limits and Throttling
| Resource | Consumption Limit | Premium Limit |
|---|---|---|
| Max instances | 200 | 100 |
| Max function timeout | 10 minutes | 60 minutes (or unlimited) |
| Max request size | 100 MB | 100 MB |
| Memory per instance | 1.5 GB | 3.5 GB or 14 GB |
Performance Monitoring Checklist
- Check function duration in Application Insights → Performance
- Watch for cold starts in Application Insights → Failures (they appear as slow first requests)
- Monitor memory usage to avoid hitting instance limits
- Review dependency call durations (database queries, external API calls)
- Use sampling in Application Insights to reduce telemetry volume in high-traffic apps
