Azure Functions Performance and Scaling

One of the strongest features of Azure Functions is its ability to scale automatically. When traffic increases, Azure adds more instances of the function to handle the load. When traffic drops, instances are removed to save cost. Understanding how scaling works helps you design functions that perform well under any load.

How Azure Functions Scales

┌──────────────────────────────────────────────────────────────┐
│              AUTO-SCALING DIAGRAM                            │
│                                                              │
│  Low Traffic (2 requests/min)                                │
│  ┌──────────┐                                                │
│  │Instance 1│  ← 1 instance handles all requests            │
│  └──────────┘                                                │
│                                                              │
│  Medium Traffic (200 requests/min)                           │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐                     │
│  │Instance 1│ │Instance 2│ │Instance 3│ ← Azure adds more   │
│  └──────────┘ └──────────┘ └──────────┘                     │
│                                                              │
│  High Traffic (5000 requests/min)                            │
│  ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ... (up to 200 instances)   │
│  └──┘ └──┘ └──┘ └──┘ └──┘ └──┘                             │
│                                                              │
│  Azure scales OUT (more instances), not UP (bigger server)   │
└──────────────────────────────────────────────────────────────┘

Scaling Behavior by Hosting Plan

Plan          Scale Trigger                   Min Instances    Max Instances    Scale Speed
Consumption   Number of events                0                200              Fast (seconds)
Premium       Number of events + pre-warmed   1 (pre-warmed)   100              Instant (no cold start)
Dedicated     CPU/memory metrics              Manual setting   Manual setting   Slower (minutes)

Cold Start Problem

On the Consumption plan, when no requests arrive for a while, Azure removes all instances to save resources. The next request finds zero instances running. Azure must create a new instance, load the function runtime, and then process the request. This delay is called a cold start.

┌──────────────────────────────────────────────────────────────┐
│                    COLD START                                │
│                                                              │
│  No traffic for 5+ minutes                                   │
│       │                                                      │
│       ▼                                                      │
│  Azure removes instances (scale to zero)                     │
│       │                                                      │
│       ▼                                                      │
│  New request arrives                                         │
│       │                                                      │
│       ▼                                                      │
│  Azure creates a new instance (~1–3 seconds delay)           │
│  Loads runtime + dependencies                                │
│  Processes request                                           │
│                                                              │
│  Cold Start Delay: 1–3 seconds (JavaScript)                  │
│                    5–15 seconds (C# .NET)                    │
└──────────────────────────────────────────────────────────────┘

Cold Start Reduction Strategies

Strategy                       How It Helps                                                    Cost
Premium Plan                   Keeps 1+ pre-warmed instances always ready                      Higher cost, no cold start
Keep dependencies small        Faster runtime load time                                        Free (code optimization)
Use JavaScript instead of C#   Faster startup time                                             Free (language choice)
Timer-based keep-alive ping    Pings the function every few minutes to prevent scale-to-zero   Minimal (extra executions)

Performance Best Practices

1. Initialize Expensive Objects Outside the Handler

// BAD: Database client created on every invocation
module.exports = async function (context, req) {
    const dbClient = new DatabaseClient(process.env.DB_CONNECTION); // ← Created every time
    await dbClient.connect();
    const data = await dbClient.query("SELECT * FROM orders");
    context.res = { body: data };
};

// GOOD: Database client created once and reused across warm invocations
const dbClient = new DatabaseClient(process.env.DB_CONNECTION); // ← Created once
let isConnected = false;

module.exports = async function (context, req) {
    if (!isConnected) {
        await dbClient.connect();
        isConnected = true;
    }
    const data = await dbClient.query("SELECT * FROM orders");
    context.res = { body: data };
};

On a warm instance (already running), the code outside the handler function runs only once when the instance starts. Reusing connections saves hundreds of milliseconds per request.

2. Avoid Blocking the Event Loop

// BAD: Synchronous file read blocks the entire function
const fs = require("fs");

module.exports = async function (context, req) {
    const data = fs.readFileSync("./data.json"); // ← BLOCKS everything
    context.res = { body: JSON.parse(data) };
};

// GOOD: Async file read does not block
const fs = require("fs").promises;

module.exports = async function (context, req) {
    const data = await fs.readFile("./data.json"); // ← Non-blocking
    context.res = { body: JSON.parse(data) };
};

3. Run Independent Operations in Parallel

// BAD: Sequential — total time = time1 + time2 + time3
module.exports = async function (context, req) {
    const user    = await getUser(req.query.userId);    // 200ms
    const orders  = await getOrders(req.query.userId);  // 300ms
    const profile = await getProfile(req.query.userId); // 150ms
    // Total: ~650ms
};

// GOOD: Parallel — total time = max(time1, time2, time3)
module.exports = async function (context, req) {
    const [user, orders, profile] = await Promise.all([
        getUser(req.query.userId),    // ┐
        getOrders(req.query.userId),  // ├── All run at the same time
        getProfile(req.query.userId)  // ┘
    ]);
    // Total: ~300ms (only the slowest one)
};

4. Set Appropriate Timeouts

On the Consumption plan, the default function timeout is 5 minutes and the maximum is 10 minutes. On the Premium plan, the maximum is 60 minutes (or unlimited for Durable Functions).

// host.json: Set function timeout
{
  "version": "2.0",
  "functionTimeout": "00:05:00"
}

5. Keep the Deployment Package Small

# Only install production dependencies, not dev tools
npm install --production

# List what's taking the most space
du -sh node_modules/* | sort -hr | head -20

# Add to .funcignore to exclude from deployment
.git
.vscode
*.test.js
__tests__
README.md

Concurrency and Batching for Queue Triggers

Queue triggers process multiple messages at the same time. The batchSize setting in host.json controls how many messages one instance fetches and processes concurrently; an instance can reach batchSize + newBatchThreshold concurrent invocations before it stops fetching new batches.

{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 16,           // Process 16 messages at a time
      "newBatchThreshold": 8,    // Fetch new batch when 8 remain
      "maxDequeueCount": 5,      // Retry 5 times before poison queue
      "visibilityTimeout": "00:00:30"  // Hide message for 30s during processing
    }
  }
}

Scaling Limits and Throttling

Resource               Consumption Limit   Premium Limit
Max instances          200                 100
Max function timeout   10 minutes          60 minutes (or unlimited)
Max request size       100 MB              100 MB
Memory per instance    1.5 GB              3.5 GB or 14 GB

Performance Monitoring Checklist

  • Check function duration in Application Insights → Performance
  • Watch for cold starts in Application Insights → Performance (they appear as slow first requests after idle periods)
  • Monitor memory usage to avoid hitting instance limits
  • Review dependency call durations (database queries, external API calls)
  • Use sampling in Application Insights to reduce telemetry volume in high-traffic apps
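Sampling from the last item above is configured in host.json. A hedged example (the threshold value is illustrative; tune it to your traffic):

```json
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "maxTelemetryItemsPerSecond": 20
      }
    }
  }
}
```

Sampling keeps telemetry volume (and cost) bounded while preserving a statistically representative picture of request performance.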
