Azure Functions Performance and Scaling
One of the strongest features of Azure Functions is its ability to scale automatically. When traffic increases, Azure adds more instances of the function app to handle the load. When traffic drops, instances are removed to save cost. Understanding how this scaling works helps you design functions that perform well under any load.
How Azure Functions Scales
```
┌──────────────────────────────────────────────────────────────┐
│                     AUTO-SCALING DIAGRAM                     │
│                                                              │
│  Low Traffic (2 requests/min)                                │
│  ┌──────────┐                                                │
│  │Instance 1│  ← 1 instance handles all requests             │
│  └──────────┘                                                │
│                                                              │
│  Medium Traffic (200 requests/min)                           │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐                      │
│  │Instance 1│ │Instance 2│ │Instance 3│  ← Azure adds more   │
│  └──────────┘ └──────────┘ └──────────┘                      │
│                                                              │
│  High Traffic (5000 requests/min)                            │
│  ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ... (up to 200 instances)    │
│  └──┘ └──┘ └──┘ └──┘ └──┘ └──┘                              │
│                                                              │
│  Azure scales OUT (more instances), not UP (bigger server)   │
└──────────────────────────────────────────────────────────────┘
```
Scaling Behavior by Hosting Plan
| Plan | Scale Trigger | Min Instances | Max Instances | Scale Speed |
|---|---|---|---|---|
| Consumption | Number of events | 0 | 200 | Fast (seconds) |
| Premium | Number of events + pre-warmed | 1 (pre-warmed) | 100 | Instant (no cold start) |
| Dedicated | CPU/Memory metrics | Manual setting | Manual setting | Slower (minutes) |
Cold Start Problem
On the Consumption plan, when no requests arrive for a while, Azure removes all instances to save resources. The next request finds zero instances running. Azure must create a new instance, load the function runtime, and then process the request. This delay is called a cold start.
```
┌──────────────────────────────────────────────────────────────┐
│                          COLD START                          │
│                                                              │
│  No traffic for 5+ minutes                                   │
│            │                                                 │
│            ▼                                                 │
│  Azure removes instances (scale to zero)                     │
│            │                                                 │
│            ▼                                                 │
│  New request arrives                                         │
│            │                                                 │
│            ▼                                                 │
│  Azure creates a new instance (~1–3 seconds delay)           │
│  Loads runtime + dependencies                                │
│  Processes request                                           │
│                                                              │
│  Cold Start Delay: 1–3 seconds (JavaScript)                  │
│                    5–15 seconds (C# .NET)                    │
└──────────────────────────────────────────────────────────────┘
```
Cold Start Reduction Strategies
| Strategy | How It Helps | Cost |
|---|---|---|
| Premium Plan | Keeps 1+ pre-warmed instances always ready | Higher cost, no cold start |
| Keep dependencies small | Faster runtime load time | Free — code optimization |
| Use JavaScript instead of C# | Faster startup time | Free — language choice |
| Timer-based keep-alive ping | Pings function every few minutes to prevent scale-to-zero | Minimal — extra function executions |
Performance Best Practices
1. Initialize Expensive Objects Outside the Handler
```javascript
// BAD: Database client created on every invocation
module.exports = async function (context, req) {
  const dbClient = new DatabaseClient(process.env.DB_CONNECTION); // ← Created every time
  await dbClient.connect();
  const data = await dbClient.query("SELECT * FROM orders");
  context.res = { body: data };
};
```

```javascript
// GOOD: Database client created once and reused across warm invocations
const dbClient = new DatabaseClient(process.env.DB_CONNECTION); // ← Created once
let isConnected = false;

module.exports = async function (context, req) {
  if (!isConnected) {
    await dbClient.connect();
    isConnected = true;
  }
  const data = await dbClient.query("SELECT * FROM orders");
  context.res = { body: data };
};
```
On a warm instance (already running), the code outside the handler function runs only once when the instance starts. Reusing connections saves hundreds of milliseconds per request.
2. Avoid Blocking the Event Loop
```javascript
// BAD: Synchronous file read blocks the entire function
const fs = require("fs");

module.exports = async function (context, req) {
  const data = fs.readFileSync("./data.json"); // ← BLOCKS everything
  context.res = { body: JSON.parse(data) };
};
```

```javascript
// GOOD: Async file read does not block
const fs = require("fs").promises;

module.exports = async function (context, req) {
  const data = await fs.readFile("./data.json"); // ← Non-blocking
  context.res = { body: JSON.parse(data) };
};
```
3. Run Independent Operations in Parallel
```javascript
// BAD: Sequential — total time = time1 + time2 + time3
module.exports = async function (context, req) {
  const user = await getUser(req.query.userId);     // 200ms
  const orders = await getOrders(req.query.userId); // 300ms
  const profile = await getProfile(req.query.userId); // 150ms
  // Total: ~650ms
};
```

```javascript
// GOOD: Parallel — total time = max(time1, time2, time3)
module.exports = async function (context, req) {
  const [user, orders, profile] = await Promise.all([
    getUser(req.query.userId),    // ┐
    getOrders(req.query.userId),  // ├── All run at the same time
    getProfile(req.query.userId)  // ┘
  ]);
  // Total: ~300ms (only the slowest one)
};
```
4. Set Appropriate Timeouts
On the Consumption plan, the default function timeout is 5 minutes and the maximum is 10 minutes. On the Premium plan, the default is 30 minutes, and the timeout can be raised to 60 minutes or made unbounded (Durable Functions orchestrations are not limited by this timeout).
Set the timeout in `host.json`:

```json
{
  "version": "2.0",
  "functionTimeout": "00:05:00"
}
```
5. Keep the Deployment Package Small
```shell
# Only install production dependencies, not dev tools
npm install --production

# List what's taking the most space
du -sh node_modules/* | sort -hr | head -20
```

Add entries to `.funcignore` to exclude them from deployment:

```
.git
.vscode
*.test.js
__tests__
README.md
```
Concurrency and Batching for Queue Triggers
Queue triggers process multiple messages concurrently. The batchSize setting in host.json controls how many messages one instance fetches at a time; because a new batch is fetched once the count of in-flight messages drops to newBatchThreshold, a single instance can work on up to batchSize + newBatchThreshold messages at once (24 with the settings below).
```json
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 16,                 // Process 16 messages at a time
      "newBatchThreshold": 8,          // Fetch a new batch when 8 remain
      "maxDequeueCount": 5,            // Retry 5 times before the poison queue
      "visibilityTimeout": "00:00:30"  // Hide message for 30s during processing
    }
  }
}
```
Scaling Limits and Throttling
| Resource | Consumption Limit | Premium Limit |
|---|---|---|
| Max instances | 200 | 100 |
| Max function timeout | 10 minutes | 60 minutes (or unlimited) |
| Max request size | 100 MB | 100 MB |
| Memory per instance | 1.5 GB | 3.5 GB or 14 GB |
Performance Monitoring Checklist
- Check function duration in Application Insights → Performance
- Watch for cold starts in Application Insights → Failures (they appear as slow first requests)
- Monitor memory usage to avoid hitting instance limits
- Review dependency call durations (database queries, external API calls)
- Use sampling in Application Insights to reduce telemetry volume in high-traffic apps
