MongoDB Aggregation Pipeline

The aggregation pipeline is MongoDB's most powerful tool for data analysis. It processes documents through a series of stages, where each stage transforms the data in some way. The output of one stage becomes the input for the next, similar to an assembly line in a factory — raw material enters at one end, passes through several operations, and a finished product comes out the other end.

Aggregation goes far beyond simple filtering. It can group documents, calculate totals, reshape data structures, count records, compute averages, and much more — all within a single query.

How the Pipeline Works

The aggregation pipeline is an array of stages. Each stage is an object with a stage operator as the key.

db.collection.aggregate([
  { <stage1> },
  { <stage2> },
  { <stage3> }
])

All examples use a sales collection:

db.sales.insertMany([
  { product: "Laptop", category: "Electronics", quantity: 2, price: 55000, city: "Delhi", month: "January" },
  { product: "Phone", category: "Electronics", quantity: 5, price: 18000, city: "Mumbai", month: "January" },
  { product: "Desk Chair", category: "Furniture", quantity: 3, price: 8500, city: "Delhi", month: "February" },
  { product: "Notebook", category: "Stationery", quantity: 20, price: 120, city: "Pune", month: "January" },
  { product: "Monitor", category: "Electronics", quantity: 1, price: 18000, city: "Mumbai", month: "February" },
  { product: "Pen Set", category: "Stationery", quantity: 50, price: 80, city: "Delhi", month: "February" }
])

Stage 1: $match — Filter Documents

The $match stage filters documents based on a condition, just like the find() filter. Placing $match early in the pipeline reduces the number of documents processed in later stages, improving performance.

db.sales.aggregate([
  { $match: { category: "Electronics" } }
])

This passes only Electronics documents to the next stage.

Stage 2: $group — Group and Summarize

The $group stage groups documents by a field and applies an accumulator function to compute values like sum, count, or average for each group.

db.sales.aggregate([
  {
    $group: {
      _id: "$category",
      totalRevenue: { $sum: { $multiply: ["$quantity", "$price"] } },
      totalItems: { $sum: "$quantity" }
    }
  }
])

Result:

{ "_id": "Electronics", "totalRevenue": 218000, "totalItems": 8 }
{ "_id": "Furniture", "totalRevenue": 25500, "totalItems": 3 }
{ "_id": "Stationery", "totalRevenue": 6400, "totalItems": 70 }

The _id field in $group defines the grouping key. $sum adds up values, and $multiply multiplies two fields together to compute revenue per sale.

Common Accumulator Operators for $group

AccumulatorPurpose
$sumAdds up all values in the group
$avgCalculates the average
$minReturns the smallest value in the group
$maxReturns the largest value in the group
$countCounts the number of documents in the group
$pushCollects all values into an array
$firstReturns the first value in the group
$lastReturns the last value in the group

Stage 3: $project — Reshape the Output

The $project stage controls which fields appear in the output and allows creating new computed fields.

db.sales.aggregate([
  {
    $project: {
      _id: 0,
      product: 1,
      city: 1,
      totalValue: { $multiply: ["$quantity", "$price"] }
    }
  }
])

Result:

{ "product": "Laptop", "city": "Delhi", "totalValue": 110000 }
{ "product": "Phone", "city": "Mumbai", "totalValue": 90000 }
...

$project calculates totalValue (quantity × price) as a new field in the output without storing it in the database.

Stage 4: $sort — Sort the Results

The $sort stage sorts documents in ascending (1) or descending (-1) order.

db.sales.aggregate([
  {
    $group: {
      _id: "$category",
      totalRevenue: { $sum: { $multiply: ["$quantity", "$price"] } }
    }
  },
  { $sort: { totalRevenue: -1 } }
])

This groups by category, calculates total revenue, and then sorts categories from highest to lowest revenue.

Stage 5: $limit and $skip — Paginate Results

db.sales.aggregate([
  { $sort: { price: -1 } },
  { $skip: 2 },
  { $limit: 3 }
])

This sorts all documents by price (highest first), skips the top 2, and returns the next 3 — useful for pagination.

Stage 6: $unwind — Expand Arrays

The $unwind stage breaks apart an array field. Each element of the array becomes its own separate document in the pipeline output.

db.orders.insertOne({
  orderId: 1001,
  items: ["Laptop", "Mouse", "Keyboard"]
})

db.orders.aggregate([
  { $unwind: "$items" }
])

Result:

{ "orderId": 1001, "items": "Laptop" }
{ "orderId": 1001, "items": "Mouse" }
{ "orderId": 1001, "items": "Keyboard" }

One document with an array of 3 items becomes 3 separate documents — one per item. This is useful for counting, grouping, or filtering individual array elements.

Stage 7: $lookup — Join Two Collections

The $lookup stage performs a left outer join between two collections. It brings in matching documents from another collection and embeds them into the current documents.

db.orders.insertMany([
  { orderId: 1, customerId: 101, product: "Laptop" },
  { orderId: 2, customerId: 102, product: "Phone" }
])

db.customers.insertMany([
  { customerId: 101, name: "Deepak Rao", city: "Bengaluru" },
  { customerId: 102, name: "Mala Singh", city: "Hyderabad" }
])

db.orders.aggregate([
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "customerId",
      as: "customerDetails"
    }
  }
])

Result:

{
  "orderId": 1,
  "customerId": 101,
  "product": "Laptop",
  "customerDetails": [
    { "customerId": 101, "name": "Deepak Rao", "city": "Bengaluru" }
  ]
}

The customer information is embedded directly into the order document through the join.

Stage 8: $addFields — Add New Fields

The $addFields stage adds new computed fields to documents without replacing existing ones.

db.sales.aggregate([
  {
    $addFields: {
      totalValue: { $multiply: ["$quantity", "$price"] },
      discountedPrice: { $multiply: ["$price", 0.9] }
    }
  }
])

Each document now has two additional fields: totalValue and discountedPrice, calculated on the fly.

A Complete Multi-Stage Pipeline Example

Find the top-earning city among Electronics sales in January:

db.sales.aggregate([
  { $match: { category: "Electronics", month: "January" } },
  {
    $group: {
      _id: "$city",
      totalRevenue: { $sum: { $multiply: ["$quantity", "$price"] } }
    }
  },
  { $sort: { totalRevenue: -1 } },
  { $limit: 1 }
])

Stages breakdown:

  1. $match — Keep only January Electronics records
  2. $group — Group by city, compute total revenue for each
  3. $sort — Sort by revenue from highest to lowest
  4. $limit — Return only the top city

Summary

The MongoDB aggregation pipeline processes documents through ordered stages. Each stage transforms the data and passes the result to the next. Key stages include $match for filtering, $group for summarizing, $project for reshaping, $sort for ordering, $limit and $skip for pagination, $unwind for expanding arrays, $lookup for joining collections, and $addFields for adding computed fields. Combining these stages enables powerful data analysis that would otherwise require complex code outside the database.

Leave a Comment