MongoDB Aggregation Pipeline

The aggregation pipeline is MongoDB's most powerful tool for data analysis. It processes documents through a series of stages, where each stage transforms the data in some way. The output of one stage becomes the input for the next, similar to an assembly line in a factory — raw material enters at one end, passes through several operations, and a finished product comes out the other end.

Aggregation goes far beyond simple filtering. It can group documents, calculate totals, reshape data structures, count records, compute averages, and much more — all within a single query.

How the Pipeline Works

The aggregation pipeline is an array of stages. Each stage is an object with a stage operator as the key.

db.collection.aggregate([
  { <stage1> },
  { <stage2> },
  { <stage3> }
])

All examples use a sales collection:

db.sales.insertMany([
  { product: "Laptop", category: "Electronics", quantity: 2, price: 55000, city: "Delhi", month: "January" },
  { product: "Phone", category: "Electronics", quantity: 5, price: 18000, city: "Mumbai", month: "January" },
  { product: "Desk Chair", category: "Furniture", quantity: 3, price: 8500, city: "Delhi", month: "February" },
  { product: "Notebook", category: "Stationery", quantity: 20, price: 120, city: "Pune", month: "January" },
  { product: "Monitor", category: "Electronics", quantity: 1, price: 18000, city: "Mumbai", month: "February" },
  { product: "Pen Set", category: "Stationery", quantity: 50, price: 80, city: "Delhi", month: "February" }
])

Stage 1: $match — Filter Documents

The $match stage filters documents based on a condition, just like the find() filter. Placing $match early in the pipeline reduces the number of documents processed in later stages, improving performance.

db.sales.aggregate([
  { $match: { category: "Electronics" } }
])

This passes only Electronics documents to the next stage.

Stage 2: $group — Group and Summarize

The $group stage groups documents by a field and applies an accumulator function to compute values like sum, count, or average for each group.

db.sales.aggregate([
  {
    $group: {
      _id: "$category",
      totalRevenue: { $sum: { $multiply: ["$quantity", "$price"] } },
      totalItems: { $sum: "$quantity" }
    }
  }
])

Result:

{ "_id": "Electronics", "totalRevenue": 218000, "totalItems": 8 }
{ "_id": "Furniture", "totalRevenue": 25500, "totalItems": 3 }
{ "_id": "Stationery", "totalRevenue": 6400, "totalItems": 70 }

The _id field in $group defines the grouping key. $sum adds up values, and $multiply multiplies two fields together to compute revenue per sale.

Common Accumulator Operators for $group

Accumulator	Purpose
`$sum`	Adds up all values in the group
`$avg`	Calculates the average
`$min`	Returns the smallest value in the group
`$max`	Returns the largest value in the group
`$count`	Counts the number of documents in the group
`$push`	Collects all values into an array
`$first`	Returns the first value in the group
`$last`	Returns the last value in the group

Stage 3: $project — Reshape the Output

The $project stage controls which fields appear in the output and allows creating new computed fields.

db.sales.aggregate([
  {
    $project: {
      _id: 0,
      product: 1,
      city: 1,
      totalValue: { $multiply: ["$quantity", "$price"] }
    }
  }
])

Result:

{ "product": "Laptop", "city": "Delhi", "totalValue": 110000 }
{ "product": "Phone", "city": "Mumbai", "totalValue": 90000 }
...

$project calculates totalValue (quantity × price) as a new field in the output without storing it in the database.

Stage 4: $sort — Sort the Results

The $sort stage sorts documents in ascending (1) or descending (-1) order.

db.sales.aggregate([
  {
    $group: {
      _id: "$category",
      totalRevenue: { $sum: { $multiply: ["$quantity", "$price"] } }
    }
  },
  { $sort: { totalRevenue: -1 } }
])

This groups by category, calculates total revenue, and then sorts categories from highest to lowest revenue.

Stage 5: $limit and $skip — Paginate Results

db.sales.aggregate([
  { $sort: { price: -1 } },
  { $skip: 2 },
  { $limit: 3 }
])

This sorts all documents by price (highest first), skips the top 2, and returns the next 3 — useful for pagination.

Stage 6: $unwind — Expand Arrays

The $unwind stage breaks apart an array field. Each element of the array becomes its own separate document in the pipeline output.

db.orders.insertOne({
  orderId: 1001,
  items: ["Laptop", "Mouse", "Keyboard"]
})

db.orders.aggregate([
  { $unwind: "$items" }
])

Result:

{ "orderId": 1001, "items": "Laptop" }
{ "orderId": 1001, "items": "Mouse" }
{ "orderId": 1001, "items": "Keyboard" }

One document with an array of 3 items becomes 3 separate documents — one per item. This is useful for counting, grouping, or filtering individual array elements.

Stage 7: $lookup — Join Two Collections

The $lookup stage performs a left outer join between two collections. It brings in matching documents from another collection and embeds them into the current documents.

db.orders.insertMany([
  { orderId: 1, customerId: 101, product: "Laptop" },
  { orderId: 2, customerId: 102, product: "Phone" }
])

db.customers.insertMany([
  { customerId: 101, name: "Deepak Rao", city: "Bengaluru" },
  { customerId: 102, name: "Mala Singh", city: "Hyderabad" }
])

db.orders.aggregate([
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "customerId",
      as: "customerDetails"
    }
  }
])

Result:

{
  "orderId": 1,
  "customerId": 101,
  "product": "Laptop",
  "customerDetails": [
    { "customerId": 101, "name": "Deepak Rao", "city": "Bengaluru" }
  ]
}

The customer information is embedded directly into the order document through the join.

Stage 8: $addFields — Add New Fields

The $addFields stage adds new computed fields to documents without replacing existing ones.

db.sales.aggregate([
  {
    $addFields: {
      totalValue: { $multiply: ["$quantity", "$price"] },
      discountedPrice: { $multiply: ["$price", 0.9] }
    }
  }
])

Each document now has two additional fields: totalValue and discountedPrice, calculated on the fly.

A Complete Multi-Stage Pipeline Example

Find the top-earning city among Electronics sales in January:

db.sales.aggregate([
  { $match: { category: "Electronics", month: "January" } },
  {
    $group: {
      _id: "$city",
      totalRevenue: { $sum: { $multiply: ["$quantity", "$price"] } }
    }
  },
  { $sort: { totalRevenue: -1 } },
  { $limit: 1 }
])

Stages breakdown:

$match — Keep only January Electronics records
$group — Group by city, compute total revenue for each
$sort — Sort by revenue from highest to lowest
$limit — Return only the top city

Summary

The MongoDB aggregation pipeline processes documents through ordered stages. Each stage transforms the data and passes the result to the next. Key stages include $match for filtering, $group for summarizing, $project for reshaping, $sort for ordering, $limit and $skip for pagination, $unwind for expanding arrays, $lookup for joining collections, and $addFields for adding computed fields. Combining these stages enables powerful data analysis that would otherwise require complex code outside the database.

Previous lessons

Back to courses

Next lessons

MongoDB Aggregation Pipeline

How the Pipeline Works

Stage 1: $match — Filter Documents

Stage 2: $group — Group and Summarize

Common Accumulator Operators for $group

Stage 3: $project — Reshape the Output

Stage 4: $sort — Sort the Results

Stage 5: $limit and $skip — Paginate Results

Stage 6: $unwind — Expand Arrays

Stage 7: $lookup — Join Two Collections

Stage 8: $addFields — Add New Fields

A Complete Multi-Stage Pipeline Example

Summary

Leave a Comment Cancel reply