MongoDB Schema Design

Schema design in MongoDB refers to how data is structured inside documents and collections. Unlike SQL databases, MongoDB does not enforce a strict schema by default — documents in the same collection can have different fields. This flexibility is powerful, but it also means thoughtful design decisions must be made upfront to avoid performance problems and data inconsistencies later.

Good schema design in MongoDB is driven by one key question: How will the application access this data? The way data is read and written should guide how it is structured.

The Core Design Decision: Embed or Reference?

The most important decision in MongoDB schema design is whether to embed related data inside a single document or store it in a separate collection and reference it.

Embedding

Embedding means placing related data directly inside a document as a sub-document or array. All related information lives in one place.

{
  "_id": ObjectId("..."),
  "name": "Varun Malhotra",
  "email": "varun@example.com",
  "address": {
    "street": "45 Park Lane",
    "city": "Chennai",
    "pincode": "600001"
  },
  "phone": ["9876543210", "9123456789"]
}

The address and phone numbers are embedded directly in the user document. Fetching this user also fetches the address and phone numbers in a single database call.

Referencing

Referencing means storing data in separate collections and linking them using IDs, similar to foreign keys in SQL databases.

// users collection
{
  "_id": ObjectId("u001"),
  "name": "Varun Malhotra",
  "email": "varun@example.com",
  "orderId": ObjectId("o001")
}

// orders collection
{
  "_id": ObjectId("o001"),
  "product": "Laptop",
  "amount": 55000,
  "status": "Delivered"
}

The user document holds only the order's ID. To get the full order details, a second query is needed on the orders collection, or a $lookup in an aggregation pipeline.
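The two-step read described above can be sketched in plain JavaScript, with arrays standing in for the users and orders collections. The helper name findOrderForUser and the short string IDs are illustrative assumptions, not part of MongoDB; in mongosh the second step would typically be a call such as db.orders.findOne({ _id: user.orderId }).

```javascript
// In-memory stand-ins for the two collections (string IDs replace ObjectIds).
const users = [
  { _id: "u001", name: "Varun Malhotra", email: "varun@example.com", orderId: "o001" }
];
const orders = [
  { _id: "o001", product: "Laptop", amount: 55000, status: "Delivered" }
];

// Hypothetical helper: resolve a user's referenced order in two steps.
function findOrderForUser(userId) {
  // Step 1: fetch the user document (first query).
  const user = users.find((u) => u._id === userId);
  if (!user) return null;
  // Step 2: follow the stored ID into the orders collection (second query).
  return orders.find((o) => o._id === user.orderId) || null;
}

console.log(findOrderForUser("u001").product); // Laptop
```

The extra lookup is the cost of referencing: each hop is another round trip unless the join is pushed to the server with $lookup.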

When to Embed

Embedding works best when:

The related data is always accessed together with the parent document
The related data belongs to one parent only (one-to-one or one-to-few relationships)
The embedded data is relatively small and unlikely to grow very large
Reads must be fast and queries should avoid multiple round trips

Good Embedding Example — Product with Reviews

{
  "productName": "Wireless Mouse",
  "price": 850,
  "reviews": [
    { "user": "Arun", "rating": 5, "comment": "Great product" },
    { "user": "Lata", "rating": 4, "comment": "Works well" }
  ]
}

Reviews only make sense in the context of a product and are read together with it. Embedding works here as long as the number of reviews per product stays small and bounded; a product that can collect thousands of reviews is better served by referencing.
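Because the reviews travel with the product, anything derived from them needs only the one document. A minimal sketch, assuming the product has already been read into a plain object; averageRating is a hypothetical helper name:

```javascript
// The product document from the example above, as a plain object.
const product = {
  productName: "Wireless Mouse",
  price: 850,
  reviews: [
    { user: "Arun", rating: 5, comment: "Great product" },
    { user: "Lata", rating: 4, comment: "Works well" }
  ]
};

// Hypothetical helper: average the embedded ratings, or null if there are none.
function averageRating(doc) {
  if (!doc.reviews || doc.reviews.length === 0) return null;
  const total = doc.reviews.reduce((sum, review) => sum + review.rating, 0);
  return total / doc.reviews.length;
}

console.log(averageRating(product)); // 4.5
```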

When to Reference

Referencing works best when:

The related data is large or grows unboundedly
The related data is shared by multiple parent documents
The related data is frequently updated independently of the parent
There is a many-to-many relationship between data

Good Referencing Example — Students and Courses

A student can enroll in many courses, and a course can have many students enrolled. Embedding either inside the other causes data duplication.

// students collection
{
  "_id": ObjectId("s001"),
  "name": "Priya Nair",
  "enrolledCourseIds": [ObjectId("c001"), ObjectId("c002")]
}

// courses collection
{
  "_id": ObjectId("c001"),
  "courseName": "MongoDB Fundamentals",
  "instructor": "Dr. Ramesh"
}

Each student stores an array of course IDs. Course details live in the courses collection and are not duplicated for each enrolled student.
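Resolving a student's course IDs into full course documents can be sketched as follows. Arrays stand in for the two collections, the second course is hypothetical sample data, and coursesForStudent is an illustrative name; in mongosh the lookup would typically be db.courses.find({ _id: { $in: student.enrolledCourseIds } }).

```javascript
// In-memory stand-ins for the collections (string IDs replace ObjectIds).
const students = [
  { _id: "s001", name: "Priya Nair", enrolledCourseIds: ["c001", "c002"] }
];
const courses = [
  { _id: "c001", courseName: "MongoDB Fundamentals", instructor: "Dr. Ramesh" },
  // Hypothetical second course, added only so both IDs resolve.
  { _id: "c002", courseName: "Sample Course", instructor: "Sample Instructor" }
];

// Hypothetical helper: fetch every course whose _id is in the student's array.
function coursesForStudent(studentId) {
  const student = students.find((s) => s._id === studentId);
  if (!student) return [];
  const wanted = new Set(student.enrolledCourseIds);
  return courses.filter((c) => wanted.has(c._id));
}

// Prints the names of both courses the student is enrolled in.
console.log(coursesForStudent("s001").map((c) => c.courseName));
```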

One-to-One Relationship

A one-to-one relationship means one document is related to exactly one other document. Embedding is usually the best approach, since there is no need for a separate collection.

{
  "employeeId": "E101",
  "name": "Deepa Krishnan",
  "passport": {
    "number": "Z1234567",
    "issuedAt": "Mumbai",
    "expiryDate": ISODate("2030-01-01")
  }
}

One-to-Many Relationship

A one-to-many relationship means one document is related to several others. The decision to embed or reference depends on the size and access pattern.

One-to-Few (Embed)

{
  "blogTitle": "MongoDB Tips",
  "author": "Karan Shah",
  "tags": ["nosql", "mongodb", "database"]
}

One-to-Many (Reference if Large)

// hospital collection
{ "_id": ObjectId("h001"), "hospitalName": "City Care Hospital" }

// patients collection
{ "_id": ObjectId("p001"), "patientName": "Sanjay Pillai", "hospitalId": ObjectId("h001") }
{ "_id": ObjectId("p002"), "patientName": "Renu Das", "hospitalId": ObjectId("h001") }

A hospital can have thousands of patients. Embedding all of them inside the hospital document would create a large, ever-growing document that is slow to read and update and could eventually hit the document size limit.
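With the reference stored on the child side, listing a hospital's patients is a filter on hospitalId, and adding a patient never touches the hospital document. A minimal in-memory sketch; patientsOf and the third patient are hypothetical, and in mongosh the read would typically be db.patients.find({ hospitalId: hospital._id }), usually backed by an index on hospitalId.

```javascript
// In-memory stand-ins for the collections (string IDs replace ObjectIds).
const hospitals = [{ _id: "h001", hospitalName: "City Care Hospital" }];
const patients = [
  { _id: "p001", patientName: "Sanjay Pillai", hospitalId: "h001" },
  { _id: "p002", patientName: "Renu Das", hospitalId: "h001" }
];

// Hypothetical helper: all patients that point at the given hospital.
function patientsOf(hospitalId) {
  return patients.filter((p) => p.hospitalId === hospitalId);
}

// Admitting a (hypothetical) new patient only inserts into patients;
// the hospital document stays the same size.
patients.push({ _id: "p003", patientName: "Sample Patient", hospitalId: "h001" });

console.log(patientsOf("h001").length); // 3
```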

Many-to-Many Relationship

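Many-to-many relationships usually call for referencing. One side, or both, stores an array of IDs pointing to the other. Storing the array on both sides, as below, lets the relationship be queried from either direction, but both arrays must then be kept in sync.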

// authors collection
{ "_id": ObjectId("a001"), "name": "Meera Ghosh", "bookIds": [ObjectId("b001"), ObjectId("b002")] }

// books collection
{ "_id": ObjectId("b001"), "title": "MongoDB Deep Dive", "authorIds": [ObjectId("a001")] }
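Keeping the two ID arrays in sync can be sketched like this. The arrays model the collections, the second book's title is hypothetical, and linkAuthorAndBook is an illustrative name; in mongosh each side would typically be updated with updateOne and $addToSet, which likewise avoids duplicate IDs.

```javascript
// In-memory stand-ins for the collections (string IDs replace ObjectIds).
const authors = [{ _id: "a001", name: "Meera Ghosh", bookIds: ["b001", "b002"] }];
const books = [
  { _id: "b001", title: "MongoDB Deep Dive", authorIds: ["a001"] },
  // Hypothetical second book whose back-reference has not been written yet.
  { _id: "b002", title: "Sample Book", authorIds: [] }
];

// Add a value only if it is not already present (mirrors $addToSet).
function addToSet(array, value) {
  if (!array.includes(value)) array.push(value);
}

// Hypothetical helper: record the link on both sides of the relationship.
function linkAuthorAndBook(authorId, bookId) {
  const author = authors.find((a) => a._id === authorId);
  const book = books.find((b) => b._id === bookId);
  if (!author || !book) throw new Error("unknown author or book");
  addToSet(author.bookIds, bookId);
  addToSet(book.authorIds, authorId);
}

linkAuthorAndBook("a001", "b002");
linkAuthorAndBook("a001", "b002"); // repeating the call changes nothing

console.log(books[1].authorIds); // [ 'a001' ]
```

In a real deployment these are two separate writes, so a multi-document transaction may be needed if both must succeed or fail together.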

The Bucket Pattern

The bucket pattern groups time-series or sequential data into buckets, for example one document per sensor per hour, rather than creating one document per data point. This is common for IoT sensor data, logs, or financial transactions.

Without Bucketing (One document per reading — inefficient at scale)

{ "sensorId": "S1", "timestamp": ISODate("2025-01-01T10:00:00Z"), "temperature": 22.5 }
{ "sensorId": "S1", "timestamp": ISODate("2025-01-01T10:01:00Z"), "temperature": 22.7 }
// Thousands of individual documents per sensor per day

With Bucketing (Group readings per hour)

{
  "sensorId": "S1",
  "hour": ISODate("2025-01-01T10:00:00Z"),
  "readings": [
    { "minute": 0, "temperature": 22.5 },
    { "minute": 1, "temperature": 22.7 },
    { "minute": 2, "temperature": 22.6 }
  ],
  "count": 3
}

With one reading per minute, hourly buckets reduce the number of documents by up to a factor of 60 while keeping related readings together.
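The write path for hourly buckets can be sketched in plain JavaScript. The buckets array models the collection, addReading is a hypothetical helper, and the 11:00 reading is made-up sample data; in mongosh the same step would typically be a single updateOne with $push, $inc and upsert: true, keyed on sensorId and hour.

```javascript
// In-memory stand-in for the bucketed collection.
const buckets = [];

// Hypothetical helper: append a reading to the bucket for its sensor and hour,
// creating the bucket on first use (the role upsert plays in MongoDB).
function addReading(sensorId, timestamp, temperature) {
  const time = new Date(timestamp);
  // Truncate to the start of the UTC hour to get the bucket key.
  const hour = new Date(time);
  hour.setUTCMinutes(0, 0, 0);

  let bucket = buckets.find(
    (b) => b.sensorId === sensorId && b.hour.getTime() === hour.getTime()
  );
  if (!bucket) {
    bucket = { sensorId, hour, readings: [], count: 0 };
    buckets.push(bucket);
  }
  bucket.readings.push({ minute: time.getUTCMinutes(), temperature });
  bucket.count += 1;
  return bucket;
}

addReading("S1", "2025-01-01T10:00:00Z", 22.5);
addReading("S1", "2025-01-01T10:01:00Z", 22.7);
addReading("S1", "2025-01-01T11:00:00Z", 22.4); // hypothetical reading, starts a new bucket

console.log(buckets.length, buckets[0].count); // 2 2
```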

The Outlier Pattern

When most documents in a collection have a small number of related items but a few have thousands, the outlier pattern handles both efficiently. A flag field marks which documents have overflow data stored in a separate collection.

// Most authors (normal case)
{ "name": "Raj Bose", "books": ["Book A", "Book B"], "hasOverflow": false }

// Prolific author (outlier case)
{ "name": "Famous Writer", "books": ["Book 1", "Book 2"], "hasOverflow": true }

// overflow_books collection
{ "authorName": "Famous Writer", "extraBooks": ["Book 3", "Book 4", ... ] }
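The read path for the outlier pattern can be sketched as follows: the flag decides whether a second lookup is needed. Arrays model the two collections, allBooksFor is a hypothetical helper, and the overflow list contains only the two titles spelled out in the example above.

```javascript
// In-memory stand-ins for the authors and overflow_books collections.
const authors = [
  { name: "Raj Bose", books: ["Book A", "Book B"], hasOverflow: false },
  { name: "Famous Writer", books: ["Book 1", "Book 2"], hasOverflow: true }
];
const overflowBooks = [
  { authorName: "Famous Writer", extraBooks: ["Book 3", "Book 4"] }
];

// Hypothetical helper: return every book for an author, reading the
// overflow collection only when the flag says there is more to fetch.
function allBooksFor(authorName) {
  const author = authors.find((a) => a.name === authorName);
  if (!author) return [];
  if (!author.hasOverflow) return [...author.books]; // common case: one read
  const extra = overflowBooks
    .filter((o) => o.authorName === authorName)
    .flatMap((o) => o.extraBooks);
  return [...author.books, ...extra];
}

console.log(allBooksFor("Raj Bose").length);      // 2
console.log(allBooksFor("Famous Writer").length); // 4
```

Most authors pay for a single read; only flagged outliers incur the extra query.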

MongoDB Document Size Limit

A single MongoDB document has a maximum size of 16 MB. For most use cases this limit is not a concern, but documents with large embedded arrays can approach it. When embedding data that grows over time (like comments on a post or log entries), referencing is the safer choice.

Schema Validation

Although MongoDB is schema-flexible, validation rules can be applied to a collection to enforce structure when needed.

db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "email"],
      properties: {
        name: { bsonType: "string" },
        email: { bsonType: "string" },
        age: { bsonType: "int", minimum: 18 }
      }
    }
  }
})

This ensures every document in the users collection has name and email fields, that both are strings, and that age, if provided, is an integer of 18 or above. With the default validation settings, an insert or update that would produce an invalid document is rejected with an error.
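To make the rules concrete, the sketch below re-implements the same three checks in plain JavaScript. It is only an approximate model for illustration, not MongoDB's validator; validateUser and its error messages are hypothetical, and sample values are made up.

```javascript
// Hypothetical helper: returns a list of rule violations (empty means valid).
function validateUser(doc) {
  const errors = [];

  // name and email are required and must be strings.
  for (const field of ["name", "email"]) {
    if (!(field in doc)) {
      errors.push(`missing required field: ${field}`);
    } else if (typeof doc[field] !== "string") {
      errors.push(`${field} must be a string`);
    }
  }

  // age is optional; when present it must be a 32-bit integer of at least 18.
  if ("age" in doc) {
    const isInt32 =
      Number.isInteger(doc.age) && doc.age >= -(2 ** 31) && doc.age <= 2 ** 31 - 1;
    if (!isInt32) {
      errors.push("age must be a 32-bit integer");
    } else if (doc.age < 18) {
      errors.push("age must be at least 18");
    }
  }

  return errors;
}

console.log(validateUser({ name: "Varun Malhotra", email: "varun@example.com" })); // []
console.log(validateUser({ name: "Varun Malhotra", email: "varun@example.com", age: 17 }));
// [ 'age must be at least 18' ]
```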

Summary

MongoDB schema design revolves around the embed-vs-reference decision, which depends on data size, access patterns, and relationship types. Embedding suits data that is always read together and belongs to one parent. Referencing suits large, shared, or independently updated data. One-to-one and one-to-few relationships favor embedding. One-to-many and many-to-many relationships often favor referencing. Patterns like bucketing and outlier handling address specific real-world challenges. Schema validation adds structure enforcement when data consistency is critical.
