GenAI Embeddings and Vector Databases

Embeddings and vector databases are the infrastructure layer underneath RAG systems, semantic search, and recommendation engines. Understanding them is essential for building any application where AI needs to find meaning in large collections of text, images, or data.

What Is an Embedding?

An embedding is a list of numbers — called a vector — that represents the meaning of a piece of text, an image, or any other data. Similar things produce vectors that are numerically close to each other. Different things produce vectors that are far apart.

Text → Embedding Model → Vector (list of numbers)

"Dog"       → [0.82, 0.14, 0.67, 0.33, ...]
"Puppy"     → [0.80, 0.16, 0.65, 0.31, ...]  ← close to "Dog"
"Car"       → [0.12, 0.89, 0.04, 0.77, ...]  ← far from "Dog"
"Automobile"→ [0.13, 0.88, 0.05, 0.78, ...]  ← close to "Car"

In practice, embedding vectors have hundreds or thousands of dimensions — not just 4. But the principle is the same: meaning maps to position in vector space.
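The idea that meaning maps to position can be made concrete with the toy 4-dimensional vectors above. A minimal sketch using cosine similarity (the vectors are illustrative values from the example, not the output of a real embedding model):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional vectors from the example above (illustrative only;
# a real model produces hundreds or thousands of dimensions).
vectors = {
    "dog":        [0.82, 0.14, 0.67, 0.33],
    "puppy":      [0.80, 0.16, 0.65, 0.31],
    "car":        [0.12, 0.89, 0.04, 0.77],
    "automobile": [0.13, 0.88, 0.05, 0.78],
}

# "dog" sits closer to "puppy" than to "car" in this space.
assert cosine_similarity(vectors["dog"], vectors["puppy"]) > \
       cosine_similarity(vectors["dog"], vectors["car"])
```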

Why Embeddings Enable Semantic Search

Traditional keyword search finds documents containing exact words from the query. Embedding-based semantic search finds documents with similar meaning — even if no words match exactly.

Query: "How do I fix a leaking pipe?"

Keyword search results:
  Only finds documents containing the words "fix" AND "leaking" AND "pipe"

Semantic search results:
  Finds documents about:
  - "Repairing burst plumbing"    ← different words, same meaning
  - "Stopping water drip in walls" ← no keyword match, highly relevant
  - "Plumbing repair guide"        ← relevant but no word match

How Embedding Models Work

An embedding model is a neural network trained to place similar texts close together in vector space. During training, the model learns by seeing millions of examples of similar and dissimilar texts. Popular embedding models include:

Embedding Model         | Creator                     | Notes
────────────────────────────────────────────────────────────────────────────────
text-embedding-3-large  | OpenAI                      | Strong general-purpose, 3072 dimensions
text-embedding-ada-002  | OpenAI                      | Fast, widely used, 1536 dimensions
all-MiniLM-L6-v2        | SBERT (open-source)         | Lightweight, fast, runs locally
embed-english-v3.0      | Cohere                      | Optimized for search and retrieval tasks
mxbai-embed-large       | MixedBread AI (open-source) | State-of-the-art open-source option

What Is a Vector Database?

A vector database stores embedding vectors and enables fast similarity search across millions of them. When a query arrives, the database finds the vectors closest to the query vector — measured by a distance metric — and returns the corresponding documents.

Vector Database Structure
──────────────────────────────────────────────────────────────
ID  | Text (chunk)                    | Vector
────────────────────────────────────────────────────────────
001 | "Refund policy for digital..."  | [0.81, 0.14, 0.67...]
002 | "Shipping takes 3–5 days..."    | [0.22, 0.77, 0.31...]
003 | "Cancel subscription anytime..."| [0.79, 0.18, 0.70...]
...
──────────────────────────────────────────────────────────────

Query: "How do I get a refund?"
Query vector: [0.80, 0.15, 0.68...]

Nearest vectors: 001 (refund policy) → 003 (cancel subscription)
Returns: Chunks 001 and 003 as the most relevant results
──────────────────────────────────────────────────────────────
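The lookup in the diagram is, at its core, a nearest-neighbour search. A brute-force sketch over the toy rows above (real vector databases use approximate indexes such as HNSW to avoid scanning every vector, but the result is the same idea):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# In-memory store mirroring the table above (truncated 3-d vectors,
# for illustration only).
store = [
    ("001", "Refund policy for digital...",   [0.81, 0.14, 0.67]),
    ("002", "Shipping takes 3-5 days...",     [0.22, 0.77, 0.31]),
    ("003", "Cancel subscription anytime...", [0.79, 0.18, 0.70]),
]

def nearest(query_vector, k=2):
    """Brute-force k-nearest-neighbour search over the stored vectors."""
    scored = [(cosine_similarity(query_vector, vec), doc_id, text)
              for doc_id, text, vec in store]
    scored.sort(reverse=True)          # highest similarity first
    return [(doc_id, text) for _, doc_id, text in scored[:k]]

query_vector = [0.80, 0.15, 0.68]      # "How do I get a refund?"
results = nearest(query_vector)
assert [doc_id for doc_id, _ in results] == ["001", "003"]
```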

Distance Metrics for Vector Similarity

Metric             | What It Measures                             | When to Use
────────────────────────────────────────────────────────────────────────────────
Cosine Similarity  | Angle between vectors (direction of meaning) | Most text similarity tasks (most common)
Euclidean Distance | Straight-line distance between vectors       | Image embeddings, physical data
Dot Product        | Magnitude and direction combined             | When the scale of the vector matters
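The three metrics can be sketched in a few lines each. On vectors of similar scale they usually agree on ranking; the toy values here are illustrative:

```python
import math

def cosine_similarity(a, b):
    """Angle between vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def euclidean_distance(a, b):
    """Straight-line distance; 0.0 means identical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a, b):
    """Direction and magnitude combined; sensitive to vector scale."""
    return sum(x * y for x, y in zip(a, b))

dog, puppy, car = [0.82, 0.14, 0.67], [0.80, 0.16, 0.65], [0.12, 0.89, 0.04]

# All three metrics agree here that "dog" is nearer "puppy" than "car".
assert cosine_similarity(dog, puppy) > cosine_similarity(dog, car)
assert euclidean_distance(dog, puppy) < euclidean_distance(dog, car)
assert dot_product(dog, puppy) > dot_product(dog, car)
```

For vectors normalised to unit length, cosine similarity and dot product produce the same ranking, which is why many embedding APIs return normalised vectors.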

Popular Vector Databases

Database | Type                 | Key Strength
────────────────────────────────────────────────────────────────────────────────
Pinecone | Managed cloud        | Fully managed, scales easily, production-ready
Weaviate | Open-source / cloud  | Multimodal search, hybrid keyword + vector
Qdrant   | Open-source / cloud  | Fast, memory-efficient, rich filtering
Chroma   | Open-source          | Developer-friendly, great for local prototyping
pgvector | PostgreSQL extension | Add vector search to an existing Postgres database
Milvus   | Open-source          | High-performance, designed for billion-scale data

Embedding and Vector Database Pipeline

INDEXING (one-time setup):
────────────────────────────────────────────────────────────────
Raw Documents
     │
     ▼
Chunk into 200–500 word segments
     │
     ▼
Pass each chunk through Embedding Model → Vector
     │
     ▼
Store (chunk text + vector + metadata) in Vector Database
────────────────────────────────────────────────────────────────
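The indexing steps above can be sketched end to end. The `embed` function below is a deterministic stand-in (a hash of the text) so the sketch runs without a model; a real system would call an embedding model or API at that point:

```python
import hashlib

def embed(text, dims=8):
    """Placeholder embedder: a deterministic fake vector derived from a
    hash. A real pipeline calls an embedding model here instead."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:dims]]

def chunk(document, words_per_chunk=300):
    """Split a document into ~300-word segments (within the 200-500 range)."""
    words = document.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]

def index(documents):
    """Chunk -> embed -> store. An in-memory list stands in for the
    vector database; each entry keeps text, vector, and metadata."""
    store = []
    for doc_id, text in documents.items():
        for n, segment in enumerate(chunk(text)):
            store.append({
                "id": f"{doc_id}-{n}",
                "text": segment,
                "vector": embed(segment),
                "metadata": {"source": doc_id},
            })
    return store

store = index({"faq": "Refunds are available within 30 days. " * 50})
```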

QUERY (every user request):
────────────────────────────────────────────────────────────────
User Query: "How do I cancel my subscription?"
     │
     ▼
Embed the query using same Embedding Model → Query Vector
     │
     ▼
Vector DB computes similarity of query vector vs all stored vectors
     │
     ▼
Return top K most similar chunks
     │
     ▼
Inject chunks into LLM prompt → Generate grounded response
────────────────────────────────────────────────────────────────
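The final step of the query flow, injecting the retrieved chunks into the LLM prompt, might be sketched as follows. The prompt wording and the commented-out `call_llm` function are hypothetical stand-ins for whatever LLM client the application uses:

```python
def build_prompt(query, chunks):
    """Assemble a grounded prompt: retrieved chunks become the context
    the LLM must answer from."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Top-K chunks returned by the vector database for this query:
chunks = [
    "Cancel subscription anytime from the account settings page.",
    "Refund policy for digital purchases: 14 days.",
]
prompt = build_prompt("How do I cancel my subscription?", chunks)
# response = call_llm(prompt)   # hypothetical LLM call
```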

Metadata Filtering

Vector databases support metadata filtering — restricting the search to a specific subset of documents before applying similarity search. This improves precision significantly.

Without filtering:
  Search all 500,000 document chunks → May retrieve irrelevant old documents

With metadata filtering:
  Filter: WHERE department = "HR" AND year = 2024
  Then search: Top 5 most similar chunks within those filtered results
  → Precise, fast, and relevant
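A filter-then-search sketch of the idea above, using toy chunks and illustrative 2-d vectors (real databases apply the filter inside the index rather than in Python):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# Toy chunks with metadata; the vectors are made up for illustration.
chunks = [
    {"text": "2024 HR leave policy", "vector": [0.9, 0.1],
     "department": "HR", "year": 2024},
    {"text": "2019 HR leave policy", "vector": [0.9, 0.1],
     "department": "HR", "year": 2019},
    {"text": "2024 finance report",  "vector": [0.1, 0.9],
     "department": "Finance", "year": 2024},
]

def search(query_vector, top_k=5, **filters):
    """Apply the metadata filter first, then rank the survivors by
    similarity to the query vector."""
    candidates = [c for c in chunks
                  if all(c.get(key) == value for key, value in filters.items())]
    candidates.sort(key=lambda c: cosine_similarity(query_vector, c["vector"]),
                    reverse=True)
    return candidates[:top_k]

results = search([0.8, 0.2], department="HR", year=2024)
assert [c["text"] for c in results] == ["2024 HR leave policy"]
```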

Embeddings Beyond Text

Embeddings work for any type of data, not just text:

  • Image embeddings: CLIP converts images into vectors — enables searching images with text descriptions
  • Audio embeddings: Sound clips converted to vectors for music similarity and audio search
  • Code embeddings: Represent code semantics for code search and duplicate detection
  • Multimodal embeddings: Combines text and image into a shared vector space

With embeddings and vector databases understood, the next major topic explores the systems that combine retrieval, LLMs, and tools into autonomous agents — AI systems that can plan and take actions in the world without step-by-step human instruction.
