GenAI Embeddings and Vector Databases
Embeddings and vector databases are the infrastructure layer underneath RAG systems, semantic search, and recommendation engines. Understanding them is essential for building any application where AI needs to find meaning in large collections of text, images, or data.
What Is an Embedding?
An embedding is a list of numbers — called a vector — that represents the meaning of a piece of text, an image, or any other data. Similar things produce vectors that are numerically close to each other. Different things produce vectors that are far apart.
Text → Embedding Model → Vector (list of numbers)

"Dog"        → [0.82, 0.14, 0.67, 0.33, ...]
"Puppy"      → [0.80, 0.16, 0.65, 0.31, ...]  ← close to "Dog"
"Car"        → [0.12, 0.89, 0.04, 0.77, ...]  ← far from "Dog"
"Automobile" → [0.13, 0.88, 0.05, 0.78, ...]  ← close to "Car"
In practice, embedding vectors have hundreds or thousands of dimensions — not just 4. But the principle is the same: meaning maps to position in vector space.
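As a minimal sketch, "numerically close" can be checked with cosine similarity over the toy 4-dimensional vectors from the example above (real embeddings have hundreds or thousands of dimensions, but the arithmetic is identical):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

dog   = [0.82, 0.14, 0.67, 0.33]
puppy = [0.80, 0.16, 0.65, 0.31]
car   = [0.12, 0.89, 0.04, 0.77]

print(cosine_similarity(dog, puppy))  # near 1.0 — similar meaning
print(cosine_similarity(dog, car))    # much lower — different meaning
```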
Why Embeddings Enable Semantic Search
Traditional keyword search finds documents containing exact words from the query. Embedding-based semantic search finds documents with similar meaning — even if no words match exactly.
Query: "How do I fix a leaking pipe?"

Keyword search results:
Only finds documents containing the words "fix" AND "leaking" AND "pipe"

Semantic search results:
Finds documents about:
- "Repairing burst plumbing"      ← different words, same meaning
- "Stopping water drip in walls"  ← no keyword match, highly relevant
- "Plumbing repair guide"         ← relevant but no exact word match
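To make the keyword-search limitation concrete, here is a naive keyword matcher (a deliberately simple sketch, not any real search engine) run against the documents from the example — none of them match, even though all are relevant:

```python
def keyword_match(query, doc):
    """Naive keyword search: every query word must appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return query_words <= doc_words

docs = [
    "repairing burst plumbing",
    "stopping water drip in walls",
    "plumbing repair guide",
]
query = "fix a leaking pipe"

print([keyword_match(query, d) for d in docs])  # [False, False, False]
```

An embedding-based search would still rank all three documents highly, because their vectors sit close to the query vector despite sharing no words.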
How Embedding Models Work
An embedding model is a neural network trained to place similar texts close together in vector space. During training, the model learns by seeing millions of examples of similar and dissimilar texts. Popular embedding models include:
| Embedding Model | Creator | Notes |
|---|---|---|
| text-embedding-3-large | OpenAI | Strong general-purpose, 3072 dimensions |
| text-embedding-ada-002 | OpenAI | Older legacy model, still widely deployed, 1536 dimensions |
| all-MiniLM-L6-v2 | SBERT (open-source) | Lightweight, fast, runs locally |
| embed-english-v3.0 | Cohere | Optimized for search and retrieval tasks |
| mxbai-embed-large | MixedBread AI (open-source) | State-of-the-art open-source option |
What Is a Vector Database?
A vector database stores embedding vectors and enables fast similarity search across millions of them. When a query arrives, the database finds the vectors closest to the query vector — measured by a distance metric — and returns the corresponding documents.
Vector Database Structure
──────────────────────────────────────────────────────────────
ID  | Text (chunk)                    | Vector
──────────────────────────────────────────────────────────────
001 | "Refund policy for digital..."  | [0.81, 0.14, 0.67...]
002 | "Shipping takes 3–5 days..."    | [0.22, 0.77, 0.31...]
003 | "Cancel subscription anytime..."| [0.79, 0.18, 0.70...]
──────────────────────────────────────────────────────────────
Query: "How do I get a refund?"
Query vector: [0.80, 0.15, 0.68...]
Nearest vectors: 001 (refund policy) → 003 (cancel subscription)
Returns: chunks 001 and 003 as the most relevant results
──────────────────────────────────────────────────────────────
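The lookup above can be sketched as a brute-force nearest-neighbor search over an in-memory store (toy 3-dimensional vectors mirroring the example; a real vector database uses approximate indexes to do this at scale):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Toy store mirroring the table above: id → (chunk text, vector)
store = {
    "001": ("Refund policy for digital...",   [0.81, 0.14, 0.67]),
    "002": ("Shipping takes 3-5 days...",     [0.22, 0.77, 0.31]),
    "003": ("Cancel subscription anytime...", [0.79, 0.18, 0.70]),
}

def top_k(query_vec, k=2):
    """Rank every stored vector by similarity to the query; return top-k ids."""
    ranked = sorted(store.items(),
                    key=lambda item: cosine(query_vec, item[1][1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

print(top_k([0.80, 0.15, 0.68]))  # ['001', '003']
```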
Distance Metrics for Vector Similarity
| Metric | What It Measures | When to Use |
|---|---|---|
| Cosine Similarity | Angle between vectors — direction of meaning | Most text similarity tasks (most common) |
| Euclidean Distance | Straight-line distance between vectors | Image embeddings, physical data |
| Dot Product | Magnitude + direction combined | When scale of the vector matters |
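A quick sketch of all three metrics on the same pair of vectors shows the practical difference: the second vector points in the same direction but is twice as large, so cosine similarity ignores the scale while Euclidean distance and dot product do not:

```python
import math

def dot(a, b):
    """Dot product: rewards both alignment and magnitude."""
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance: grows with any difference in magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_sim(a, b):
    """Angle only: identical direction gives 1.0 regardless of scale."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction, double the magnitude

print(cosine_sim(a, b))  # 1.0 — scale ignored
print(euclidean(a, b))   # ≈ 3.74 — sensitive to magnitude
print(dot(a, b))         # 28.0 — sensitive to magnitude and direction
```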
Popular Vector Databases
| Database | Type | Key Strength |
|---|---|---|
| Pinecone | Managed cloud | Fully managed, scales easily, production-ready |
| Weaviate | Open-source / cloud | Multimodal search, hybrid keyword + vector |
| Qdrant | Open-source / cloud | Fast, memory-efficient, rich filtering |
| Chroma | Open-source | Developer-friendly, great for local prototyping |
| pgvector | PostgreSQL extension | Add vector search to an existing Postgres database |
| Milvus | Open-source | High-performance, designed for billion-scale data |
Embedding and Vector Database Pipeline
INDEXING (one-time setup):
────────────────────────────────────────────────────────────────
Raw Documents
│
▼
Chunk into 200–500 word segments
│
▼
Pass each chunk through Embedding Model → Vector
│
▼
Store (chunk text + vector + metadata) in Vector Database
────────────────────────────────────────────────────────────────
QUERY (every user request):
────────────────────────────────────────────────────────────────
User Query: "How do I cancel my subscription?"
│
▼
Embed the query using the same Embedding Model → Query Vector
│
▼
Vector DB finds the stored vectors most similar to the query vector (exact or approximate nearest-neighbor search)
│
▼
Return top K most similar chunks
│
▼
Inject chunks into LLM prompt → Generate grounded response
────────────────────────────────────────────────────────────────
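Both phases of the pipeline above can be sketched end to end. The `embed` function here is a toy bag-of-words stand-in for a real embedding model (in production you would call a model such as those in the table earlier); everything else — chunk, embed, store, then embed the query with the same model and rank by similarity — follows the diagram:

```python
import math

VOCAB = ["refund", "cancel", "subscription", "shipping", "days", "policy"]

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector.
    A real pipeline would call an actual embedding model here."""
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# INDEXING (one-time): chunk → embed → store (chunk text + vector)
chunks = [
    "Refund policy for digital purchases",
    "Shipping takes 3 to 5 days",
    "Cancel subscription anytime from settings",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# QUERY (per request): embed with the SAME model, rank by similarity
query_vec = embed("How do I cancel my subscription?")
best = max(index, key=lambda item: cosine(query_vec, item[1]))
print(best[0])  # → the chunk to inject into the LLM prompt
```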
Metadata Filtering
Vector databases support metadata filtering — searching within a specific subset of documents before applying similarity search. This improves precision significantly.
Without filtering:
Search all 500,000 document chunks
→ May retrieve irrelevant old documents

With metadata filtering:
Filter:      WHERE department = "HR" AND year = 2024
Then search: Top 5 most similar chunks within those filtered results
→ Precise, fast, and relevant
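The filter-then-search order can be sketched in a few lines (toy 2-dimensional vectors and invented records for illustration; real databases apply the metadata predicate inside the index rather than in Python):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Each record: (chunk text, vector, metadata) — all values are made up
records = [
    ("2024 HR leave policy",      [0.9, 0.1], {"department": "HR",  "year": 2024}),
    ("2019 HR leave policy",      [0.9, 0.2], {"department": "HR",  "year": 2019}),
    ("2024 engineering handbook", [0.1, 0.9], {"department": "ENG", "year": 2024}),
]

def filtered_search(query_vec, k=1, **where):
    # Step 1: metadata filter — keep only records matching every condition
    pool = [r for r in records
            if all(r[2].get(key) == val for key, val in where.items())]
    # Step 2: similarity search within the filtered pool only
    pool.sort(key=lambda r: cosine(query_vec, r[1]), reverse=True)
    return [r[0] for r in pool[:k]]

print(filtered_search([0.9, 0.1], department="HR", year=2024))
# ['2024 HR leave policy']
```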
Embeddings Beyond Text
Embeddings work for any type of data, not just text:
- Image embeddings: CLIP converts images into vectors — enables searching images with text descriptions
- Audio embeddings: Sound clips converted to vectors for music similarity and audio search
- Code embeddings: Represents code semantics for code search and duplicate detection
- Multimodal embeddings: Combines text and image into a shared vector space
With embeddings and vector databases understood, the next major topic explores the systems that combine retrieval, LLMs, and tools into autonomous agents — AI systems that can plan and take actions in the world without step-by-step human instruction.
