LangChain Embeddings and Vector Stores
You have loaded and split your documents into chunks. Now you need a way to find the right chunks when a user asks a question. Traditional keyword search finds chunks that contain the exact words the user typed. But if a user asks "What is the refund policy?" and your document says "We accept returns within 30 days," a keyword search misses the answer completely because "refund" does not appear in the text. Embeddings and Vector Stores enable semantic search — finding chunks based on meaning rather than exact words.
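The refund example above can be reproduced with a few lines of plain Python. This is only a sketch of naive keyword matching: the `keyword_search` helper and its small stop-word list are made up for illustration and are not part of LangChain.

```python
def keyword_search(query: str, chunks: list[str]) -> list[str]:
    """Return chunks that share at least one non-trivial word with the query."""
    # Hypothetical, tiny stop-word list; real search engines use larger ones.
    stop_words = {"what", "is", "the", "a", "an", "we", "within"}
    query_words = {w.strip("?.,!").lower() for w in query.split()} - stop_words
    hits = []
    for chunk in chunks:
        chunk_words = {w.strip("?.,!").lower() for w in chunk.split()}
        if query_words & chunk_words:  # any shared word counts as a match
            hits.append(chunk)
    return hits


chunks = ["We accept returns within 30 days.", "Free shipping on orders over $50."]

# "refund" and "policy" appear in neither chunk, so nothing is found.
refund_hits = keyword_search("What is the refund policy?", chunks)
# A query that happens to reuse the word "returns" does match.
return_hits = keyword_search("Can I send returns back?", chunks)

print(refund_hits)
print(return_hits)
```

The first query fails only because of word choice, which is the gap semantic search is meant to close.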
The Flavor Map Analogy
Imagine a map where every food is placed based on its flavor profile. Sweet foods cluster in one corner, salty ones in another, spicy ones in a third. Foods with similar flavors sit near each other on the map. "Mango" and "pineapple" sit close together. "Chilli" and "jalapeño" sit near each other. "Mango" and "chilli" sit far apart. Embeddings work the same way — they convert text into a position on a map (a multi-dimensional space) where texts with similar meanings sit close together. Finding related text means finding nearby positions on this map.
Text to Position (Embedding):
"What is the return policy?" → [0.23, -0.45, 0.87, 0.12, ...]
"We accept returns within 30 days" → [0.21, -0.43, 0.85, 0.14, ...]
"The capital of France is Paris" → [-0.67, 0.33, -0.21, 0.88, ...]
The first two are close together (similar meaning). The third is far away (a different topic entirely).
The numbers are the position of the text on a multi-dimensional map. Texts about the same topic have similar position numbers. This is the core idea behind semantic search.
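One common way to measure "closeness" between two positions is cosine similarity. The sketch below applies it to the illustrative numbers from the example above, treating the four visible components as the whole vector; that truncation is an assumption, since real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors copied from the illustration (first four components only).
return_question = [0.23, -0.45, 0.87, 0.12]
returns_answer = [0.21, -0.43, 0.85, 0.14]
paris_fact = [-0.67, 0.33, -0.21, 0.88]

print(cosine_similarity(return_question, returns_answer))  # close to 1
print(cosine_similarity(return_question, paris_fact))      # much lower
```

A vector store performs essentially this comparison between the query vector and every stored vector, usually with indexing tricks to make it fast.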
What Is an Embedding Model
An embedding model is a specialized AI model that converts text into a list of numbers (called a vector). Unlike a language model that generates text, an embedding model only converts text to numbers. It never generates sentences. The numbers capture the semantic meaning of the text in a way that supports mathematical comparison.
Embedding Model Input/Output:
Input: "The dog chased the ball."
Output: [0.12, -0.34, 0.78, 0.23, -0.56, ...] (768 or 1536 numbers)
Input: "A puppy ran after a toy."
Output: [0.11, -0.32, 0.76, 0.22, -0.54, ...] (very similar numbers)
Input: "Tax returns are due in April."
Output: [-0.45, 0.67, -0.23, 0.89, 0.12, ...] (very different numbers)
Setting Up OpenAI Embeddings
from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings
load_dotenv()
# Create the embedding model
embeddings_model = OpenAIEmbeddings(model="text-embedding-ada-002")
# Embed a single query
query_vector = embeddings_model.embed_query("What is the refund policy?")
print(f"Vector length: {len(query_vector)}") # 1536 numbers
print(f"First 5 numbers: {query_vector[:5]}")
# Embed multiple documents at once
texts = [
    "We accept returns within 30 days.",
    "Contact support at help@company.com",
    "Free shipping on orders over $50"
]
doc_vectors = embeddings_model.embed_documents(texts)
print(f"Embedded {len(doc_vectors)} documents")
The text-embedding-ada-002 model produces 1536-dimensional vectors. Each number in the vector captures a different aspect of the text's meaning. You never need to interpret individual numbers — you only compare vectors to each other.
What Is a Vector Store
A Vector Store is a database optimized for storing and searching vectors. When you store your document chunks in a vector store, each chunk gets converted to a vector and stored alongside the original text and metadata. When a user asks a question, the question gets converted to a vector and the store finds the chunks whose vectors are most similar to the question vector — these are the semantically relevant chunks.
Building the Vector Store:
Chunk 1: "Returns accepted within 30 days" → [0.21, -0.43, ...]
Chunk 2: "Free shipping on orders over $50" → [0.56, 0.12, ...]
Chunk 3: "Contact us at help@company.com" → [-0.34, 0.67, ...]
↓ stored in vector database ↓
Vector Store:
┌─────────────────────────────────────────────────────┐
│ Vector [0.21,-0.43,...] ←→ "Returns accepted..." │
│ Vector [0.56,0.12,...] ←→ "Free shipping..." │
│ Vector [-0.34,0.67,...] ←→ "Contact us..." │
└─────────────────────────────────────────────────────┘
Searching the Vector Store:
Query: "What is the refund policy?"
→ Embed query: [0.20, -0.41, ...]
→ Compare to all stored vectors
→ [0.21,-0.43,...] is most similar ← MATCH
→ Return "Returns accepted within 30 days"
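The build-and-search flow in the diagram can be sketched as a tiny in-memory store. `ToyVectorStore` is a hypothetical class written for this illustration, not a LangChain API, and it reuses the two visible components of each diagram vector as if they were complete 2-D embeddings.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


@dataclass
class ToyVectorStore:
    """Keeps (vector, text) pairs and ranks them against a query vector."""
    entries: list[tuple[list[float], str]] = field(default_factory=list)

    def add(self, vector: list[float], text: str) -> None:
        self.entries.append((vector, text))

    def search(self, query_vector: list[float], k: int = 1) -> list[str]:
        # Brute force: score every stored vector, then keep the k best.
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine_similarity(query_vector, entry[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]


store = ToyVectorStore()
store.add([0.21, -0.43], "Returns accepted within 30 days")
store.add([0.56, 0.12], "Free shipping on orders over $50")
store.add([-0.34, 0.67], "Contact us at help@company.com")

# Pretend this is the embedding of "What is the refund policy?"
top_match = store.search([0.20, -0.41], k=1)
print(top_match)
```

Real vector stores follow the same contract but replace the brute-force scan with an index so that search stays fast with millions of vectors.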
FAISS: Fast Local Vector Store
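FAISS (Facebook AI Similarity Search) is a high-performance vector search library that runs entirely on your local machine. The index needs no hosted database and no per-query fees, and your stored vectors never leave your machine. It keeps the index in memory and can save it to disk, making it a good fit for development, small-to-medium datasets, and privacy-sensitive applications. Note that creating the embeddings still calls the OpenAI API when you use OpenAIEmbeddings, as in the example that follows; only storage and search are local.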
pip install faiss-cpu
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
# Step 1: Load documents
loader = TextLoader("company_faq.txt")
documents = loader.load()
# Step 2: Split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = splitter.split_documents(documents)
# Step 3: Create embeddings model
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
# Step 4: Build the vector store
# This embeds all chunks and stores them in FAISS
vector_store = FAISS.from_documents(chunks, embeddings)
print(f"Vector store built with {len(chunks)} chunks")
Searching the Vector Store
# Similarity search: find the most relevant chunks for a query
results = vector_store.similarity_search(
    query="What is your return policy?",
    k=3  # Return the top 3 most relevant chunks
)
for i, doc in enumerate(results):
    print(f"Result {i+1}:")
    print(f" Content: {doc.page_content[:200]}")
    print(f" Source: {doc.metadata.get('source', 'unknown')}")
    print()
Search with Relevance Scores
# Get results with similarity scores (lower distance = more similar)
results_with_scores = vector_store.similarity_search_with_score(
    query="How do I contact customer support?",
    k=5
)
for doc, score in results_with_scores:
    print(f"Score: {score:.4f} | Content: {doc.page_content[:100]}")
# The score is a distance, so a score close to 0 means very similar.
# What counts as "not relevant" depends on the embedding model and distance metric;
# treat any cutoff (such as 1.5) as a starting point to tune on your own data.
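To make a distance score easier to interpret, it can help to relate it to cosine similarity. The sketch below assumes two things that you should verify for your setup: that the store reports squared Euclidean (L2) distance, and that the embedding vectors have unit length. Under those assumptions, distance = 2 - 2 × cosine similarity. The helper names are made up for this example.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_from_squared_l2(distance: float) -> float:
    """Valid only for unit-length vectors: d^2 = 2 - 2*cos, so cos = 1 - d^2/2."""
    return 1.0 - distance / 2.0


a = normalize([0.23, -0.45, 0.87, 0.12])
c = normalize([-0.67, 0.33, -0.21, 0.88])

distance = squared_l2(a, c)
print(distance, cosine_from_squared_l2(distance), dot(a, c))
```

Under these assumptions a score of 1.5 corresponds to a cosine similarity of 0.25, and the largest possible score is 4 (vectors pointing in opposite directions).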
Saving and Loading FAISS Vector Stores
Building the vector store requires calling the embedding API for every chunk. For large document sets, this costs money and takes time. Save the vector store to disk so you build it once and reuse it many times.
# Save to disk
vector_store.save_local("my_vector_store")
print("Vector store saved!")
# Load from disk (skip embedding API calls)
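loaded_store = FAISS.load_local(
    "my_vector_store",
    embeddings,  # Must be the same embedding model used to build the store
    # Loading unpickles a file from disk, which can run arbitrary code.
    # Only set this flag for stores you created yourself or fully trust.
    allow_dangerous_deserialization=True
)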
print("Vector store loaded!")
# Search works identically on the loaded store
results = loaded_store.similarity_search("refund policy", k=3)
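The "build once, reuse many times" idea can be wrapped in a small helper that checks for the saved folder before paying for embeddings. This is a sketch: `load_or_build` and its `build` and `load` parameters are hypothetical names, and the demo uses stand-in callables so it runs without an API key.

```python
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(index_dir: str, build: Callable[[], T], load: Callable[[str], T]) -> T:
    """Reuse a saved index folder if it exists; otherwise build a fresh one."""
    if Path(index_dir).is_dir():
        return load(index_dir)  # cheap: no embedding API calls
    return build()              # expensive: embeds every chunk


# Demo with stand-in callables instead of real FAISS calls.
with tempfile.TemporaryDirectory() as existing_dir:
    missing_dir = str(Path(existing_dir) / "not_built_yet")
    first = load_or_build(missing_dir, build=lambda: "built", load=lambda path: "loaded")
    second = load_or_build(existing_dir, build=lambda: "built", load=lambda path: "loaded")

print(first, second)
```

With the calls from this section, `build` would wrap FAISS.from_documents followed by save_local, and `load` would wrap FAISS.load_local with the same embeddings object.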
Adding New Documents to an Existing Store
# Create initial store
vector_store = FAISS.from_documents(initial_chunks, embeddings)
# Add new documents later without rebuilding
new_chunks = splitter.split_documents(new_documents)
vector_store.add_documents(new_chunks)
# Save the updated store
vector_store.save_local("my_vector_store")
print(f"Store now holds {len(initial_chunks) + len(new_chunks)} chunks")
Chroma: Persistent Vector Store with Filtering
Chroma is another popular local vector store with built-in persistence and metadata filtering. It stores data in a folder that persists across application restarts without needing explicit save/load calls.
pip install chromadb
# Newer LangChain releases also provide this class in the separate
# langchain-chroma package (from langchain_chroma import Chroma).
from langchain_community.vectorstores import Chroma
# Create or load a persistent Chroma store
vector_store = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"  # Data saves here automatically
)
# Chroma saves to disk automatically, so no manual save call is needed

# Metadata filtering: only search chunks from a specific source
results = vector_store.similarity_search(
    query="pricing plans",
    k=3,
    filter={"source": "pricing_page.txt"}  # Only search this source
)
Metadata filtering is a powerful feature. In a multi-document store with product manuals, pricing guides, and support articles, you can restrict searches to only the pricing documents when the user asks about pricing. This can improve accuracy because chunks from unrelated documents never compete for the top k slots.
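The effect of a metadata filter can be illustrated without any database. The sketch below narrows the candidates by metadata and then ranks what is left. It shows the idea only and makes no claim about how Chroma implements filtering internally; `filtered_search` and the sample records are invented for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_search(records: list[dict], query_vector: list[float],
                    metadata_filter: dict, k: int = 3) -> list[str]:
    """Keep records whose metadata matches every filter key, then rank by similarity."""
    candidates = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in metadata_filter.items())
    ]
    candidates.sort(key=lambda r: cosine_similarity(query_vector, r["vector"]), reverse=True)
    return [r["text"] for r in candidates[:k]]


records = [
    {"vector": [0.90, 0.10], "text": "Pro plan costs $20 per month",
     "metadata": {"source": "pricing_page.txt"}},
    {"vector": [0.95, 0.05], "text": "Billing errors: contact support",
     "metadata": {"source": "support.txt"}},
    {"vector": [0.10, 0.90], "text": "Free tier includes 3 projects",
     "metadata": {"source": "pricing_page.txt"}},
]
query_vector = [1.0, 0.0]  # stand-in for the embedding of "pricing plans"

unfiltered = filtered_search(records, query_vector, {}, k=1)
pricing_only = filtered_search(records, query_vector, {"source": "pricing_page.txt"}, k=2)
print(unfiltered)
print(pricing_only)
```

Without the filter, the support article wins the top slot because its vector happens to sit closest to the query; with the filter it is never considered.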
Pinecone: Cloud Vector Store for Production
FAISS and Chroma work on your local machine. For production applications with millions of documents, multiple servers, and high query volumes, you typically want a hosted vector database that handles scaling and availability for you. Pinecone is a popular managed option.
pip install pinecone-client langchain-pinecone
from langchain_pinecone import PineconeVectorStore
import os
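# Pinecone requires an API key from pinecone.io.
# Placeholder only: in real code, load the key from the environment
# or a .env file instead of hardcoding it.
os.environ["PINECONE_API_KEY"] = "your-pinecone-key"
# Store documents in a Pinecone index.
# The index dimension must match the embedding model
# (1536 for text-embedding-ada-002).
vector_store = PineconeVectorStore.from_documents(
    documents=chunks,
    embedding=embeddings,
    index_name="my-langchain-index"
)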
Retriever: The Interface for Searching
Vector stores have a .as_retriever() method that converts them into a Retriever object. Retrievers are what you use inside LangChain chains. They accept a query string and return relevant documents.
# Convert vector store to retriever
retriever = vector_store.as_retriever(
    search_type="similarity",  # or "mmr" for diversity
    search_kwargs={"k": 4}     # Return top 4 results
)
# Use the retriever directly
docs = retriever.invoke("What are your business hours?")
for doc in docs:
    print(doc.page_content[:150])
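Conceptually, a retriever is a thin wrapper that fixes the search settings once and exposes a single "query in, documents out" call. The sketch below imitates that shape with a hypothetical `ToyRetriever` class and a fake search function; it is not LangChain's implementation, only an illustration of why chains can treat any retriever the same way.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyRetriever:
    """Hypothetical stand-in for the object returned by as_retriever()."""
    search: Callable[..., list[str]]
    search_kwargs: dict = field(default_factory=dict)

    def invoke(self, query: str) -> list[str]:
        # The caller supplies only the query; k and friends were fixed up front.
        return self.search(query, **self.search_kwargs)


def fake_similarity_search(query: str, k: int = 4) -> list[str]:
    """Pretend search that ignores the query and returns the first k texts."""
    corpus = [
        "Open 9am to 5pm on weekdays",
        "Closed on public holidays",
        "Free shipping on orders over $50",
    ]
    return corpus[:k]


retriever = ToyRetriever(search=fake_similarity_search, search_kwargs={"k": 2})
docs = retriever.invoke("What are your business hours?")
print(docs)
```

Because the calling code only ever sees `invoke(query)`, swapping FAISS for Chroma or Pinecone does not require changes to the chain that uses the retriever.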
MMR Search: Maximal Marginal Relevance
Standard similarity search returns the top K most similar chunks. If multiple chunks from the same section of a document are all very similar to the query, they all rank high but contain redundant information. MMR search balances relevance with diversity: it still finds relevant chunks, but it penalizes candidates that are too similar to chunks it has already selected, so the results cover different aspects of the topic.
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "fetch_k": 20}
    # fetch_k: candidates considered, k: final results returned
)
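The selection rule behind MMR can be sketched in plain Python. This is a minimal greedy version written for illustration, not LangChain's code: at each step it picks the candidate with the best score of lambda_mult × relevance to the query minus (1 - lambda_mult) × similarity to the closest already-selected chunk. The candidate list plays the role of the fetch_k pool, and the toy vectors are invented.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query: list[float], candidates: list[list[float]], k: int,
        lambda_mult: float = 0.5) -> list[int]:
    """Return indices of k candidates, trading relevance against redundancy."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_index, best_score = remaining[0], -math.inf
        for i in remaining:
            relevance = cosine_similarity(query, candidates[i])
            # Redundancy: similarity to the most similar chunk picked so far.
            redundancy = max(
                (cosine_similarity(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
        remaining.remove(best_index)
    return selected


query = [1.0, 0.0]
candidates = [
    [0.98, 0.20],   # 0: very relevant
    [0.97, 0.24],   # 1: very relevant, but nearly a duplicate of 0
    [0.70, -0.70],  # 2: less relevant, covers a different direction
]

diverse = mmr(query, candidates, k=2, lambda_mult=0.5)
pure_relevance = mmr(query, candidates, k=2, lambda_mult=1.0)
print(diverse, pure_relevance)
```

With lambda_mult=1.0 the redundancy term vanishes and the result matches plain similarity search; lowering it pushes the near-duplicate out in favor of the chunk that adds new information.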
Full Pipeline: Load → Split → Embed → Store → Search
from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
load_dotenv()
def build_searchable_knowledge_base(pdf_path: str) -> FAISS:
    print("Step 1: Loading document...")
    loader = PyPDFLoader(pdf_path)
    documents = loader.load()
    print(f" {len(documents)} pages loaded")
    print("Step 2: Splitting into chunks...")
    splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=150)
    chunks = splitter.split_documents(documents)
    print(f" {len(chunks)} chunks created")
    print("Step 3: Building vector store (calls embedding API)...")
    embeddings = OpenAIEmbeddings()
    store = FAISS.from_documents(chunks, embeddings)
    print(" Vector store ready")
    print("Step 4: Saving to disk...")
    store.save_local("knowledge_base")
    print(" Saved!")
    return store
# Build it once
store = build_searchable_knowledge_base("product_manual.pdf")
# Search it
query = "How do I reset the device to factory settings?"
results = store.similarity_search(query, k=3)
print(f"\nResults for: '{query}'")
for i, doc in enumerate(results):
    print(f"\n[{i+1}] Page {doc.metadata.get('page',0)+1}: {doc.page_content[:200]}")
Vector Store Comparison
Store      Location     Persistence   Scale      Best For
──────────────────────────────────────────────────────────────
FAISS      Local        Manual save   Millions   Dev + privacy
Chroma     Local        Auto save     Millions   Dev + filtering
Pinecone   Cloud        Always on     Billions   Production
Weaviate   Cloud/Self   Always on     Billions   Production
Qdrant     Cloud/Self   Always on     Billions   Production
Summary
Embeddings convert text into numerical vectors where similar meanings produce similar numbers. Embedding models (like OpenAI's text-embedding-ada-002) handle this conversion. Vector Stores keep embeddings organized and enable fast similarity search. FAISS works locally for development. Chroma adds persistence and metadata filtering. Pinecone scales to production. The Retriever interface wraps vector stores for use inside LangChain chains. MMR search provides diverse results when multiple chunks cover the same topic.
