GCP Cloud Bigtable
Cloud Bigtable is GCP's fully managed, petabyte-scale NoSQL wide-column database. It is the same database that powers Google Search indexing, Google Maps, and Gmail — handling millions of reads and writes per second with single-digit millisecond latency. Bigtable is optimized for workloads that require extremely high throughput on large datasets, such as time-series data, IoT telemetry, financial market data, and user analytics.
Think of Bigtable like an enormous spreadsheet with billions of rows and millions of columns. Every cell in this spreadsheet is identified by a row key, column family, column qualifier, and timestamp. Lookups by row key are blazingly fast — Google has used this system internally since the mid-2000s (the Bigtable paper was published in 2006).
When to Use Bigtable
| Good Fit | Not a Good Fit |
|---|---|
| Time-series data (IoT sensors, metrics) | Data requiring complex SQL joins |
| Storing billions of rows | Datasets under 1 TB |
| Very high write throughput (>10,000 writes/sec) | Strongly consistent multi-row transactions |
| Real-time personalization (ad targeting) | Ad-hoc SQL analytics (use BigQuery instead) |
| Financial tick data | Small OLTP applications |
Bigtable Data Model
Bigtable's data model is fundamentally different from both relational databases and document stores.
```
Row Key            │ Column Family: weather  │ Column Family: sensors
                   │ temp │ humidity │ wind  │ pressure │ uv_index
───────────────────┼──────┼──────────┼───────┼──────────┼─────────────
delhi#2024-01-15   │ 18.5 │ 72       │ 12.3  │ 1012.5   │ 3.2
delhi#2024-01-16   │ 20.1 │ 68       │  8.7  │ 1010.0   │ 4.1
mumbai#2024-01-15  │ 28.3 │ 85       │ 15.0  │ 1008.3   │ 5.5
mumbai#2024-01-16  │ 29.0 │ 87       │ 10.2  │ 1007.8   │ 6.0
```
Key components of the data model:
- Row Key: The unique identifier for each row. All queries are performed by row key. Designing the row key well is the most important Bigtable decision.
- Column Family: A group of related columns defined at table creation. Columns within a family are stored together on disk.
- Column Qualifier: The name of a specific column within a family. Column qualifiers can be created on the fly — no schema migration needed.
- Cell: The intersection of a row key, column family, and column qualifier. Each cell can store multiple timestamped versions.
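The data model above can be pictured as nested maps: row key → column family → column qualifier → a list of timestamped versions. This is an illustrative pure-Python sketch, not the Bigtable client API, and the sample timestamps and values are invented:

```python
# Illustrative only: one Bigtable row modeled as nested Python dicts.
# Real access goes through the Bigtable client library.
table = {
    "delhi#2024-01-15": {                  # row key
        "weather": {                       # column family
            "temp": [                      # column qualifier
                (1705312200000, b"18.5"),  # (timestamp_ms, value) versions,
                (1705308600000, b"18.2"),  # newest first
            ],
        },
        "sensors": {                       # second column family
            "pressure": [(1705312200000, b"1012.5")],
        },
    },
}

# A cell is addressed by (row key, family, qualifier); versions are
# ordered newest first, so index 0 is the latest value.
latest = table["delhi#2024-01-15"]["weather"]["temp"][0]
print(latest[1])  # b'18.5'
```

Note how the `temp` cell holds two versions — this is the per-cell version history that garbage collection policies (covered below) trim automatically.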
Row Key Design – The Most Critical Decision
Bigtable sorts all rows lexicographically by row key. Range scans read consecutive row keys efficiently. A well-designed row key allows efficient queries; a poor row key causes hotspots (all traffic hitting one server) and slow scans.
Bad Row Key Design (timestamp-first → hotspot):

```
2024-01-15T10:30:00#delhi#temperature
```

All new writes share a monotonically increasing prefix, so they all land on the same node — hotspot ✗

Good Row Key Design (location-first → distributed writes):

```
delhi#2024-01-15T10:30:00
```

Writes are spread across nodes by location ✓
Range scan "give me all delhi data for Jan 15" is efficient ✓
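The location-first pattern is easy to see in plain Python. The `make_row_key` helper below is our own illustration (not a Bigtable API); sorting the keys the way Bigtable does shows why one location's rows end up contiguous:

```python
# Sketch: building location-first row keys for a time-series workload.
def make_row_key(location: str, ts_iso: str) -> str:
    # "location#timestamp" distributes writes across locations while
    # keeping each location's rows in time order.
    return f"{location}#{ts_iso}"

keys = [
    make_row_key("mumbai", "2024-01-15T10:30:00"),
    make_row_key("delhi", "2024-01-16T10:30:00"),
    make_row_key("delhi", "2024-01-15T10:30:00"),
]

# Bigtable stores rows sorted lexicographically by row key, so all rows
# for one location are adjacent and a prefix scan ("delhi#") is a cheap
# contiguous range read.
for key in sorted(keys):
    print(key)
# delhi#2024-01-15T10:30:00
# delhi#2024-01-16T10:30:00
# mumbai#2024-01-15T10:30:00
```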
Creating a Bigtable Instance
```shell
# Create a Bigtable instance (SSD, single-node for development)
gcloud bigtable instances create my-bigtable \
  --display-name="My Bigtable Instance" \
  --cluster-config=id=my-cluster,zone=us-central1-a,nodes=1 \
  --instance-type=PRODUCTION

# Create a table with two column families
cbt -instance=my-bigtable createtable weather-data
cbt -instance=my-bigtable createfamily weather-data weather
cbt -instance=my-bigtable createfamily weather-data sensors
```
Reading and Writing Data with cbt CLI
```shell
# Write rows
cbt -instance=my-bigtable set weather-data \
  "delhi#2024-01-15" weather:temp=18.5 weather:humidity=72
cbt -instance=my-bigtable set weather-data \
  "mumbai#2024-01-15" weather:temp=28.3 weather:humidity=85

# Read a specific row
cbt -instance=my-bigtable lookup weather-data "delhi#2024-01-15"

# Read a range of rows (all delhi data)
cbt -instance=my-bigtable read weather-data prefix="delhi#"

# Read with a column filter
cbt -instance=my-bigtable read weather-data columns="weather:temp"
```
Python Client
```python
from google.cloud import bigtable

project_id = "my-project"
instance_id = "my-bigtable"
table_id = "weather-data"

client = bigtable.Client(project=project_id, admin=True)
instance = client.instance(instance_id)
table = instance.table(table_id)

# --- Write a row ---
row_key = b"delhi#2024-01-15"
row = table.direct_row(row_key)
row.set_cell("weather", "temp", b"18.5")   # cell values are bytes
row.set_cell("weather", "humidity", b"72")
row.set_cell("sensors", "pressure", b"1012.5")
row.commit()

# --- Read a single row ---
row = table.read_row(b"delhi#2024-01-15")
if row:
    temp = row.cells["weather"][b"temp"][0].value.decode("utf-8")
    print(f"Delhi temperature: {temp}°C")

# --- Read a range of rows (end key is exclusive) ---
rows = table.read_rows(
    start_key=b"delhi#2024-01-",
    end_key=b"delhi#2024-02-",
)
for row in rows:
    print(row.row_key.decode("utf-8"), row.cells)
```
Bigtable Cluster Scaling
Bigtable scales horizontally by adding nodes to a cluster. Each SSD node adds approximately 10,000 QPS of read/write capacity and up to about 5 TB of SSD storage (see the current Bigtable quotas for exact per-node limits).

```shell
# Scale from 1 to 5 nodes (no downtime; the cluster rebalances
# in the background over the next several minutes)
gcloud bigtable clusters update my-cluster \
  --instance=my-bigtable \
  --num-nodes=5
```

Throughput and storage scale linearly with node count:

```
1 node:    ~10,000 QPS,  ~5 TB SSD
3 nodes:   ~30,000 QPS,  ~15 TB
5 nodes:   ~50,000 QPS,  ~25 TB
10 nodes:  ~100,000 QPS, ~50 TB
```
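Capacity planning then reduces to simple arithmetic. A back-of-envelope sketch, assuming the ~10,000 QPS-per-node rule of thumb above (the helper function is ours, not an API):

```python
# Back-of-envelope Bigtable capacity planning.
QPS_PER_NODE = 10_000  # rough per-node throughput, per the rule of thumb

def nodes_needed(target_qps: int) -> int:
    # Ceiling division: 45,000 QPS needs 5 nodes, not 4.
    return -(-target_qps // QPS_PER_NODE)

print(nodes_needed(45_000))  # 5
```

In practice you would validate the result against observed CPU utilization (or use Bigtable autoscaling) rather than rely on the rule of thumb alone.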
Replication
Bigtable supports multi-cluster replication across zones and regions for high availability and disaster recovery. Replication is asynchronous and eventually consistent: a write accepted by one cluster is copied to the others in the background, and reads can be served from the cluster nearest the user.
```
Cluster 1: us-central1 (primary writes + reads)
        │
        │  async replication
        ▼
Cluster 2: europe-west1 (serves reads for European users)
```
Garbage Collection Policies
Every cell in Bigtable can store multiple versions (timestamps). Garbage collection rules automatically delete old versions to save storage.
```shell
# Keep only the 3 most recent versions of each cell
cbt -instance=my-bigtable setgcpolicy weather-data weather maxversions=3

# Keep only versions newer than 30 days
cbt -instance=my-bigtable setgcpolicy weather-data sensors maxage=30d
```
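The effect of `maxversions=3` can be simulated in a few lines of plain Python. This is an illustration of the policy's semantics only — the real cleanup happens server-side during Bigtable's compactions:

```python
# Simulate a maxversions=3 garbage collection policy on one cell.
def apply_max_versions(versions, max_versions=3):
    """versions: list of (timestamp_ms, value) tuples, in any order.
    Returns only the max_versions most recent, newest first."""
    newest_first = sorted(versions, key=lambda v: v[0], reverse=True)
    return newest_first[:max_versions]

cell = [(100, b"a"), (400, b"d"), (200, b"b"), (300, b"c")]
print(apply_max_versions(cell))  # [(400, b'd'), (300, b'c'), (200, b'b')]
```

A `maxage` policy works the same way, except the cutoff is a timestamp threshold rather than a version count; the two can also be combined on one column family.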
Bigtable vs Other GCP Databases
| Database | Best For | Query Model |
|---|---|---|
| Cloud SQL | Transactional apps (OLTP) | Full SQL with joins |
| Cloud Spanner | Global OLTP at massive scale | Full SQL, global transactions |
| Firestore | Mobile/web apps, real-time sync | Document queries, real-time listeners |
| Bigtable | High-throughput time-series, IoT | Row key lookups and range scans only |
| BigQuery | Analytics, reporting (OLAP) | Full SQL, optimized for full table scans |
Key Takeaways
- Bigtable is a petabyte-scale, wide-column NoSQL database built for very high throughput and low latency.
- All data is accessed via the row key — there are no secondary indexes or SQL joins.
- Row key design is the most important architectural decision in Bigtable.
- Scale linearly by adding nodes to a cluster — no downtime required.
- Garbage collection policies automatically remove old cell versions.
- Best suited for time-series, IoT, and analytics ingestion — not for general-purpose applications.
