GCP Cloud Bigtable
Cloud Bigtable is GCP's fully managed, petabyte-scale NoSQL wide-column database. It is the same database that powers Google Search indexing, Google Maps, and Gmail — handling millions of reads and writes per second with single-digit millisecond latency. Bigtable is optimized for workloads that require extremely high throughput on large datasets, such as time-series data, IoT telemetry, financial market data, and user analytics.
Think of Bigtable like an enormous spreadsheet with billions of rows and millions of columns. Every cell in this spreadsheet is identified by a row key, column family, column qualifier, and timestamp. Lookups by row key are blazingly fast — Google has used this system internally since the mid-2000s (the Bigtable paper was published in 2006).
When to Use Bigtable
| Good Fit | Not a Good Fit |
|---|---|
| Time-series data (IoT sensors, metrics) | Data requiring complex SQL joins |
| Storing billions of rows | Datasets under 1 TB |
| Very high write throughput (>10,000 writes/sec) | Strongly consistent multi-row transactions |
| Real-time personalization (ad targeting) | Ad-hoc SQL analytics (use BigQuery instead) |
| Financial tick data | Small OLTP applications |
Bigtable Data Model
Bigtable's data model is fundamentally different from both relational databases and document stores.
```
Row Key            │ Column Family: weather  │ Column Family: sensors
                   │ temp │ humidity │ wind  │ pressure │ uv_index
───────────────────┼──────┼──────────┼───────┼──────────┼─────────────
delhi#2024-01-15   │ 18.5 │ 72       │ 12.3  │ 1012.5   │ 3.2
delhi#2024-01-16   │ 20.1 │ 68       │  8.7  │ 1010.0   │ 4.1
mumbai#2024-01-15  │ 28.3 │ 85       │ 15.0  │ 1008.3   │ 5.5
mumbai#2024-01-16  │ 29.0 │ 87       │ 10.2  │ 1007.8   │ 6.0
```
Key components of the data model:
- Row Key: The unique identifier for each row. All queries are performed by row key. Designing the row key well is the most important Bigtable decision.
- Column Family: A group of related columns defined at table creation. Columns within a family are stored together on disk.
- Column Qualifier: The name of a specific column within a family. Column qualifiers can be created on the fly — no schema migration needed.
- Cell: The intersection of a row key, column family, and column qualifier. Each cell can store multiple timestamped versions.
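The data model above can be pictured as nested maps: row key → column family → column qualifier → a list of timestamped versions. This is an illustrative pure-Python sketch, not the Bigtable client API, and the sample timestamps and values are invented:

```python
# Illustrative only: one Bigtable row modeled as nested Python dicts.
# Real access goes through the Bigtable client library.
table = {
    "delhi#2024-01-15": {                  # row key
        "weather": {                       # column family
            "temp": [                      # column qualifier
                (1705312200000, b"18.5"),  # (timestamp_ms, value) versions,
                (1705308600000, b"18.2"),  # newest first
            ],
        },
        "sensors": {                       # second column family
            "pressure": [(1705312200000, b"1012.5")],
        },
    },
}

# A cell is addressed by (row key, family, qualifier); versions are
# ordered newest first, so index 0 is the latest value.
latest = table["delhi#2024-01-15"]["weather"]["temp"][0]
print(latest[1])  # b'18.5'
```

Note how the `temp` cell holds two versions — this is the per-cell version history that garbage collection policies (covered below) trim automatically.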
Row Key Design – The Most Critical Decision
Bigtable sorts all rows lexicographically by row key. Range scans read consecutive row keys efficiently. A well-designed row key allows efficient queries; a poor row key causes hotspots (all traffic hitting one server) and slow scans.
Bad Row Key Design (timestamp-first → hotspot):

```
2024-01-15T10:30:00#delhi#temperature
```

All new writes share a monotonically increasing prefix, so they all land on the same node — hotspot ✗

Good Row Key Design (location-first → distributed writes):

```
delhi#2024-01-15T10:30:00
```

Writes are spread across nodes by location ✓
Range scan "give me all delhi data for Jan 15" is efficient ✓
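The location-first pattern is easy to see in plain Python. The `make_row_key` helper below is our own illustration (not a Bigtable API); sorting the keys the way Bigtable does shows why one location's rows end up contiguous:

```python
# Sketch: building location-first row keys for a time-series workload.
def make_row_key(location: str, ts_iso: str) -> str:
    # "location#timestamp" distributes writes across locations while
    # keeping each location's rows in time order.
    return f"{location}#{ts_iso}"

keys = [
    make_row_key("mumbai", "2024-01-15T10:30:00"),
    make_row_key("delhi", "2024-01-16T10:30:00"),
    make_row_key("delhi", "2024-01-15T10:30:00"),
]

# Bigtable stores rows sorted lexicographically by row key, so all rows
# for one location are adjacent and a prefix scan ("delhi#") is a cheap
# contiguous range read.
for key in sorted(keys):
    print(key)
# delhi#2024-01-15T10:30:00
# delhi#2024-01-16T10:30:00
# mumbai#2024-01-15T10:30:00
```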
Creating a Bigtable Instance
```shell
# Create a Bigtable instance (SSD, single-node for development)
gcloud bigtable instances create my-bigtable \
  --display-name="My Bigtable Instance" \
  --cluster-config=id=my-cluster,zone=us-central1-a,nodes=1 \
  --instance-type=PRODUCTION

# Create a table with two column families
cbt -instance=my-bigtable createtable weather-data
cbt -instance=my-bigtable createfamily weather-data weather
cbt -instance=my-bigtable createfamily weather-data sensors
```
Reading and Writing Data with cbt CLI
```shell
# Write rows
cbt -instance=my-bigtable set weather-data \
  "delhi#2024-01-15" weather:temp=18.5 weather:humidity=72
cbt -instance=my-bigtable set weather-data \
  "mumbai#2024-01-15" weather:temp=28.3 weather:humidity=85

# Read a specific row
cbt -instance=my-bigtable lookup weather-data "delhi#2024-01-15"

# Read a range of rows (all delhi data)
cbt -instance=my-bigtable read weather-data prefix="delhi#"

# Read with a column filter
cbt -instance=my-bigtable read weather-data columns="weather:temp"
```
Python Client
```python
from google.cloud import bigtable

project_id = "my-project"
instance_id = "my-bigtable"
table_id = "weather-data"

client = bigtable.Client(project=project_id, admin=True)
instance = client.instance(instance_id)
table = instance.table(table_id)

# --- Write a row ---
row_key = b"delhi#2024-01-15"
row = table.direct_row(row_key)
row.set_cell("weather", "temp", b"18.5")   # cell values are bytes
row.set_cell("weather", "humidity", b"72")
row.set_cell("sensors", "pressure", b"1012.5")
row.commit()

# --- Read a single row ---
row = table.read_row(b"delhi#2024-01-15")
if row:
    temp = row.cells["weather"][b"temp"][0].value.decode("utf-8")
    print(f"Delhi temperature: {temp}°C")

# --- Read a range of rows (end key is exclusive) ---
rows = table.read_rows(
    start_key=b"delhi#2024-01-",
    end_key=b"delhi#2024-02-",
)
for row in rows:
    print(row.row_key.decode("utf-8"), row.cells)
```
Bigtable Cluster Scaling
Bigtable scales horizontally by adding nodes to a cluster. Each SSD node adds approximately 10,000 QPS of read/write capacity and up to about 5 TB of SSD storage (see the current Bigtable quotas for exact per-node limits).

```shell
# Scale from 1 to 5 nodes (no downtime; the cluster rebalances
# in the background over the next several minutes)
gcloud bigtable clusters update my-cluster \
  --instance=my-bigtable \
  --num-nodes=5
```

Throughput and storage scale linearly with node count:

```
1 node:    ~10,000 QPS,  ~5 TB SSD
3 nodes:   ~30,000 QPS,  ~15 TB
5 nodes:   ~50,000 QPS,  ~25 TB
10 nodes:  ~100,000 QPS, ~50 TB
```
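Capacity planning then reduces to simple arithmetic. A back-of-envelope sketch, assuming the ~10,000 QPS-per-node rule of thumb above (the helper function is ours, not an API):

```python
# Back-of-envelope Bigtable capacity planning.
QPS_PER_NODE = 10_000  # rough per-node throughput, per the rule of thumb

def nodes_needed(target_qps: int) -> int:
    # Ceiling division: 45,000 QPS needs 5 nodes, not 4.
    return -(-target_qps // QPS_PER_NODE)

print(nodes_needed(45_000))  # 5
```

In practice you would validate the result against observed CPU utilization (or use Bigtable autoscaling) rather than rely on the rule of thumb alone.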
Replication
Bigtable supports multi-cluster replication across zones and regions for high availability and disaster recovery. Replication is asynchronous and eventually consistent: a write accepted by one cluster is copied to the others in the background, and reads can be served from the cluster nearest the user.
```
Cluster 1: us-central1 (primary writes + reads)
        │
        │  async replication
        ▼
Cluster 2: europe-west1 (serves reads for European users)
```
Garbage Collection Policies
Every cell in Bigtable can store multiple versions (timestamps). Garbage collection rules automatically delete old versions to save storage.
```shell
# Keep only the 3 most recent versions of each cell
cbt -instance=my-bigtable setgcpolicy weather-data weather maxversions=3

# Keep only versions newer than 30 days
cbt -instance=my-bigtable setgcpolicy weather-data sensors maxage=30d
```
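The effect of `maxversions=3` can be simulated in a few lines of plain Python. This is an illustration of the policy's semantics only — the real cleanup happens server-side during Bigtable's compactions:

```python
# Simulate a maxversions=3 garbage collection policy on one cell.
def apply_max_versions(versions, max_versions=3):
    """versions: list of (timestamp_ms, value) tuples, in any order.
    Returns only the max_versions most recent, newest first."""
    newest_first = sorted(versions, key=lambda v: v[0], reverse=True)
    return newest_first[:max_versions]

cell = [(100, b"a"), (400, b"d"), (200, b"b"), (300, b"c")]
print(apply_max_versions(cell))  # [(400, b'd'), (300, b'c'), (200, b'b')]
```

A `maxage` policy works the same way, except the cutoff is a timestamp threshold rather than a version count; the two can also be combined on one column family.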
Bigtable vs Other GCP Databases
| Database | Best For | Query Model |
|---|---|---|
| Cloud SQL | Transactional apps (OLTP) | Full SQL with joins |
| Cloud Spanner | Global OLTP at massive scale | Full SQL, global transactions |
| Firestore | Mobile/web apps, real-time sync | Document queries, real-time listeners |
| Bigtable | High-throughput time-series, IoT | Row key lookups and range scans only |
| BigQuery | Analytics, reporting (OLAP) | Full SQL, optimized for full table scans |
Key Takeaways
- Bigtable is a petabyte-scale, wide-column NoSQL database built for very high throughput and low latency.
- All data is accessed via the row key — there are no secondary indexes or SQL joins.
- Row key design is the most important architectural decision in Bigtable.
- Scale linearly by adding nodes to a cluster — no downtime required.
- Garbage collection policies automatically remove old cell versions.
- Best suited for time-series, IoT, and analytics ingestion — not for general-purpose applications.
