Elasticsearch Index Performance Tuning

Elasticsearch works well out of the box, but production workloads with high write rates or complex searches demand tuning. These techniques reduce latency, increase throughput, and use memory efficiently.

Write Performance vs Search Performance

Two workloads, opposite needs:

Write-heavy (log ingestion):
  Priority: Index documents fast
  Tradeoff: Slightly slower real-time search

Search-heavy (product search):
  Priority: Low search latency
  Tradeoff: Slower indexing rate is acceptable

Most tuning choices favor one side. Pick the right tradeoffs for your use case.

Tune 1: Bulk Indexing for Write Speed

Sending one document per HTTP request wastes network time. Use the Bulk API to send hundreds or thousands of documents per request.

Illustrative comparison (actual timings depend on document size, network latency, and cluster load):
  1,000 individual requests:  ~12 seconds
  1 bulk request (1,000 docs):  ~0.3 seconds

Rule of thumb: 5–15 MB per bulk request
              5,000–15,000 docs per request (test and adjust)
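
The batching rule above can be sketched in Python. This is a minimal illustration, not an official client: the function name bulk_bodies and its default limits are assumptions chosen to match the rule of thumb. It only builds the newline-delimited JSON bodies that the Bulk API expects (one action line, then one document line, with a trailing newline). Sending them to POST /_bulk is left to whatever HTTP client you use.

```python
import json
from typing import Iterable, Iterator


def bulk_bodies(index: str, docs: Iterable[dict],
                max_bytes: int = 10 * 1024 * 1024,
                max_docs: int = 5000) -> Iterator[str]:
    """Yield NDJSON bodies for the Bulk API, each within the given limits.

    The name and the default limits are illustrative assumptions.
    """
    action = json.dumps({"index": {"_index": index}})
    action_bytes = len(action.encode("utf-8"))
    lines, size, count = [], 0, 0
    for doc in docs:
        source = json.dumps(doc)
        # Two lines per document, each followed by a newline.
        pair_bytes = action_bytes + len(source.encode("utf-8")) + 2
        # Flush the current batch before it would exceed either limit.
        if count and (size + pair_bytes > max_bytes or count >= max_docs):
            yield "\n".join(lines) + "\n"
            lines, size, count = [], 0, 0
        lines.extend((action, source))
        size += pair_bytes
        count += 1
    if count:
        # The Bulk API requires the body to end with a newline.
        yield "\n".join(lines) + "\n"
```

Start with small limits, measure indexing throughput, and raise them until throughput stops improving.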

Tune 2: Refresh Interval

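By default, Elasticsearch refreshes each index once per second, which makes newly indexed documents searchable. (When the interval is left at its default, shards that have received no search request for 30 seconds skip these periodic refreshes until the next search arrives.) Each refresh is expensive because it writes a new segment from recently indexed documents. If you do not need real-time visibility, increase the interval: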

# During bulk load — disable refresh entirely
PUT /products/_settings
{
  "refresh_interval": "-1"
}

# After load: restore the 1-second interval (or set the value to null to return to the built-in default)
PUT /products/_settings
{
  "refresh_interval": "1s"
}

# For near-real-time (5 seconds is fine for most dashboards)
PUT /logs/_settings
{
  "refresh_interval": "5s"
}

Disabling refresh during a bulk load can make indexing 2 to 5 times faster.

Tune 3: Disable Replicas During Initial Load

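Every document written to a primary shard is also indexed on each of its replica shards. During a one-time data load, set the replica count to 0. With one replica configured, this roughly halves the indexing work. Keep the source data available until replicas are restored, because a node failure during the load would lose the only copy: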

Before load:
PUT /products/_settings
{
  "number_of_replicas": 0
}

After load — re-enable replicas:
PUT /products/_settings
{
  "number_of_replicas": 1
}

Elasticsearch rebuilds the replicas automatically by copying from the primaries. The cluster stays yellow until the copy finishes, and how long that takes depends on index size and network speed.
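
The settings changes from Tune 2 and Tune 3 are easy to forget to undo when a load fails halfway. One way to guard against that is sketched below in Python. The put_settings argument is a hypothetical callable that you supply to issue PUT /<index>/_settings with a JSON body. It is not part of any Elasticsearch client, and the demo uses a stand-in that only records calls.

```python
from contextlib import contextmanager


@contextmanager
def bulk_load_mode(put_settings, index, refresh_interval="1s", replicas=1):
    """Disable refresh and replicas for a load, then restore them.

    put_settings(index, body) is a hypothetical callable supplied by the
    caller; refresh_interval and replicas are the values to restore.
    """
    put_settings(index, {"refresh_interval": "-1", "number_of_replicas": 0})
    try:
        yield
    finally:
        # Runs even if the load raises, so the index is never left
        # without refreshes or replicas.
        put_settings(index, {"refresh_interval": refresh_interval,
                             "number_of_replicas": replicas})


# Demo with a stand-in that records calls instead of contacting a cluster.
recorded = []


def fake_put_settings(index, body):
    recorded.append((index, body))


with bulk_load_mode(fake_put_settings, "products"):
    pass  # send bulk requests here
```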

Tune 4: Segment Merging

Elasticsearch stores each shard's documents in immutable units called segments. Over time, many small segments slow down searches, because every query must visit each one. Elasticsearch merges segments automatically in the background, but you can trigger a merge manually after a large load:

POST /products/_forcemerge?max_num_segments=1

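Merging down to 1 segment usually gives the best search speed, because each query has only one segment per shard to visit and deleted documents are purged. Run this only on indexes that no longer receive writes (like historical log indexes). A force merge is I/O-intensive, and the very large segments it produces may be skipped by later automatic merges if writes continue.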

Tune 5: Heap Size

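Elasticsearch runs inside a Java Virtual Machine and uses heap memory for caching filters, sorting data, and running aggregations. Set the heap to no more than half of the available RAM, and keep it at or below about 30 GB so the JVM can still use compressed object pointers (the cutoff is just under 32 GB). The remaining RAM is not wasted, because the operating system uses it for the file system cache that Elasticsearch relies on for fast disk reads: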

Server has 64 GB RAM:
  Elasticsearch heap: 30 GB  (set -Xms30g -Xmx30g)
  Operating system:   34 GB  (for disk caching, file system)

Server has 16 GB RAM:
  Elasticsearch heap: 8 GB
  Operating system:   8 GB

Set minimum (-Xms) and maximum (-Xmx) heap to the same value. This prevents Elasticsearch from spending time expanding the heap under load.
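
The sizing rule above is simple arithmetic, sketched here in Python. The function names are illustrative, and the 30 GB cap is the figure used in this lesson, not a value read from any Elasticsearch API.

```python
from typing import List


def recommended_heap_gb(total_ram_gb: int, cap_gb: int = 30) -> int:
    """Half of RAM (rounded down), capped at cap_gb."""
    return min(total_ram_gb // 2, cap_gb)


def jvm_options(total_ram_gb: int) -> List[str]:
    """Return matching -Xms/-Xmx flags so the heap never resizes."""
    heap = recommended_heap_gb(total_ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


print(jvm_options(64))  # ['-Xms30g', '-Xmx30g']
print(jvm_options(16))  # ['-Xms8g', '-Xmx8g']
```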

Tune 6: Shard Request Cache

Elasticsearch caches the results of entire search requests at the shard level. Repeated identical queries return from cache without touching the disk:

GET /products/_search?request_cache=true
{
  "query": { "term": { "category": "electronics" } },
  "size": 0,
  "aggs": { ... }
}

Cached results are invalidated automatically when the shard refreshes and its data has changed. By default only requests with size: 0 are cached. The cache stores totals, aggregations, and suggestions, not the hits themselves, so aggregation-only queries benefit most.
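
As far as the Elasticsearch documentation describes it, the cache key is derived from the JSON request body, so two requests that differ only in key order can miss each other's cache entries. A minimal Python sketch of one way to avoid that is to serialize every request body in a canonical form before sending it. The function name canonical_body is an assumption for illustration.

```python
import json


def canonical_body(query: dict) -> str:
    """Serialize a search body with sorted keys and fixed separators,
    so logically identical queries produce byte-identical JSON."""
    return json.dumps(query, sort_keys=True, separators=(",", ":"))


a = {"size": 0, "query": {"term": {"category": "electronics"}}}
b = {"query": {"term": {"category": "electronics"}}, "size": 0}

# Plain serialization preserves insertion order, so the bodies differ.
print(json.dumps(a) == json.dumps(b))          # False
# Canonical serialization makes them identical.
print(canonical_body(a) == canonical_body(b))  # True
```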

Tune 7: Mapping Optimization

Disable features you do not use to save memory and CPU:

# Disable _source only if you aggregate and never retrieve documents (this also rules out update, update-by-query, and reindex on the index)
"_source": { "enabled": false }

# Disable norms for fields not used in relevance scoring
"description": {
  "type": "text",
  "norms": false      <-- saves ~1 byte per document per field
}

# Use doc_values: false for keyword fields not used in sorting/aggregations (text fields have no doc values to disable)
"comment_text": {
  "type": "keyword",
  "doc_values": false   <-- saves disk if you only search this field
}
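
The fragments above are shown out of context. The Python sketch below places them where they would sit in an index-creation body (the JSON sent with PUT /<index>). It combines all three options purely for illustration, and in practice you would apply only those that fit your index. The function name build_mapping is an assumption.

```python
import json


def build_mapping() -> dict:
    """Assemble an index-creation body from the three fragments above."""
    return {
        "mappings": {
            # Only safe if documents are never retrieved, updated, or reindexed.
            "_source": {"enabled": False},
            "properties": {
                # Full-text field that does not need length-based scoring.
                "description": {"type": "text", "norms": False},
                # Exact-match field that is searched but never sorted or aggregated.
                "comment_text": {"type": "keyword", "doc_values": False},
            },
        }
    }


print(json.dumps(build_mapping(), indent=2))
```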

Tune 8: Index Lifecycle Management (ILM)

For time-series data like logs, use ILM to automatically transition indexes through phases based on age — keeping hot recent data on fast SSDs and moving old data to cheaper storage or deleting it:

ILM Policy phases:

Hot Phase (0–7 days):
  Stored on: SSD nodes
  Rollover: when a primary shard reaches 50 GB
  Replicas: 1

Warm Phase (7–30 days):
  Stored on: slower data nodes
  Merge to 1 segment: yes
  Replicas: 1

Cold Phase (30–90 days):
  Stored on: cheapest nodes
  Replicas: 0

Delete Phase (after 90 days):
  Index deleted automatically

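# Timings match the phases above. Node placement (SSD vs. cheaper nodes) is configured separately and is not shown.
# max_primary_shard_size requires Elasticsearch 7.13 or later; older versions offer only max_size, which measures all primary shards together.
PUT _ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot":    { "min_age": "0ms", "actions": { "rollover": { "max_primary_shard_size": "50gb" } } },
      "warm":   { "min_age": "7d",  "actions": { "forcemerge": { "max_num_segments": 1 } } },
      "cold":   { "min_age": "30d", "actions": { "allocate": { "number_of_replicas": 0 } } },
      "delete": { "min_age": "90d", "actions": { "delete": {} } }
    }
  }
}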
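
To check which phase an index should be in under the schedule listed above, the thresholds can be written as a small Python function. This is a simplified sketch: it treats age as a plain number of days, whereas ILM measures min_age from the rollover time when rollover is used. The function name is illustrative.

```python
def phase_for_age(age_days: float) -> str:
    """Map an index age in days to the phase from the schedule above."""
    if age_days < 7:
        return "hot"
    if age_days < 30:
        return "warm"
    if age_days < 90:
        return "cold"
    return "delete"


for days in (3, 10, 45, 120):
    print(days, phase_for_age(days))
```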
