Why Elasticsearch Text vs. Keyword is the Most Important Choice


Choosing between the text and keyword field types is one of the most consequential mapping decisions in Elasticsearch. One wrong field type can silently break your queries. Here’s exactly what happens under the hood, with examples you can run today.



The bug that never shows an error

Imagine you’ve just deployed a product catalog search. Users type in “running shoes”, results come back, everything looks fine. Then your analytics team reports something strange: the “Top Brands” aggregation shows only three brands instead of the expected forty. You check the index — all forty brands are there. No errors in the logs. What went wrong?

The culprit, almost certainly, is field type. Specifically: someone mapped the brand field as text when it should have been keyword.

Scenario — The Silent Aggregation Failure

You index a product with "brand": "New Balance". The brand field is mapped as text, and fielddata: true has been enabled so that aggregations run on it (without fielddata, Elasticsearch rejects a terms aggregation on a text field with an error). The analyzer splits the value into two tokens: new and balance. The terms aggregation counts tokens, not original values, so you see new: 847 and balance: 847 in your buckets instead of New Balance: 847. The brand effectively disappears from your facets.

This is the heart of the text vs. keyword problem. Once the field is aggregatable, nothing throws an exception and nothing refuses to run. The query just silently returns wrong results.


What actually happens under the hood

When Elasticsearch indexes a field, it has to decide: should I analyze this string, or store it as-is? That decision determines everything about what queries will work.

Text fields go through an analysis chain: optional character filters, a tokenizer, and token filters. The default standard analyzer splits text on word boundaries (roughly, whitespace and most punctuation) and lowercases every token. The result is a set of tokens stored in an inverted index.

Keyword fields skip all of that. The value is stored exactly as you provided it — case, spaces, punctuation and all — as a single token.

Here’s the same string indexed both ways:

INPUT VALUE
"Brand": "New Balance Running"

text field — tokens produced

new · balance · running

keyword field — token produced

New Balance Running (one token, unchanged)

These two representations answer completely different questions. Understanding which one you need — before you index a single document — is the skill.
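To make the two indexing paths concrete, here is a minimal Python sketch. The analyze_text helper is only a rough stand-in for the standard analyzer (lowercase, then split on anything that is not a letter or digit); the real analyzer follows Unicode word-boundary rules, so treat the function names and the simplification as assumptions made for illustration.

```python
import re

def analyze_text(value: str) -> list:
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", value.lower())

def analyze_keyword(value: str) -> list:
    """A keyword field emits the whole value as a single, unchanged token."""
    return [value]

value = "New Balance Running"
text_tokens = analyze_text(value)
keyword_tokens = analyze_keyword(value)

print(text_tokens)     # ['new', 'balance', 'running']
print(keyword_tokens)  # ['New Balance Running']
```

Every later difference between the two field types (matching, aggregating, sorting) follows from which of these two token lists ends up in the index.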


Elasticsearch Text field examples: when analysis helps and hurts

Where text shines — product descriptions

Say you’re building a search for an e-commerce site. A product has the description: “Lightweight trail running shoe with responsive foam cushioning”. A user searches for foam cushion shoes. With a text field and a stemming analyzer, this works: foam matches directly, while cushion matches “cushioning” and shoes matches “shoe” because the analyzer reduces both the indexed text and the query to the same stems.

MAPPING
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}

QUERY
{
  "query": {
    "match": {
      "description": "foam cushion shoes"
    }
  }
}

Correct — returns the product

The english analyzer stems “cushioning” → “cushion” and “shoes” → “shoe”, so the tokens align. Full-text search works as users expect.

Would fail with keyword

A term query on a keyword field would look for the exact string "foam cushion shoes", which appears nowhere in the document. Zero results.

Where text breaks — the exact-match trap

Now suppose a developer on your team tries to filter orders by status. The status field was mapped as text because it’s a string. They write:

QUERY (BROKEN)
{
  "query": {
    "term": {
      "status": "Shipped"   // capital S — won't match!
    }
  }
}

The analyzer lowercased “Shipped” to “shipped” at index time. The term query doesn’t analyze the search value — it looks for the token Shipped with a capital S, which doesn’t exist in the index. The query returns nothing, and no error is raised.

Rule of thumb: Never use a term query on a text field. The analyzer transforms values at index time but term queries bypass analysis at search time. The mismatch causes silent failures. Elasticsearch even warns about this in its documentation — but most developers discover it the hard way.
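The mismatch is easy to reproduce with a toy inverted index. The sketch below is a simplified model, not Elasticsearch itself: analyze approximates the standard analyzer, and term_query and match_query are hypothetical helpers that mimic "look up the value as-is" versus "analyze the value, then look up each token".

```python
import re
from collections import defaultdict

def analyze(value):
    """Approximation of the standard analyzer: lowercase + split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", value.lower())

def build_index(docs, analyzer):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for token in analyzer(value):
            index[token].add(doc_id)
    return index

def term_query(index, value):
    """term: no analysis at search time, exact token lookup."""
    return index.get(value, set())

def match_query(index, value, analyzer):
    """match: analyze the search value, then OR together the hits per token."""
    hits = set()
    for token in analyzer(value):
        hits |= index.get(token, set())
    return hits

orders = {1: "Shipped", 2: "Pending", 3: "Shipped"}
text_index = build_index(orders, analyze)             # status mapped as text
keyword_index = build_index(orders, lambda v: [v])    # status mapped as keyword

term_on_text = term_query(text_index, "Shipped")              # empty: index holds 'shipped'
match_on_text = match_query(text_index, "Shipped", analyze)   # finds orders 1 and 3
term_on_keyword = term_query(keyword_index, "Shipped")        # finds orders 1 and 3
print(term_on_text, match_on_text, term_on_keyword)
```

The empty result for term_on_text is the silent failure described above: the lookup is perfectly valid, it just asks for a token that was never written to the index.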

Elasticsearch Keyword field examples: exact matching and aggregations

Aggregations — the primary use case

Keyword fields are the backbone of faceted navigation and analytics. If you want to build a “filter by category” sidebar, the category field must be keyword.

MAPPING
{
  "mappings": {
    "properties": {
      "category": { "type": "keyword" }
    }
  }
}

QUERY
{
  "aggs": {
    "by_category": {
      "terms": { "field": "category" }
    }
  }
}

With keyword — correct buckets

Trail Running: 234
Road Running: 189
Cross Training: 97

With text — broken buckets

running: 423
trail: 234
road: 189
cross: 97
training: 97
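The bucket numbers above can be checked with a few lines of Python. This is a simplified model of how a terms aggregation counts documents per term, using an approximate analyzer; the document counts are the ones from the example, and the helper names are assumptions.

```python
import re
from collections import Counter

def analyze(value):
    """Approximation of the standard analyzer: lowercase + split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", value.lower())

# Number of documents per category value, taken from the example above.
doc_counts = {"Trail Running": 234, "Road Running": 189, "Cross Training": 97}

keyword_buckets = Counter()
text_buckets = Counter()
for category, n_docs in doc_counts.items():
    keyword_buckets[category] += n_docs            # one term per document
    for token in dict.fromkeys(analyze(category)): # each distinct token counts once per doc
        text_buckets[token] += n_docs

print(dict(keyword_buckets))
print(dict(text_buckets))
```

Note how running collects 234 + 189 = 423 documents: the two running categories have merged into one bucket, and no bucket corresponds to a real category any more.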

Sorting — another silent failure zone

Sorting on a text field is disabled by default, and Elasticsearch rejects the request with an error. If someone enables fielddata: true to make sorting work on a text field, each document is sorted by one of its analyzed tokens (for an ascending sort, the alphabetically smallest one), not by the full value. “Apple Watch” and “Apple iPhone” would get the same sort key because both contain the token apple, which sorts before watch and iphone.

SORT QUERY (only works correctly on keyword fields)
{
  "sort": [
    { "product_name": { "order": "asc" } }
  ]
}

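// ✓ keyword field → sorts by the full value: "Apple Watch" before "Apple iPhone" (uppercase W sorts before lowercase i unless you add a lowercase normalizer)
// ✗ text field    → rejected by default; with fielddata it sorts by a single token, not the full value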
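A small Python sketch shows why the text-field ordering is so confusing. It assumes that an ascending sort on a multi-valued field picks the smallest value per document (the default min sort mode) and that the tokens come from an approximate standard analyzer; the product names and helper functions are illustrative only.

```python
import re

def analyze(value):
    """Approximation of the standard analyzer: lowercase + split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", value.lower())

def keyword_sort_key(value):
    """A keyword field sorts on the whole, unmodified value."""
    return value

def text_sort_key(value):
    """Assumed ascending behaviour on a text field with fielddata: smallest token wins."""
    return min(analyze(value))

products = ["Apple Watch", "Apple iPhone", "Zebra Printer", "Acer Laptop"]

keyword_sorted = sorted(products, key=keyword_sort_key)
text_keys = {name: text_sort_key(name) for name in products}

print(keyword_sorted)
# ['Acer Laptop', 'Apple Watch', 'Apple iPhone', 'Zebra Printer']
print(text_keys)
# {'Apple Watch': 'apple', 'Apple iPhone': 'apple', 'Zebra Printer': 'printer', 'Acer Laptop': 'acer'}
```

Two things stand out: both Apple products collapse to the same key, and "Zebra Printer" is ordered under printer rather than zebra. The keyword ordering is also case-sensitive (W sorts before i in code point order), which is worth knowing before you promise users an alphabetical list.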

Exact-match filtering — where keyword is mandatory

Email addresses, user IDs, status codes, ISO country codes, SKUs: any field where the value must match precisely should be keyword. A text field with the standard analyzer would split user@company.com into the tokens user and company.com. A term query for the full address would find nothing, and a match query would also return other addresses that share either token.

Always keyword — these fields
{
  "mappings": {
    "properties": {
      "email":      { "type": "keyword" },
      "order_id":   { "type": "keyword" },
      "status":     { "type": "keyword" },
      "country_iso":{ "type": "keyword" },
      "sku":        { "type": "keyword" }
    }
  }
}

Dynamic mapping: Elasticsearch’s double-edged sword

Here’s where many teams get burned. When you index a document without defining a mapping first, Elasticsearch creates one automatically. For string fields, it makes a “safe” choice: it maps them as both text and keyword using a multi-field. That sounds helpful — until you realize what it costs.

What dynamic mapping creates for a string field
{
  "title": {
    "type": "text",
    "fields": {
      "keyword": {
        "type":         "keyword",
        "ignore_above": 256
      }
    }
  }
}

The hidden tax: every string field is now indexed twice, once as analyzed tokens and once as a raw keyword, which can come close to doubling the storage and memory those fields consume. On a small index this is irrelevant. On a billion-document index this can be the difference between a cluster that fits in memory and one that constantly evicts segment data and slows to a crawl.

Production advice: Always define explicit mappings before indexing production data. Dynamic mapping is fine for exploring data during development, but letting it run in production means you’ll inherit the storage overhead and lose the chance to make deliberate choices about which fields actually need both representations.

There’s also the ignore_above: 256 trap. Dynamic mapping sets a 256-character limit on the auto-generated keyword sub-field. Any value longer than 256 characters is silently dropped from the keyword index — it still exists in the text field and is searchable via full-text queries, but it won’t show up in aggregations or be sortable. If you have URLs, long product descriptions, or base64-encoded values flowing in, you’ll hit this limit and wonder why aggregations are missing data.
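Before relying on an auto-generated keyword sub-field, it can help to check your data against the limit. The sketch below is a plain Python check, assuming values are compared by character count against the dynamic-mapping default of 256; the function name and sample values are made up for illustration.

```python
IGNORE_ABOVE = 256  # default limit on the dynamically mapped keyword sub-field

def skipped_by_keyword(values, limit=IGNORE_ABOVE):
    """Return the values that are too long to be indexed in the keyword sub-field."""
    return [v for v in values if len(v) > limit]

values = [
    "https://example.com/products/123",  # short: indexed as keyword
    "x" * 256,                           # exactly at the limit: still indexed
    "y" * 257,                           # one over: skipped, missing from aggregations
]
skipped = skipped_by_keyword(values)
print(len(skipped), "value(s) would be missing from aggregations and sorting")
```

Running a check like this over a sample of real documents shows quickly whether 256 is enough or whether the field needs an explicit keyword mapping with a higher ignore_above.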


The multi-field trick: having both

What if you genuinely need both — full-text search on a field and the ability to aggregate by its exact value? That’s what multi-fields are for. You can configure them explicitly and avoid the storage waste of having it everywhere by default.

A real-world example: a job board that needs to search job titles by keywords (“senior engineer”, “product manager”) but also wants to show “Top 10 Job Titles” as a chart.

Multi-field mapping for job titles
{
  "mappings": {
    "properties": {
      "job_title": {
        "type": "text",           // full-text search on job_title
        "analyzer": "english",
        "fields": {
          "raw": {
            "type": "keyword"      // exact match + aggs on job_title.raw
          }
        }
      }
    }
  }
}

Now you have two query targets: job_title for full-text, job_title.raw for exact operations.

Using both sub-fields in the same query
{
  "query": {
    "match": {
      "job_title": "senior engineer"   // full-text search
    }
  },
  "aggs": {
    "top_titles": {
      "terms": {
        "field": "job_title.raw",    // exact aggregation
        "size": 10
      }
    }
  }
}

Search results

Returns “Senior Software Engineer”, “Senior Frontend Engineer”, “Senior DevOps Engineer” — matched via tokens.

Aggregation buckets

Senior Software Engineer: 412
Senior Frontend Engineer: 287
Clean, unsplit values.

Quick decision guide: Elasticsearch Keyword vs Text field type

When you’re looking at a new string field, ask yourself these questions:

| Question | If yes → use | Examples |
| --- | --- | --- |
| Do users search this field with natural language? | text | Product descriptions, article body, reviews, bios |
| Do you aggregate, sort, or filter this field exactly? | keyword | Status, category, country, email, SKU, tags |
| Do you need both full-text search AND exact aggregation? | text + keyword multi-field | Product names, job titles, article titles |
| Is this a URL, email, or other structured identifier? | keyword | user@example.com, /products/123, US, de-DE |
| Will values exceed 256 characters but need aggregation? | keyword with higher ignore_above | Long slugs, encoded values, full names |
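The guide above can be written down as a small helper that returns a mapping fragment. This is one possible encoding of the questions, not an official recipe: choose_mapping and its parameters are hypothetical, the sub-field name raw follows the job-title example, and falling back to keyword when neither flag is set is an assumption (a string nobody searches with natural language is usually an identifier).

```python
def choose_mapping(full_text, exact, ignore_above=None):
    """Return a field mapping fragment based on how the field will be queried."""
    keyword = {"type": "keyword"}
    if ignore_above is not None:
        keyword["ignore_above"] = ignore_above  # raise the limit for long values
    if full_text and exact:
        return {"type": "text", "fields": {"raw": keyword}}  # multi-field
    if full_text:
        return {"type": "text"}
    return keyword  # exact-only fields, and the assumed default

description = choose_mapping(full_text=True, exact=False)
status = choose_mapping(full_text=False, exact=True)
job_title = choose_mapping(full_text=True, exact=True)
long_slug = choose_mapping(full_text=False, exact=True, ignore_above=1024)

for name, mapping in [("description", description), ("status", status),
                      ("job_title", job_title), ("long_slug", long_slug)]:
    print(name, mapping)
```

The value 1024 for long_slug is only an example; pick a limit from the actual length distribution of your data.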

IMPORTANT!

“Every field type decision is permanent until you reindex. Make it deliberately.”

One final thing worth remembering: you cannot change a field’s type after the fact without reindexing your entire index. This is why the choice matters so much upfront. In a small development index, reindexing takes seconds. In production, with hundreds of millions of documents, it can take hours — and requires maintaining two indices in parallel during the migration.
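In practice, the migration boils down to two requests: create a new index with the corrected mapping, then copy the documents across with the reindex API. The sketch below only builds the request bodies as Python data so you can inspect them; it does not talk to a cluster. The index names orders_v1 and orders_v2 and the plan_type_change helper are hypothetical.

```python
import json

def plan_type_change(old_index, new_index, new_mappings):
    """Return the (method, path, body) requests for a create-then-reindex migration."""
    return [
        # 1. Create the target index with the explicit, corrected mapping.
        ("PUT", f"/{new_index}", {"mappings": new_mappings}),
        # 2. Copy every document from the old index into the new one.
        ("POST", "/_reindex", {
            "source": {"index": old_index},
            "dest": {"index": new_index},
        }),
    ]

steps = plan_type_change(
    "orders_v1",
    "orders_v2",
    {"properties": {"status": {"type": "keyword"}}},
)
for method, path, body in steps:
    print(method, path)
    print(json.dumps(body, indent=2))
```

Both indices exist side by side until the copy finishes and your application is pointed at the new one, which is the "two indices in parallel" cost mentioned above.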

The good news is the decision is almost always clear once you know what to ask. Text is for human language. Keyword is for machine-readable values. When you need both, use multi-fields and name your keyword sub-field .raw so future-you remembers why it’s there.

Before you go: audit your dynamic-mapped indices. Run GET /your-index/_mapping and look for string fields that were auto-mapped. For each one, ask whether you actually need the text field, the keyword sub-field, or both, then define an explicit mapping and reindex. Your cluster’s memory will thank you.
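If the mapping is large, the audit can be scripted. The sketch below walks a mapping response that has already been loaded into a Python dict and lists the fields that carry the dynamic-mapping signature (text with a keyword sub-field named keyword and ignore_above: 256). It assumes the usual response shape of index name → mappings → properties; the sample data and the function name are made up for illustration.

```python
def find_dynamic_string_fields(mapping_response):
    """Return (index, field_path) pairs that look like dynamically mapped strings."""
    found = []

    def walk(properties, prefix, index):
        for name, spec in properties.items():
            path = prefix + name
            sub = spec.get("fields", {}).get("keyword", {})
            if (spec.get("type") == "text"
                    and sub.get("type") == "keyword"
                    and sub.get("ignore_above") == 256):
                found.append((index, path))
            if "properties" in spec:  # descend into object fields
                walk(spec["properties"], path + ".", index)

    for index, body in mapping_response.items():
        walk(body.get("mappings", {}).get("properties", {}), "", index)
    return found

# Hypothetical response from GET /products/_mapping
sample = {
    "products": {
        "mappings": {
            "properties": {
                "title": {"type": "text",
                          "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
                "sku": {"type": "keyword"},
                "vendor": {"properties": {
                    "name": {"type": "text",
                             "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
                }},
            }
        }
    }
}
suspects = find_dynamic_string_fields(sample)
print(suspects)  # [('products', 'title'), ('products', 'vendor.name')]
```

A match is only a hint that nobody made a deliberate choice for that field; whether it should become text, keyword, or an explicit multi-field still depends on how it is queried.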
