Choosing between the text and keyword field types in Elasticsearch is one of the most important mapping decisions you will make. One wrong field type can quietly break your queries. Here is exactly what happens under the hood, with examples you can run today.
The bug that never shows an error
Imagine you’ve just deployed a product catalog search. Users type in “running shoes”, results come back, everything looks fine. Then your analytics team reports something strange: the “Top Brands” aggregation shows only three brands instead of the expected forty. You check the index — all forty brands are there. No errors in the logs. What went wrong?
The culprit, almost certainly, is field type. Specifically: someone mapped the brand field as text when it should have been keyword.
Scenario: The Silent Aggregation Failure
A document contains "brand": "New Balance". The brand field is mapped as text, with fielddata: true enabled so that aggregations run at all (by default Elasticsearch rejects a terms aggregation on a text field with an error). Elasticsearch’s analyzer splits the value into two tokens: new and balance. Your terms aggregation counts tokens, not original values, so you see new: 847 and balance: 847 in your buckets instead of New Balance: 847. The brand effectively disappears from your facets.
This is the heart of the text vs. keyword problem. Once the aggregation is allowed to run, it doesn’t throw an exception and it doesn’t refuse to run. It just silently returns wrong results.
What actually happens under the hood
When Elasticsearch indexes a field, it has to decide: should I analyze this string, or store it as-is? That decision determines everything about what queries will work.
Text fields go through an analysis chain: optional character filters, a tokenizer, and token filters. The default standard analyzer splits text on word boundaries (roughly, whitespace and punctuation) and lowercases every token. The result is a set of tokens stored in an inverted index.
Keyword fields skip all of that. The value is stored exactly as you provided it — case, spaces, punctuation and all — as a single token.
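The three stages can be sketched in plain Python. This is a toy model, not Elasticsearch code: the function names are hypothetical, the tokenizer only handles simple ASCII text, and the HTML-stripping character filter is there purely to show where that stage sits (the standard analyzer does not include one).

```python
import re

def strip_html(text):
    # Character filter stage: rewrite the raw string before tokenizing.
    # Illustrative only; the standard analyzer has no character filter.
    return re.sub(r"<[^>]+>", " ", text)

def tokenize(text):
    # Tokenizer stage (toy): break on anything that is not an ASCII letter or digit.
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]

def lowercase(tokens):
    # Token filter stage: lowercase every token.
    return [t.lower() for t in tokens]

def analyze_as_text(value):
    # Text field: character filter -> tokenizer -> token filter.
    return lowercase(tokenize(strip_html(value)))

def analyze_as_keyword(value):
    # Keyword field: the chain is skipped and the whole value is one term.
    return [value]

value = "<b>Trail-Running</b> Shoes!"
print(analyze_as_text(value))     # ['trail', 'running', 'shoes']
print(analyze_as_keyword(value))  # ['<b>Trail-Running</b> Shoes!']
```

To see what a real analyzer produces for a given string, the _analyze API is the authoritative check.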
Here’s the same string indexed both ways:
"Brand": "New Balance Running"
text field, tokens produced: new · balance · running
keyword field, token produced: New Balance Running (one token, unchanged)
These two representations answer completely different questions. Understanding which one you need, before you index a single document, is the skill.
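To make the difference concrete, here is a small sketch of how a terms aggregation would count each representation. It is a toy model under stated assumptions: toy_analyze is a hypothetical stand-in that only lowercases and splits on whitespace, and the text case models a field on which aggregation has been enabled with fielddata.

```python
from collections import Counter

def toy_analyze(value):
    # Hypothetical stand-in for an analyzer: lowercase, split on whitespace.
    return value.lower().split()

def terms_agg(docs, field, analyzed):
    """Count documents per term, the way a terms aggregation reports doc_count."""
    buckets = Counter()
    for doc in docs:
        terms = toy_analyze(doc[field]) if analyzed else [doc[field]]
        buckets.update(set(terms))  # a document counts once per distinct term
    return dict(buckets)

docs = [{"brand": "New Balance"}] * 847

as_keyword = terms_agg(docs, "brand", analyzed=False)
as_text = terms_agg(docs, "brand", analyzed=True)

print(as_keyword)  # {'New Balance': 847}
print(as_text)     # 'new' and 'balance' each map to 847 (key order may vary)
```

The keyword representation yields one bucket per original value, while the text representation yields one bucket per token.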
Elasticsearch Text field examples: when analysis helps and hurts
Where text shines — product descriptions
Say you’re building a search for an e-commerce site. A product has the description: “Lightweight trail running shoe with responsive foam cushioning”. A user searches for foam cushion shoes. With a text field and full-text matching, this works: a stemming analyzer such as english reduces “cushioning” to cushion and “shoes” to shoe, so the query tokens match the indexed tokens even though the wording differs.
{
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "english"
}
}
}
}
{
"query": {
"match": {
"description": "foam cushion shoes"
}
}
}
Correct: returns the product
The english analyzer stems “cushioning” → “cushion” and “shoes” → “shoe”, so the tokens align. Full-text search works as users expect.
Would fail with keyword
A term query on a keyword field would look for the exact string "foam cushion shoes", which appears nowhere in the document. Zero results.
Where text breaks: the exact-match trap
Now suppose a developer on your team tries to filter orders by status. The status field was mapped as text because it’s a string. They write:
{
"query": {
"term": {
"status": "Shipped" // capital S — won't match!
}
}
}
The analyzer lowercased “Shipped” to “shipped” at index time. The term query doesn’t analyze the search value: it looks for the token Shipped with a capital S, which doesn’t exist in the index. The query returns nothing, and no error is raised.
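The mismatch is easy to reproduce with a toy inverted index. This is a sketch under simplifying assumptions, not Elasticsearch internals: analyze is a hypothetical analyzer that lowercases and splits on whitespace, and the three sample statuses are made up.

```python
def analyze(value):
    # Toy analyzer (assumption): lowercase and split on whitespace.
    return value.lower().split()

# Inverted index for a text field: token -> set of document ids.
inverted_index = {}
for doc_id, status in enumerate(["Shipped", "Pending", "Shipped"]):
    for token in analyze(status):
        inverted_index.setdefault(token, set()).add(doc_id)

def term_query(value):
    # term: the search value is looked up as-is, with no analysis.
    return inverted_index.get(value, set())

def match_query(value):
    # match: the search value goes through the same analyzer as the indexed text.
    hits = set()
    for token in analyze(value):
        hits |= inverted_index.get(token, set())
    return hits

print(term_query("Shipped"))   # set()
print(term_query("shipped"))   # {0, 2}
print(match_query("Shipped"))  # {0, 2}
```

The term lookup only succeeds when the search value happens to equal the analyzed token, which is why the failure looks random from the outside.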
The trap is a term query on a text field. The analyzer transforms values at index time, but term queries bypass analysis at search time. The mismatch causes silent failures. Elasticsearch even warns about this in its documentation, but most developers discover it the hard way.
Elasticsearch Keyword field examples: exact matching and aggregations
Aggregations — the primary use case
Keyword fields are the backbone of faceted navigation and analytics. If you want to build a “filter by category” sidebar, the category field must be keyword.
{
"mappings": {
"properties": {
"category": { "type": "keyword" }
}
}
}
{
"aggs": {
"by_category": {
"terms": { "field": "category" }
}
}
}
With keyword: correct buckets
Trail Running: 234
Road Running: 189
Cross Training: 97
With text: broken buckets
running: 423
trail: 234
road: 189
cross: 97
training: 97
Sorting: another silent failure zone
Sorting on a text field is disabled by default: Elasticsearch will throw an error. But if someone enables fielddata: true to make sorting work on a text field, it sorts by a single token per document (the lowest token for an ascending sort), not the full value. “Apple Watch” and “Apple iPhone” would sort identically because both are reduced to the token apple.
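A rough sketch of the two sort behaviours, assuming the default sort mode in which an ascending sort on a multi-valued field compares the lowest value of each document. The analyze helper and the product names are hypothetical, and Python string comparison stands in for Elasticsearch's term ordering.

```python
def analyze(value):
    # Toy analyzer (assumption): lowercase and split on whitespace.
    return value.lower().split()

names = ["Apple iPhone", "Apple Watch", "Zebra Printer"]

# keyword field: one term per document, compared character by character.
# The comparison is case-sensitive, so uppercase "W" sorts before lowercase "i".
keyword_order = sorted(names)

# text field with fielddata: an ascending sort compares the lowest token per document.
text_sort_keys = {name: min(analyze(name)) for name in names}

print(keyword_order)   # ['Apple Watch', 'Apple iPhone', 'Zebra Printer']
print(text_sort_keys)  # {'Apple iPhone': 'apple', 'Apple Watch': 'apple', 'Zebra Printer': 'printer'}
```

Two things stand out: both Apple products collapse to the same sort key on the text field, and "Zebra Printer" is sorted under printer rather than zebra. If case-insensitive ordering matters on the keyword side, a lowercase normalizer on the keyword field is one option.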
{
"sort": [
{ "product_name": { "order": "asc" } }
]
}
// ✓ keyword field → sorts by the full value (case-sensitive, so "Apple Watch" comes before "Apple iPhone")
// ✗ text field → sorts by a single token per document, so results look unpredictable
Exact-match filtering: where keyword is mandatory
Email addresses, user IDs, status codes, ISO country codes, SKUs: any field where the value must match precisely should be keyword. A text field with the standard analyzer would split user@company.com into user and company.com. A term query for the full email would find nothing.
{
"mappings": {
"properties": {
"email": { "type": "keyword" },
"order_id": { "type": "keyword" },
"status": { "type": "keyword" },
"country_iso":{ "type": "keyword" },
"sku": { "type": "keyword" }
}
}
}
Dynamic mapping: Elasticsearch’s double-edged sword
Here’s where many teams get burned. When you index a document without defining a mapping first, Elasticsearch creates one automatically. For string fields, it makes a “safe” choice: it maps them as both text and keyword using a multi-field. That sounds helpful — until you realize what it costs.
{
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
The hidden tax: every string field is now indexed twice, once as analyzed tokens and once as a raw keyword, so it can consume up to roughly double the storage and memory. On a small index this is irrelevant. On a billion-document index this is the difference between a cluster that fits in memory and one that constantly evicts segment data and slows to a crawl.
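One way to avoid paying this tax on every field is a dynamic template that maps new string fields as keyword only, leaving text as an explicit opt-in. The sketch below just builds the request body for creating an index. The template name strings_as_keyword is arbitrary, and whether keyword-only is the right default depends on your data.

```python
import json

# Request body for PUT /<your-index>; the index name is up to you.
index_body = {
    "mappings": {
        "dynamic_templates": [
            {
                "strings_as_keyword": {  # template name is arbitrary
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ]
    }
}

print(json.dumps(index_body, indent=2))
```

Fields that need full-text search can still be declared as text under properties in the same mapping; explicit field definitions take precedence over dynamic templates.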
There’s also the ignore_above: 256 trap. Dynamic mapping sets a 256-character limit on the auto-generated keyword sub-field. Any value longer than 256 characters is silently dropped from the keyword index — it still exists in the text field and is searchable via full-text queries, but it won’t show up in aggregations or be sortable. If you have URLs, long product descriptions, or base64-encoded values flowing in, you’ll hit this limit and wonder why aggregations are missing data.
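The effect of the limit can be modelled in a few lines. This is a toy sketch: keyword_terms is a hypothetical helper that mimics the rule described above, and the URLs are made-up sample values.

```python
IGNORE_ABOVE = 256  # the limit dynamic mapping puts on the keyword sub-field

def keyword_terms(value, ignore_above=IGNORE_ABOVE):
    # Values longer than ignore_above characters are not indexed in the keyword field.
    return [] if len(value) > ignore_above else [value]

short_url = "https://example.com/p/123"
long_url = "https://example.com/p/123?token=" + "x" * 300

print(keyword_terms(short_url))  # ['https://example.com/p/123']
print(keyword_terms(long_url))   # []
```

A document with the long value is still indexed and still searchable through the text field; it simply contributes no term to aggregations or sorting on the keyword sub-field.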
The multi-field trick: having both
What if you genuinely need both — full-text search on a field and the ability to aggregate by its exact value? That’s what multi-fields are for. You can configure them explicitly and avoid the storage waste of having it everywhere by default.
A real-world example: a job board that needs to search job titles by keywords (“senior engineer”, “product manager”) but also wants to show “Top 10 Job Titles” as a chart.
{
"mappings": {
"properties": {
"job_title": {
"type": "text", // full-text search on job_title
"analyzer": "english",
"fields": {
"raw": {
"type": "keyword" // exact match + aggs on job_title.raw
}
}
}
}
}
}
Now you have two query targets: job_title for full-text, job_title.raw for exact operations.
{
"query": {
"match": {
"job_title": "senior engineer" // full-text search
}
},
"aggs": {
"top_titles": {
"terms": {
"field": "job_title.raw", // exact aggregation
"size": 10
}
}
}
}
Aggregation buckets
Senior Software Engineer: 412
Senior Frontend Engineer: 287
Clean, unsplit values.
Quick decision guide: Elasticsearch Keyword vs Text field type
When you’re looking at a new string field, ask yourself these questions:
| Question | If yes → use | Examples |
|---|---|---|
| Do users search this field with natural language? | text | Product descriptions, article body, reviews, bios |
| Do you aggregate, sort, or filter this field exactly? | keyword | Status, category, country, email, SKU, tags |
| Do you need both full-text search AND exact aggregation? | text + keyword multi-field | Product names, job titles, article titles |
| Is this a URL, email, or other structured identifier? | keyword | user@example.com, /products/123, US, de-DE |
| Will values exceed 256 characters but need aggregation? | keyword with higher ignore_above | Long slugs, encoded values, full names |
IMPORTANT!
One final thing worth remembering: you cannot change a field’s type after the fact without reindexing your entire index. This is why the choice matters so much upfront. In a small development index, reindexing takes seconds. In production, with hundreds of millions of documents, it can take hours — and requires maintaining two indices in parallel during the migration.
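A minimal sketch of the two requests such a migration usually starts with: creating a new index with the corrected mapping, then copying documents across with the _reindex API. The index names products_v1 and products_v2 and the brand field are hypothetical, and the code only builds and prints the request bodies.

```python
import json

OLD_INDEX = "products_v1"  # hypothetical name of the index with the wrong mapping
NEW_INDEX = "products_v2"  # hypothetical name of the replacement index

# Step 1: create the new index with the corrected field type.
create_body = {"mappings": {"properties": {"brand": {"type": "keyword"}}}}

# Step 2: copy every document from the old index into the new one.
reindex_body = {"source": {"index": OLD_INDEX}, "dest": {"index": NEW_INDEX}}

requests = [(f"PUT /{NEW_INDEX}", create_body), ("POST /_reindex", reindex_body)]
for request_line, body in requests:
    print(request_line)
    print(json.dumps(body, indent=2))
```

Pointing applications at an alias rather than a concrete index name makes the final switch from the old index to the new one much less disruptive.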
The good news is the decision is almost always clear once you know what to ask. Text is for human language. Keyword is for machine-readable values. When you need both, use multi-fields and name your keyword sub-field .raw so future-you remembers why it’s there.
As a next step, run GET /your-index/_mapping and look for string fields that were auto-mapped. For each one, ask whether you actually need the text sub-field, the keyword sub-field, or both. Then define an explicit mapping and reindex. Your cluster’s memory will thank you.
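That audit can be partly automated. The helper below is a sketch, not an official tool: find_dynamic_strings is a hypothetical function that walks the properties section of a mapping and reports fields shaped like dynamic mapping's default (text with a keyword sub-field). The sample mapping is hand-written for illustration.

```python
def find_dynamic_strings(properties, prefix=""):
    """Return paths of fields mapped as text with a 'keyword' sub-field."""
    found = []
    for name, spec in properties.items():
        path = prefix + name
        sub_field = spec.get("fields", {}).get("keyword", {})
        if spec.get("type") == "text" and sub_field.get("type") == "keyword":
            found.append(path)
        # Object fields nest their children under "properties".
        if "properties" in spec:
            found.extend(find_dynamic_strings(spec["properties"], path + "."))
    return found

# Hand-written sample shaped like the "mappings" part of a _mapping response.
sample = {
    "properties": {
        "title": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
        "status": {"type": "keyword"},
        "seller": {
            "properties": {
                "name": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
            }
        },
    }
}

print(find_dynamic_strings(sample["properties"]))  # ['title', 'seller.name']
```

The check is a heuristic: a field an engineer deliberately mapped as text plus keyword will also be reported, so treat the output as a review list rather than a list of mistakes.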
