Prompt Caching
Prompt caching reuses the static prefix of a request — the system prompt, tool definitions, and prior conversation turns — so repeated tokens are billed at a steep discount instead of full price on every turn. For multi-turn conversations and agents with large system prompts, this is a major cost and latency win.
Atlas turns caching on by default and reports the savings on every response.
How each provider caches
| Provider | Mechanism | What Atlas does |
|---|---|---|
| Anthropic | Explicit cache_control breakpoints | Atlas marks the system prompt and the end of the message history so the prefix is cached |
| OpenAI | Automatic (prompts ≥ 1024 tokens) | Nothing required — the provider caches server-side |
| xAI | Automatic | Nothing required |
| Google | Automatic implicit caching (Gemini 2.5+) | Nothing required |
You can check whether a provider supports caching:
Atlas::provider('anthropic')->capabilities()->supports('caching'); // true
Configuration
Caching is on by default. Control it globally in config/atlas.php:
'prompt_cache' => (bool) env('ATLAS_PROMPT_CACHE', true),
Note: OpenAI, xAI, and Google cache automatically regardless of this setting; it gates the explicit cache_control breakpoints Atlas adds for Anthropic. Either way, cache usage is always reported.
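As a small illustration, the config line above reads the ATLAS_PROMPT_CACHE environment variable. Assuming the application loads its environment from a standard Laravel-style .env file (an assumption; only the env() call is shown here), the global default could be switched off like this:

```dotenv
# Assumed .env entry: turns off the cache_control breakpoints Atlas adds for Anthropic.
# Providers that cache automatically are unaffected by this setting.
ATLAS_PROMPT_CACHE=false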
Override it per call:
// Disable caching for a single call
Atlas::agent('support')->cache(false)->message('Hello')->asText();
// Explicitly enable (e.g. when the global default is off)
Atlas::text('anthropic', 'claude-sonnet-4-5')->cache()->message('Hello')->asText();
The ->cache() method is available on both the Atlas::agent() and Atlas::text() builders.
Reading cache usage
Every response's usage reports cache activity, regardless of provider:
$response = Atlas::agent('support')->forConversation($id)->message('Hi')->asText();
$response->usage->cachedTokens; // tokens served from cache (read) — the discount
$response->usage->cacheWriteTokens; // tokens written to cache (Anthropic; first turn)
On the first call, a stable prefix is written to the cache (cacheWriteTokens); on subsequent calls within the cache window, the same prefix is read back cheaply (cachedTokens). With persistence enabled, these values are stored on the execution's usage for cost tracking.
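To make the write-then-read pattern concrete, here is a rough cost sketch. It is not part of Atlas: estimateInputCost() is a hypothetical helper, the price per million tokens is a made-up figure, and the default read and write multipliers (0.1 and 1.25) are illustrative assumptions that should be replaced with your provider's published rates. How you derive the uncached token count from Atlas's usage object is also left open here; the helper just takes plain integers.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: estimate the input-token cost of one call.
 *
 * All rates are caller-supplied assumptions, not Atlas or provider constants.
 * Cache reads are billed at a fraction of the base price and cache writes
 * at a premium; fresh (uncached) tokens are billed at the base price.
 */
function estimateInputCost(
    int $uncachedTokens,
    int $cachedTokens,
    int $cacheWriteTokens,
    float $pricePerMillion,
    float $readMultiplier = 0.1,   // assumed discount for cache reads
    float $writeMultiplier = 1.25  // assumed premium for cache writes
): float {
    // Convert every token bucket into "base-price token equivalents".
    $weighted = $uncachedTokens
        + $cachedTokens * $readMultiplier
        + $cacheWriteTokens * $writeMultiplier;

    return $weighted * $pricePerMillion / 1_000_000;
}

// First turn: a 4,000-token prefix is written to the cache, plus 200 fresh tokens.
$firstTurn = estimateInputCost(200, 0, 4000, 3.0);

// Later turn inside the cache window: the same prefix is read back.
$laterTurn = estimateInputCost(200, 4000, 0, 3.0);

// The same later turn if nothing had been cached.
$uncached = estimateInputCost(4200, 0, 0, 3.0);
```

Under these assumed rates, the first turn costs slightly more than an uncached call because of the write premium, and every later turn that hits the cache costs far less. In practice you would feed cachedTokens and cacheWriteTokens from the response's usage into the corresponding arguments.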
Notes
- Caching only helps once the cacheable prefix crosses the provider's minimum (around 1024 tokens). Below that, providers ignore it — there is no penalty.
- Anthropic's cache entries are ephemeral (~5 minute TTL). Keep turns flowing to benefit from reads.
- Caching composes with conversation history and media replay: the replayed prefix is exactly what gets cached.