Prompt Cache

When developing AI applications, you may find yourself reusing the same prompts across multiple requests. The Prompt Cache feature can help improve response times and reduce API call costs by caching your frequently used prompts.

Restrictions and Limitations

Cache Limits

Each model that requires explicit caching has a minimum cache-eligible prompt length, listed under Supported Models. Attempting to cache a prompt shorter than this minimum results in an API error.
Cached prompts have a lifetime (TTL) of approximately 5 minutes. This duration cannot be modified due to provider limitations.
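
Both limits can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function names are hypothetical, the minimums are copied from the Supported Models table, the token count is assumed to come from whatever tokenizer you already use, and the 5-minute lifetime is treated as an approximation.

```python
# Minimum cache-eligible prompt lengths, copied from the Supported Models table.
# Keys are the display names used in that table, not API model identifiers.
MIN_CACHE_LENGTH = {
    "Claude 3.5 Sonnet": 1024,
    "Claude 3.5 Haiku": 2048,
    "Claude 3.0 Haiku": 2048,
    "Claude 3.0 Opus": 1024,
}

# Approximate cache lifetime (about 5 minutes), expressed in seconds.
CACHE_TTL_SECONDS = 5 * 60


def should_add_cache_control(model_name: str, prompt_length: int) -> bool:
    """Return True if the prompt meets the model's minimum cache length.

    Models without a listed minimum return False, since they either cache
    automatically or are not covered by this sketch.
    """
    minimum = MIN_CACHE_LENGTH.get(model_name)
    if minimum is None:
        return False
    return prompt_length >= minimum


def likely_still_cached(seconds_since_cached: float) -> bool:
    """Rough check of whether a cached prompt is still within its lifetime."""
    return seconds_since_cached < CACHE_TTL_SECONDS
```

Skipping cache_control for prompts below the minimum avoids the API error described above; the lifetime check is only a hint, because the exact expiry is controlled by the provider.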

Supported Models

Prompt Cache is currently in beta and available only for select models:

Model	Minimum Cache Length (tokens)	Base Input Tokens	Cache Writes	Cache Hits	Output Tokens
GPT-4o	/	$2.50 / MTok	$0 / MTok	$1.25 / MTok	$10.00 / MTok
GPT-4o mini	/	$0.15 / MTok	$0 / MTok	$0.075 / MTok	$0.60 / MTok
O1	/	$15.00 / MTok	$0 / MTok	$7.50 / MTok	$60.00 / MTok
DeepSeek Coder	/	$0.14 / MTok	$0 / MTok	$0.02 / MTok	$0.28 / MTok
DeepSeek Chat	/	$0.14 / MTok	$0 / MTok	$0.02 / MTok	$0.28 / MTok
Claude 3.5 Sonnet	1024	$3.00 / MTok	$3.75 / MTok	$0.30 / MTok	$15.00 / MTok
Claude 3.5 Haiku	2048	$1.00 / MTok	$1.25 / MTok	$0.10 / MTok	$5.00 / MTok
Claude 3.0 Haiku	2048	$0.25 / MTok	$0.30 / MTok	$0.03 / MTok	$1.25 / MTok
Claude 3.0 Opus	1024	$15.00 / MTok	$18.75 / MTok	$1.50 / MTok	$75.00 / MTok
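
To see how these prices interact, the sketch below compares the input cost of repeating one prompt with and without caching. It rests on simplifying assumptions that the table does not state: the whole prompt is cached, the first request is billed at the Cache Writes price, and every later request arrives within the cache lifetime and is billed at the Cache Hits price. The function names are hypothetical.

```python
# Prices in USD per million tokens (MTok), copied from the table above.
PRICES_PER_MTOK = {
    "Claude 3.5 Sonnet": {"base_input": 3.00, "cache_write": 3.75, "cache_hit": 0.30},
    "Claude 3.5 Haiku": {"base_input": 1.00, "cache_write": 1.25, "cache_hit": 0.10},
}


def uncached_input_cost(model_name: str, prompt_tokens: int, requests: int) -> float:
    """Input cost in USD when every request pays the base input price."""
    price = PRICES_PER_MTOK[model_name]["base_input"]
    return requests * prompt_tokens * price / 1_000_000


def cached_input_cost(model_name: str, prompt_tokens: int, requests: int) -> float:
    """Input cost in USD assuming one cache write followed by cache hits."""
    if requests <= 0:
        return 0.0
    prices = PRICES_PER_MTOK[model_name]
    write_cost = prompt_tokens * prices["cache_write"] / 1_000_000
    hit_cost = (requests - 1) * prompt_tokens * prices["cache_hit"] / 1_000_000
    return write_cost + hit_cost


if __name__ == "__main__":
    model = "Claude 3.5 Sonnet"
    print(f"uncached: ${uncached_input_cost(model, 10_000, 10):.4f}")
    print(f"cached:   ${cached_input_cost(model, 10_000, 10):.4f}")
```

Under these assumptions, ten requests that share a 10,000-token prompt on Claude 3.5 Sonnet cost about $0.30 in input tokens without caching and about $0.0645 with it. A single request costs more with caching ($0.0375 versus $0.03), so caching pays off only when the prompt is reused.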

Implementation

For models such as GPT-4o, O1, and DeepSeek, caching is enabled automatically. You do not need to add any parameters to your request payload.

For Claude models, caching must be requested explicitly: add a cache_control object to your request payload. Currently, the only supported cache type is ephemeral. Once a prompt is cached, subsequent identical requests use the cached prompt, which reduces response time and API call costs.

You can keep the cache_control object in subsequent requests. Doing so does not refresh the cache; those requests use the cached prompt directly.

Example:

{
  "model": "claude-3-5-sonnet",
  "messages": [
    {
      "role": "user",
      "content": "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on themes, characters, and writing style.",
      "cache_control": { "type": "ephemeral" }
    }
  ]
}
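
The same payload can be assembled in code. The sketch below mirrors the JSON example above and only builds and prints the request body; the function name build_cached_request is hypothetical, and the endpoint, authentication, and HTTP client are deliberately left out because they are not described here.

```python
import json


def build_cached_request(model: str, prompt: str) -> dict:
    """Build a request payload whose prompt is marked for ephemeral caching."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt,
                # "ephemeral" is currently the only supported cache type.
                "cache_control": {"type": "ephemeral"},
            }
        ],
    }


if __name__ == "__main__":
    payload = build_cached_request(
        "claude-3-5-sonnet",
        "You are an AI assistant tasked with analyzing literary works.",
    )
    print(json.dumps(payload, indent=2))
```

Because the cache applies to identical requests, building the payload in one place helps keep the prompt text byte-for-byte the same across calls.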

Monitoring

You can track cache performance in the Analytics dashboard. The dashboard displays metrics for Cache Creation Input Tokens and Cache Read Input Tokens, allowing you to assess the effectiveness of your prompt caching strategy.
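
One way to summarize these two metrics is the share of cache-related input tokens that were served from the cache. This is a minimal sketch that assumes you have copied the two totals out of the dashboard by hand; the function name is hypothetical, and the ratio is an illustrative measure, not a metric the dashboard itself reports.

```python
def cache_read_share(cache_creation_tokens: int, cache_read_tokens: int) -> float:
    """Fraction of cache-related input tokens that were read from the cache.

    Returns 0.0 when no cache activity has been recorded.
    """
    total = cache_creation_tokens + cache_read_tokens
    if total == 0:
        return 0.0
    return cache_read_tokens / total


if __name__ == "__main__":
    # Example totals chosen for illustration only.
    share = cache_read_share(cache_creation_tokens=20_000, cache_read_tokens=180_000)
    print(f"cache read share: {share:.0%}")
```

A share close to zero suggests that prompts are being written to the cache but rarely reused before they expire; a high share suggests that most cached prompts are being hit.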