Navigate to Management → Cache Savings to see a dollar breakdown of what Tensormesh’s KV caching is saving you over time.
Understanding Your Savings
Cache savings = money you didn’t spend because tokens were served from the KV cache instead of being recomputed.
- Every request that reuses a cached prefix (system prompt, shared document, conversation history) costs $0.00 for those tokens
- The savings figure shows what those cached tokens would have cost at the standard input rate
- A growing savings number means your prompts are well-structured and your cache is being used effectively
- Low savings relative to input spend usually means requests aren’t sharing consistent prefixes
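To make the points above concrete, here is a minimal sketch that prices cached and uncached tokens at the same input rate and reports savings as a share of what the input would have cost without caching. The rate of $0.50 per million tokens, the token counts, and the function names are placeholders for illustration only; they are not Tensormesh pricing or part of any Tensormesh API.

```python
from decimal import Decimal

# Assumption for illustration only: USD per 1M input tokens. Not a real Tensormesh price.
ASSUMED_INPUT_RATE_PER_MILLION = Decimal("0.50")


def token_cost(tokens: int, rate_per_million: Decimal) -> Decimal:
    """Cost of `tokens` input tokens at the given per-million rate."""
    return Decimal(tokens) * rate_per_million / Decimal(1_000_000)


def savings_share(cached_tokens: int, uncached_tokens: int, rate_per_million: Decimal) -> Decimal:
    """Savings as a fraction of what all input tokens would have cost without caching."""
    saved = token_cost(cached_tokens, rate_per_million)
    spent = token_cost(uncached_tokens, rate_per_million)
    total = saved + spent
    return saved / total if total else Decimal(0)


# Hypothetical workload: 2,000,000 tokens served from cache, 300,000 computed fresh.
saved = token_cost(2_000_000, ASSUMED_INPUT_RATE_PER_MILLION)
spent = token_cost(300_000, ASSUMED_INPUT_RATE_PER_MILLION)
share = savings_share(2_000_000, 300_000, ASSUMED_INPUT_RATE_PER_MILLION)
print(f"saved ${saved:.2f}, spent ${spent:.2f}, savings share {share:.0%}")
```

A high share suggests most of your input is being served from shared prefixes; a low share is a cue to look at how your prompts are structured.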
How Savings Are Calculated
Estimated cache savings = cached token count × standard input rate for that model
Example: 1,000 requests each reusing a 2,000-token system prompt → 2,000,000 tokens saved. The higher your request volume and the more consistent your prompts, the faster savings compound.
Maximizing Your Savings
Use consistent system messages
Keep your system prompt identical across requests. Even a single-character change creates a cache miss.
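One way to guard against accidental drift is to define the system prompt once and compare fingerprints on the client side. The sketch below is only an illustration: the prompt text and helper names are made up, and the SHA-256 fingerprint is a local check for byte-for-byte equality, not a description of how Tensormesh keys its cache.

```python
import hashlib

# Defined once and reused verbatim for every request.
SYSTEM_PROMPT = "You are a helpful assistant for the Acme support team."


def prompt_fingerprint(prompt: str) -> str:
    """SHA-256 hex digest of the prompt, used only to compare prompts client-side."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def unstable_system_prompt(request_id: int) -> str:
    """Anti-pattern: per-request data embedded in the system prompt."""
    return f"{SYSTEM_PROMPT} (request {request_id})"


# Three requests with the constant prompt share one fingerprint;
# three requests with per-request data produce three different ones.
stable = {prompt_fingerprint(SYSTEM_PROMPT) for _ in range(3)}
unstable = {prompt_fingerprint(unstable_system_prompt(i)) for i in range(3)}
print(len(stable), len(unstable))  # prints: 1 3
```

If you need per-request data such as IDs or timestamps, keep it out of the system prompt and place it later in the request.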
Put stable content first
Structure prompts so that static content (system prompt, shared context) comes before variable content (the user’s latest message). The cache matches from the start of the prompt.
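The effect of ordering can be sketched by comparing how much of two prompts is identical from the start. This is a simplified illustration: it counts characters rather than tokens, and the prompt text and function names are hypothetical.

```python
SYSTEM_PROMPT = "You are a helpful assistant for the Acme support team."
SHARED_CONTEXT = "Reference: refunds are processed within 5 business days."


def build_stable_first(user_message: str) -> str:
    """Static content first, variable content last."""
    return "\n".join([SYSTEM_PROMPT, SHARED_CONTEXT, user_message])


def build_variable_first(user_message: str) -> str:
    """Anti-pattern: the variable message precedes the static content."""
    return "\n".join([user_message, SYSTEM_PROMPT, SHARED_CONTEXT])


def shared_prefix_length(a: str, b: str) -> int:
    """Number of leading characters that are identical in both strings."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


first = "How do I reset my password?"
second = "Where is my invoice?"

stable = shared_prefix_length(build_stable_first(first), build_stable_first(second))
variable = shared_prefix_length(build_variable_first(first), build_variable_first(second))
print(stable, variable)
```

With stable content first, the two prompts agree through the entire system prompt and shared context. With the user message first, they diverge at the first character, so nothing after it can be matched from the start of the prompt.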
Subscribe to External Storage
External Storage persists your KV cache across sessions, which increases the fraction of requests that hit the cache for returning workloads.
Monitor hit rate by model
Check your per-model cache hit rates. A low hit rate on a specific model often signals inconsistent prefix structure in that model’s request flows.
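If you also keep your own request logs, a per-model comparison can be sketched as below. Everything here is an assumption for illustration: the record fields (`model`, `input_tokens`, `cached_tokens`), the model names, the 0.5 threshold, and the definition of hit rate as cached tokens divided by total input tokens. The dashboard may define and report hit rate differently.

```python
from collections import defaultdict

# Hypothetical usage records; field names and values are placeholders.
records = [
    {"model": "model-a", "input_tokens": 2300, "cached_tokens": 2000},
    {"model": "model-a", "input_tokens": 2500, "cached_tokens": 2000},
    {"model": "model-b", "input_tokens": 1800, "cached_tokens": 100},
]

# Assumed cutoff below which a model is flagged for review.
ASSUMED_LOW_HIT_RATE = 0.5


def hit_rate_by_model(rows):
    """Cached tokens divided by total input tokens, aggregated per model."""
    totals = defaultdict(lambda: [0, 0])  # model -> [cached, input]
    for row in rows:
        totals[row["model"]][0] += row["cached_tokens"]
        totals[row["model"]][1] += row["input_tokens"]
    return {
        model: (cached / total if total else 0.0)
        for model, (cached, total) in totals.items()
    }


rates = hit_rate_by_model(records)
low = [model for model, rate in rates.items() if rate < ASSUMED_LOW_HIT_RATE]
print(rates, low)
```

Models that land in the flagged list are the ones whose request flows are worth checking for varying system prompts or variable content placed ahead of static content.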

