Cache Savings - Tensormesh User Documentation

Navigate to Management → Cache Savings to see a dollar breakdown of what Tensormesh’s KV caching is saving you over time.

Understanding Your Savings

Cache savings = money you didn’t spend because tokens were served from the KV cache instead of being recomputed.

Every request that reuses a cached prefix (system prompt, shared document, conversation history) costs $0.00 for those tokens
The savings figure shows what those cached tokens would have cost at the standard input rate
A growing savings number means your prompts are well-structured and your cache is being used effectively
Low savings relative to input spend usually means requests aren’t sharing consistent prefixes

How Savings Are Calculated

Estimated cache savings = cached token count × standard input rate for that model Example: 1,000 requests each reusing a 2,000-token system prompt → 2,000,000 tokens saved. The higher your request volume and the more consistent your prompts, the faster savings compound.

Maximizing Your Savings

Use consistent system messages

Keep your system prompt identical across requests. Even a single character change creates a cache miss.

Put stable content first

Structure prompts so that static content (system prompt, shared context) comes before variable content (the user’s latest message). The cache matches from the start of the prompt.

Monitor hit rate by model

Check for per-model cache hit rates. Low hit rates on specific models often signal inconsistent prefix structure in those request flows.

Observability

Billing & Usage

⌘I

​Understanding Your Savings

​How Savings Are Calculated

​Maximizing Your Savings

​Related

Understanding Your Savings

How Savings Are Calculated

Maximizing Your Savings

Related