Navigate to Management → Cache Savings to see a dollar breakdown of what Tensormesh’s KV caching is saving you over time.

Understanding Your Savings

Cache savings = money you didn’t spend because tokens were served from the KV cache instead of being recomputed.
  • Every request that reuses a cached prefix (system prompt, shared document, conversation history) costs $0.00 for those tokens
  • The savings figure shows what those cached tokens would have cost at the standard input rate
  • A growing savings number means your prompts are well-structured and your cache is being used effectively
  • Low savings relative to input spend usually means requests aren’t sharing consistent prefixes

How Savings Are Calculated

Estimated cache savings = cached token count × standard input rate for that model

Example: 1,000 requests each reusing a 2,000-token system prompt → 2,000,000 tokens saved.

The higher your request volume and the more consistent your prompts, the faster savings compound.
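The formula above can be sketched in a few lines of Python. This is only an illustration: the function name and the rate of $0.50 per million input tokens are hypothetical placeholders, not Tensormesh pricing, so substitute the standard input rate for your model.

```python
def estimated_cache_savings(cached_tokens: int, input_rate_per_million: float) -> float:
    """Estimated savings in dollars: cached token count x standard input rate."""
    return cached_tokens / 1_000_000 * input_rate_per_million


# Workload from the example: 1,000 requests each reusing a 2,000-token system prompt.
requests = 1_000
prompt_tokens = 2_000
cached_tokens = requests * prompt_tokens

# Hypothetical rate in dollars per 1M input tokens (placeholder, not a real price).
HYPOTHETICAL_INPUT_RATE = 0.50

savings = estimated_cache_savings(cached_tokens, HYPOTHETICAL_INPUT_RATE)
print(f"{cached_tokens:,} cached tokens -> ${savings:.2f} saved")
```

At the placeholder rate, the 2,000,000 cached tokens in the example work out to $1.00; the same calculation scales linearly with request volume and with the length of the reused prefix.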

Maximizing Your Savings

Keep your system prompt identical across requests. Even a single character change creates a cache miss.
Structure prompts so that static content (system prompt, shared context) comes before variable content (the user’s latest message). The cache matches from the start of the prompt.
External Storage persists your KV cache across sessions, so returning workloads can hit the cache instead of rebuilding it, which increases the fraction of requests served from cache.
Check per-model cache hit rates. Low hit rates on specific models often signal inconsistent prefix structure in those request flows.
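To see why prompt ordering and exact prefix consistency matter, here is a simplified model of prefix matching. It is a sketch under stated assumptions, not Tensormesh's actual implementation: whitespace splitting stands in for a real tokenizer, and the helper name `shared_prefix_len` is hypothetical.

```python
def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Count how many leading tokens two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Whitespace split is a stand-in for a real tokenizer (assumption).
system = "You are a helpful support agent .".split()
system_edited = "You are a Helpful support agent .".split()  # one character changed
question_1 = "How do I reset my password ?".split()
question_2 = "Where is my invoice ?".split()

# Static content first: the whole system prompt is a shared prefix.
static_first = shared_prefix_len(system + question_1, system + question_2)

# Variable content first: the prompts diverge at the first token.
variable_first = shared_prefix_len(question_1 + system, question_2 + system)

# A small edit to the system prompt cuts the shared prefix at the edit point.
after_edit = shared_prefix_len(system + question_1, system_edited + question_1)

print(static_first, variable_first, after_edit)
```

In this toy model, placing the system prompt first yields 7 reusable tokens, placing the user message first yields 0, and changing one character in the system prompt drops the reusable prefix to 3, even though the rest of the prompt is identical.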