External Storage - Tensormesh User Documentation

External Storage gives your serverless models a persistent KV cache bucket — so context is remembered across requests and sessions, not just within a single call. Navigate to Operations → Storage to view plans and subscribe.

Why It Matters

By default, Tensormesh’s KV cache is in-memory and scoped to a single request. Every new session starts cold: tokens that were cached in a previous session need to be recomputed and billed as regular input tokens. With an External Storage bucket, those tokens are persisted and reused across sessions.

Persistent Cache

KV cache entries survive beyond a single request window. Long system prompts and repeated context are stored across sessions.

More $0 Cached Tokens

A higher fraction of requests hit the cache, meaning more tokens served at $0 and a lower effective cost per call.
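
To make the cost effect concrete, here is a minimal arithmetic sketch. The function name, the token count, and the per-million-token price are all hypothetical and are not Tensormesh rates; the only thing taken from this page is that cached tokens are billed at $0.

```python
def effective_input_cost(input_tokens: int, cached_fraction: float,
                         price_per_million: float) -> float:
    """Cost of one call's input tokens when a fraction is served from cache at $0."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    # Only the uncached share of the prompt is billed as regular input tokens.
    billable_tokens = input_tokens * (1.0 - cached_fraction)
    return billable_tokens * price_per_million / 1_000_000


# Hypothetical numbers: a 10,000-token prompt at a made-up $0.50 per million input tokens.
cold = effective_input_cost(10_000, cached_fraction=0.0, price_per_million=0.50)
warm = effective_input_cost(10_000, cached_fraction=0.8, price_per_million=0.50)
print(f"cold session: ${cold:.4f}, warm session: ${warm:.4f}")
```

The flat monthly subscription is not included in this figure; see Pricing Overview for actual rates.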

Faster Responses

Requests that share context with previous sessions skip recomputing the shared portion, which reduces time-to-first-token.

No API Changes

Once your bucket is active, caching is handled automatically. No changes to your existing API calls are required.

Storage Plans

Plans are tiered by bucket size. Subscribe or change plans anytime from Operations → Storage — billing adjusts immediately and no data is lost on upgrade.

Plan	Best For
Bronze	Getting started — low to moderate request volume
Silver	Agentic developers — more headroom for parallel workloads
Gold	Production-scale inference — high volume and large system prompts

External Storage is a flat monthly subscription billed separately from token usage. See Pricing Overview for how it interacts with cached token pricing.

KV Cache Calculator

Use the KV Cache Calculator on Operations → Storage to estimate how much bucket capacity a given conversation will consume. Select a model, enter a context length, and choose a data type; the calculator shows the resulting KV cache size in GB, broken down step by step using the model’s actual architecture parameters.

How to use it:

Navigate to Operations → Storage.
Scroll to the KV Cache Calculator section.
Pick a model from the serverless catalog.
Enter the context length you plan to work with.
Select the data type (BF16 is the default for most models).
Read off the GB estimate and compare it to each plan’s bucket size.
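
For a rough sense of what the calculator computes, the sketch below uses the standard transformer KV cache formula: two tensors (keys and values) per layer, per KV head, per token. This page does not state the calculator's exact formula, so treat this as an approximation under that assumption. The architecture values and the bucket size are illustrative only and are not taken from the Tensormesh catalog or plans.

```python
# Bytes per element for common data types (standard sizes, not Tensormesh-specific).
BYTES_PER_ELEMENT = {"FP32": 4, "BF16": 2, "FP16": 2, "FP8": 1}


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   context_length: int, dtype: str = "BF16") -> int:
    """Standard KV cache size: keys and values for every layer, KV head, and token."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * BYTES_PER_ELEMENT[dtype]
    return per_token * context_length


# Illustrative architecture: 32 layers, 8 KV heads, head dimension 128.
size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, context_length=8192)
size_gb = size / 1e9  # decimal gigabytes; the calculator's GB convention is an assumption

# Hypothetical bucket size, purely for illustration; real plan sizes are on the Storage page.
hypothetical_bucket_gb = 100
conversations_that_fit = int(hypothetical_bucket_gb // size_gb)
print(f"{size_gb:.3f} GB per conversation, about {conversations_that_fit} fit in {hypothetical_bucket_gb} GB")
```

Because the size grows linearly with context length, doubling the context doubles the estimate. The number of KV heads matters as much as the layer count, so models with grouped-query attention tend to need less space per token.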

Monitoring Your Usage

The Operations → Storage page shows:

Live usage bar — Your current bucket fill level so you know how much capacity you’re using.
Per-model KV cache usage table — A breakdown of how much storage each model is consuming, so you can see exactly where your bucket is being used.

Storage threshold notifications — When your bucket approaches capacity, we send a notification directing you to Operations → Storage. Use that as a prompt to upgrade your plan or review which models are consuming the most space.

What Gets Cached

External Storage extends the same KV cache that already powers $0 cached tokens; it makes those cached entries persist beyond a single request. Typical cached content includes:

System messages and instructions
Shared conversation prefixes and history
Long document contexts passed repeatedly
Common prompt templates shared across sessions
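
The items above all have one thing in common: they appear at the start of a request. The sketch below assumes reuse is prefix-based, which is typical for KV caches because each token's cached entry depends on every token before it. This page does not describe Tensormesh's matching rules, so the function and the token IDs are purely illustrative.

```python
from typing import Sequence


def shared_prefix_length(previous: Sequence[int], current: Sequence[int]) -> int:
    """Number of leading tokens two requests have in common."""
    count = 0
    for a, b in zip(previous, current):
        if a != b:
            break  # the first mismatch ends the reusable prefix
        count += 1
    return count


# Made-up token IDs: the first four stand in for a shared system prompt.
session_1 = [101, 7, 42, 9, 300, 301]
session_2 = [101, 7, 42, 9, 555, 556, 557]

reusable = shared_prefix_length(session_1, session_2)
recomputed = len(session_2) - reusable
print(f"{reusable} tokens could be reused, {recomputed} would be computed fresh")
```

Under this assumption, placing stable content such as system messages and templates first, and variable content last, maximizes the reusable prefix. Real systems may match in fixed-size blocks rather than token by token, so actual reuse can be somewhat lower than this count.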

Cross-Session vs In-Request Caching

	In-Memory Cache (default)	External Storage
Scope	Single request window	Across requests and sessions
Cost	Free (included)	Flat monthly subscription
Setup	None	Subscribe at Operations → Storage
Cache hit rate	Lower (cold on every new session)	Higher (warm on returning sessions)

Even without External Storage, cached tokens are always $0. External Storage increases the fraction of requests that hit the cache.