Serverless Usage - Tensormesh User Documentation

Navigate to Operations → Serverless Usage to monitor your token consumption across serverless inference calls.

What You’ll See

Metric	What it shows
Input Tokens	Total tokens sent in requests (system prompt + messages)
Output Tokens	Total tokens generated by models
Cached Tokens	Input tokens served from the KV cache at $0.00
GPU Cache Hit	Subset of cached tokens served from the vLLM prefix cache (L0 — GPU memory). Fastest possible cache hit.
CPU & Storage Hit	Subset of cached tokens served from LMCache (L1 CPU memory or L2 External Storage).
Cache Hit Rate	Percentage of input tokens served from cache

A chart tracks your cache hit performance over time so you can see whether prompt changes are improving cache efficiency.

Shows what percentage of input tokens were served from cache at $0.00
Higher = your prompts share consistent prefixes, no repeated compute costs
If it’s low: the most common cause is a variable or inconsistent system prompt, or dynamic content appearing too early in the prompt

See Pricing Overview for strategies to improve cache hit rate.

The usage table breaks down by model:

Token counts (input, output, cached) per model
Cost per model
Useful for spotting which models drive most of your spend
A low cache hit rate on a specific model often signals inconsistent prompts in that flow

Token Type	Cost
Input Token	Per-model rate (see Deploy → Serverless)
Output Token	Per-model rate
Cached Token	$0.00