Navigate to Operations → Serverless Usage to monitor your token consumption across serverless inference calls.Documentation Index
Fetch the complete documentation index at: https://docs.tensormesh.ai/llms.txt
Use this file to discover all available pages before exploring further.
What You’ll See
| Metric | What it shows |
|---|---|
| Input Tokens | Total tokens sent in requests (system prompt + messages) |
| Output Tokens | Total tokens generated by models |
| Cached Tokens | Input tokens served from the KV cache at $0.00 |
| Cache Hit Rate | Percentage of input tokens served from cache |
Cache Hit Rate
- Shows what percentage of input tokens were served from cache at $0.00
- Higher = your prompts share consistent prefixes, no repeated compute costs
- If it’s low: the most common cause is a variable or inconsistent system prompt, or dynamic content appearing too early in the prompt
Per-Model Cost
The usage table breaks down by model:- Token counts (input, output, cached) per model
- Cost per model
- Useful for spotting which models drive most of your spend
- A low cache hit rate on a specific model often signals inconsistent prompts in that flow
Token Pricing
| Token Type | Cost |
|---|---|
| Input Token | Per-model rate (see Deploy → Serverless) |
| Output Token | Per-model rate |
| Cached Token | $0.00 |

