Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.tensormesh.ai/llms.txt

Use this file to discover all available pages before exploring further.

Navigate to Operations → Serverless Usage to monitor your token consumption across serverless inference calls.

What You’ll See

MetricWhat it shows
Input TokensTotal tokens sent in requests (system prompt + messages)
Output TokensTotal tokens generated by models
Cached TokensInput tokens served from the KV cache at $0.00
Cache Hit RatePercentage of input tokens served from cache
A chart tracks your cache hit performance over time so you can see whether prompt changes are improving cache efficiency.

Cache Hit Rate

  • Shows what percentage of input tokens were served from cache at $0.00
  • Higher = your prompts share consistent prefixes, no repeated compute costs
  • If it’s low: the most common cause is a variable or inconsistent system prompt, or dynamic content appearing too early in the prompt
See Pricing Overview for strategies to improve cache hit rate.

Per-Model Cost

The usage table breaks down by model:
  • Token counts (input, output, cached) per model
  • Cost per model
  • Useful for spotting which models drive most of your spend
  • A low cache hit rate on a specific model often signals inconsistent prompts in that flow

Token Pricing

Token TypeCost
Input TokenPer-model rate (see Deploy → Serverless)
Output TokenPer-model rate
Cached Token$0.00