What You’ll See
The Observability dashboard shows one chart per metric, with a separate series for each model on which you have serverless usage. Use the model filter to narrow the view to specific models, and the time range selector to zoom in or out.

| Metric | What it shows |
|---|---|
| Cache Hit Rate | Fraction of input tokens served from LMCache (L1 CPU cache or L2 External Storage) at a cost of $0.00 |
| Cache Hit Tokens | Total token count served from cache in each interval |
| CPU Read Tokens | Tokens loaded from CPU memory into the inference engine |
| CPU Write Tokens | Tokens written to CPU memory by the inference engine |
| CPU Evict Tokens | Tokens evicted from CPU memory to make room for new entries |
| Storage to CPU Tokens | Tokens promoted from External Storage (L2) into CPU memory (L1) |
| Storage Write Tokens | Tokens written from CPU memory out to External Storage |
| Storage Evict Tokens | Tokens evicted from External Storage |

Chunk-based metrics (CPU Read, Write, and Evict; Storage to CPU, Storage Write, and Storage Evict) are displayed in tokens. Each internal LMCache chunk equals 256 tokens, so every value shown is the chunk count multiplied by 256.
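As a quick illustration of the chunk-to-token conversion, the sketch below applies the 256-tokens-per-chunk figure stated above. The function names are hypothetical and are not part of any LMCache or dashboard API.

```python
CHUNK_TOKENS = 256  # tokens per internal LMCache chunk, as stated in this section


def chunks_to_tokens(chunks: int) -> int:
    """Convert a raw chunk count into the token figure shown on the dashboard."""
    if chunks < 0:
        raise ValueError("chunk count cannot be negative")
    return chunks * CHUNK_TOKENS


def tokens_to_chunks(tokens: int) -> int:
    """Convert a displayed token figure back into whole chunks."""
    if tokens < 0 or tokens % CHUNK_TOKENS != 0:
        raise ValueError("expected a non-negative multiple of 256")
    return tokens // CHUNK_TOKENS
```

For example, a CPU Write Tokens value of 1,024 in one interval would correspond to 4 chunks written.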
Exporting Data
Click Export CSV to download the currently visible metrics for all selected models over the active time range. The export includes:
- Model name
- Cache Hit Rate (avg, min, max)
- Cache Hit Tokens (total)
- CPU Read / Write / Evict Tokens (total)
- Storage to CPU / Storage Write / Storage Evict Tokens (total)
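Once exported, the file can be processed with any CSV tooling. The sketch below ranks models by average hit rate using Python's standard `csv` module. The header names (`model`, `cache_hit_rate_avg`) and the sample rows are assumptions made for illustration only; check the header row of your actual export and adjust the column arguments accordingly.

```python
import csv
import io

# Hypothetical export contents; the real header names and values may differ.
SAMPLE_EXPORT = """model,cache_hit_rate_avg,cache_hit_tokens_total
model-a,0.82,120000
model-b,0.35,40000
"""


def rank_by_hit_rate(csv_text, model_col="model", rate_col="cache_hit_rate_avg"):
    """Return (model, average hit rate) pairs, lowest hit rate first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    pairs = [(row[model_col], float(row[rate_col])) for row in reader]
    return sorted(pairs, key=lambda pair: pair[1])


ranking = rank_by_hit_rate(SAMPLE_EXPORT)
```

Sorting lowest first puts the models that would likely benefit most from prompt restructuring at the top of the list.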
Understanding Cache Hit Rate
Cache Hit Rate is the most important signal for cost efficiency:
- High hit rate: a large fraction of your input tokens are being served from cache at $0.00, meaning your prompts share consistent prefixes across requests.
- Low hit rate: most tokens are being recomputed on every request. Common causes are variable system prompts, dynamic content inserted early in the prompt, and short sessions that don’t benefit from cross-session caching.
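To see why dynamic content placed early in a prompt hurts the hit rate, consider the simplified model below. It assumes that reuse happens only on whole 256-token chunks of a prefix shared between two requests. This is an illustrative approximation, not a description of how LMCache matches entries internally, and the integer lists are stand-ins for real token IDs.

```python
CHUNK_TOKENS = 256  # chunk size stated earlier in this section


def shared_prefix_tokens(a, b):
    """Count how many leading tokens two requests have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def reusable_tokens(a, b, chunk=CHUNK_TOKENS):
    """Round the shared prefix down to whole chunks (assumed reuse granularity)."""
    return (shared_prefix_tokens(a, b) // chunk) * chunk


system_prompt = list(range(600))  # stand-in for a stable 600-token system prompt

# Stable prefix first, per-request content last.
request_a = system_prompt + [9001, 9002]
request_b = system_prompt + [9003, 9004, 9005]
stable_reuse = reusable_tokens(request_a, request_b)

# Per-request content first: the shared prefix is broken at token 0.
dynamic_a = [7001] + system_prompt
dynamic_b = [7002] + system_prompt
dynamic_reuse = reusable_tokens(dynamic_a, dynamic_b)
```

Under these assumptions, the first pair can reuse 512 of its 600 shared tokens (two full chunks), while the second pair can reuse none, even though 600 of its tokens are identical.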
L2 (External Storage) Metrics
The Storage to CPU, Storage Write, and Storage Evict metrics appear only if you have an active External Storage subscription. They show how your L2 bucket is being used:
- Storage to CPU: tokens promoted from your persistent bucket into the active CPU cache. High values mean your cross-session cache is being hit.
- Storage Write: tokens persisted to your bucket for future sessions.
- Storage Evict: tokens dropped from your bucket (bucket full or TTL expired).
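One way to read the three L2 metrics together is to normalize promotions and evictions by the volume written in the same interval. The helper below is a hypothetical sketch, not a dashboard feature. The ratio names are made up for illustration, and what counts as a healthy value depends on your workload.

```python
def l2_ratios(storage_to_cpu: int, storage_write: int, storage_evict: int) -> dict:
    """Relate L2 promotions and evictions to L2 writes for one interval."""
    if storage_write == 0:
        # Nothing was written in this interval, so the ratios are undefined.
        return {"reuse_per_write": None, "evict_per_write": None}
    return {
        # Tokens promoted back into CPU cache per token written to the bucket.
        "reuse_per_write": storage_to_cpu / storage_write,
        # Tokens dropped from the bucket per token written to it.
        "evict_per_write": storage_evict / storage_write,
    }


example = l2_ratios(storage_to_cpu=512, storage_write=1024, storage_evict=256)
```

A consistently low reuse ratio alongside a high eviction ratio may suggest that entries are leaving the bucket before later sessions can use them.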

