Navigate to Operations → Observability to monitor real-time LMCache metrics across your serverless models.

What You’ll See

The Observability dashboard shows one chart per metric, with a separate series for each model that has serverless usage. Use the model filter to narrow the view to specific models, and the time range selector to zoom in or out.
| Metric | What it shows |
| --- | --- |
| Cache Hit Rate | Fraction of tokens served from the LMCache (L1 CPU cache or L2 External Storage) at $0.00 |
| Cache Hit Tokens | Total token count served from cache in each interval |
| CPU Read Tokens | Tokens loaded from CPU memory into the inference engine |
| CPU Write Tokens | Tokens written to CPU memory by the inference engine |
| CPU Evict Tokens | Tokens evicted from CPU memory to make room for new entries |
| Storage to CPU Tokens | Tokens promoted from External Storage (L2) into CPU memory (L1) |
| Storage Write Tokens | Tokens written from CPU memory out to External Storage |
| Storage Evict Tokens | Tokens evicted from External Storage |
Chunk-based metrics (CPU Read, Write, and Evict; Storage to CPU, Storage Write, and Storage Evict) are displayed in tokens. LMCache counts them internally in chunks, and each chunk equals 256 tokens.
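
If you ever need to compare dashboard values against raw chunk counts, the conversion is a single multiplication. The sketch below assumes only the 256-tokens-per-chunk figure stated above; the function names are illustrative and not part of any LMCache API.

```python
# From the documentation above: one internal LMCache chunk equals 256 tokens.
TOKENS_PER_CHUNK = 256


def chunks_to_tokens(chunks: int) -> int:
    """Convert an internal chunk count to the token count shown on the dashboard."""
    return chunks * TOKENS_PER_CHUNK


def tokens_to_chunks(tokens: int) -> int:
    """Convert a displayed token count back to a whole number of chunks."""
    return tokens // TOKENS_PER_CHUNK
```

Because the chunk-based metrics are derived from whole chunks, their displayed values should be multiples of 256.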

Exporting Data

Click Export CSV to download the currently visible metrics for all selected models and the active time range. The export includes:
  • Model name
  • Cache Hit Rate (avg, min, max)
  • Cache Hit Tokens (total)
  • CPU Read / Write / Evict Tokens (total)
  • Storage to CPU / Storage Write / Storage Evict Tokens (total)
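
Once exported, the file can be processed with any CSV tooling. The following is a minimal sketch that sums Cache Hit Tokens per model. The column headers `model` and `cache_hit_tokens_total` are hypothetical, as is the sample data; check the header row of your own export and adjust the constants accordingly.

```python
import csv
import io

# Hypothetical column headers: the real export may name its columns differently.
MODEL_COL = "model"
HIT_TOKENS_COL = "cache_hit_tokens_total"


def hit_tokens_by_model(csv_text: str) -> dict[str, int]:
    """Sum the cache-hit token total per model from exported CSV text."""
    totals: dict[str, int] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        model = row[MODEL_COL]
        totals[model] = totals.get(model, 0) + int(row[HIT_TOKENS_COL])
    return totals


# Made-up sample rows, for illustration only.
sample = "model,cache_hit_tokens_total\nmodel-a,1024\nmodel-b,512\nmodel-a,256\n"
print(hit_tokens_by_model(sample))
```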

Understanding Cache Hit Rate

Cache Hit Rate is the most important signal for cost efficiency:
  • High hit rate — a large fraction of your input tokens are being served from cache at $0.00, meaning your prompts share consistent prefixes across requests.
  • Low hit rate — most tokens are being recomputed on every request. Common causes: variable system prompts, dynamic content inserted early in the prompt, or short sessions that don’t benefit from cross-session caching.
See Cache Savings for strategies to improve hit rate.
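
To make the definition concrete, here is a rough sketch of the arithmetic behind the metric, plus a toy illustration of why dynamic content placed early in a prompt hurts. It assumes the rate is cache-hit tokens divided by total input tokens, and it uses whitespace-split words as a stand-in for real tokens; both helper functions are illustrative and are not how the platform computes the value internally.

```python
def cache_hit_rate(hit_tokens: int, input_tokens: int) -> float:
    """Fraction of input tokens served from cache (0.0 when there was no input)."""
    if input_tokens <= 0:
        return 0.0
    return hit_tokens / input_tokens


def shared_prefix_length(a: list[str], b: list[str]) -> int:
    """Count how many leading tokens two prompts have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


# Toy prompts: words stand in for tokens (an assumption for illustration).
stable_first = "You are a helpful assistant. Request id: 17".split()
stable_second = "You are a helpful assistant. Request id: 42".split()
dynamic_first = "Request id: 17 You are a helpful assistant.".split()
dynamic_second = "Request id: 42 You are a helpful assistant.".split()

# A stable system prompt up front leaves a long reusable prefix;
# a dynamic value up front cuts the shared prefix short.
print(shared_prefix_length(stable_first, stable_second))
print(shared_prefix_length(dynamic_first, dynamic_second))
```

In the second pair, everything after the changing request id is identical, yet none of it falls inside the shared prefix.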

L2 (External Storage) Metrics

The Storage to CPU, Storage Write, and Storage Evict metrics appear only if you have an active External Storage subscription. They show how your L2 bucket is being used:
  • Storage to CPU — tokens being promoted from your persistent bucket into active CPU cache. High values here mean your cross-session cache is being hit.
  • Storage Write — tokens being persisted to your bucket for future sessions.
  • Storage Evict — tokens being dropped from your bucket (bucket full or TTL expired).
If Storage Evict is consistently high relative to Storage Write, your bucket may be undersized for your workload. Consider upgrading your External Storage plan.
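
One way to turn that rule of thumb into a check is to compare the two series interval by interval. The sketch below is an assumption-laden example, not an official sizing rule: the thresholds `ratio_threshold` and `min_fraction` are hypothetical defaults chosen for illustration, and the inputs are per-interval token totals you would take from the dashboard or an export.

```python
def bucket_looks_undersized(
    evict_tokens: list[int],
    write_tokens: list[int],
    ratio_threshold: float = 0.5,  # hypothetical: evict/write ratio considered "high"
    min_fraction: float = 0.8,     # hypothetical: share of intervals needed for "consistently"
) -> bool:
    """Return True when Storage Evict is high relative to Storage Write in most intervals."""
    # Skip intervals with no writes, where the ratio is undefined.
    pairs = [(e, w) for e, w in zip(evict_tokens, write_tokens) if w > 0]
    if not pairs:
        return False
    high = sum(1 for e, w in pairs if e / w >= ratio_threshold)
    return high / len(pairs) >= min_fraction
```

Tune both thresholds to your own workload; a single spike in evictions is usually less informative than a ratio that stays high across the whole time range.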