Observability - Tensormesh User Documentation

Navigate to Operations → Observability to monitor real-time LMCache metrics across your serverless models.

What You’ll See

The Observability dashboard shows one chart per metric, with a separate series for each model that has serverless usage. Use the model filter to narrow the view to specific models, and the time range selector to zoom in or out.

Metric	What it shows
Cache Hit Rate	Fraction of input tokens served from LMCache (L1 CPU cache or L2 External Storage) at $0.00
Cache Hit Tokens	Total token count served from cache in each interval
CPU Read Tokens	Tokens loaded from CPU memory into the inference engine
CPU Write Tokens	Tokens written to CPU memory by the inference engine
CPU Evict Tokens	Tokens evicted from CPU memory to make room for new entries
Storage to CPU Tokens	Tokens promoted from External Storage (L2) into CPU memory (L1)
Storage Write Tokens	Tokens written from CPU memory out to External Storage
Storage Evict Tokens	Tokens evicted from External Storage

Chunk-based metrics (CPU Read, CPU Write, and CPU Evict; Storage to CPU, Storage Write, and Storage Evict) are displayed in tokens. Each internal LMCache chunk equals 256 tokens.
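
The chunk-to-token conversion is simple arithmetic. The sketch below only illustrates it; the function and constant names are hypothetical and are not part of any Tensormesh or LMCache API.

```python
# Chunk size stated in this documentation: 256 tokens per LMCache chunk.
TOKENS_PER_CHUNK = 256


def chunks_to_tokens(chunk_count: int) -> int:
    """Convert an internal chunk count to the token figure shown on the dashboard."""
    if chunk_count < 0:
        raise ValueError("chunk_count must be non-negative")
    return chunk_count * TOKENS_PER_CHUNK


# Example: 4 chunks read from CPU memory appear as 1024 CPU Read Tokens.
cpu_read_tokens = chunks_to_tokens(4)
print(cpu_read_tokens)
```

Because the dashboard values are derived from whole chunks, the chunk-based token counts will be multiples of 256.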

Exporting Data

Click Export CSV to download the currently visible metrics for all selected models and the active time range. The export includes:

Model name
Cache Hit Rate (avg, min, max)
Cache Hit Tokens (total)
CPU Read / Write / Evict Tokens (total)
Storage to CPU / Storage Write / Storage Evict Tokens (total)
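
Once exported, the file can be processed with any CSV tool. The following is a minimal sketch that assumes a particular header layout: the column names (`model`, `cache_hit_rate_avg`, `cache_hit_tokens_total`, and so on) and the sample rows are hypothetical, so check them against the header row of your actual export.

```python
import csv
import io

# Hypothetical export contents. The real header names and values may differ.
SAMPLE_EXPORT = """model,cache_hit_rate_avg,cache_hit_rate_min,cache_hit_rate_max,cache_hit_tokens_total
model-a,0.82,0.40,0.97,1200000
model-b,0.15,0.02,0.31,90000
"""


def load_export(text: str) -> list[dict]:
    """Parse the CSV text into rows with numeric fields converted."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "model": row["model"],
            "hit_rate_avg": float(row["cache_hit_rate_avg"]),
            "hit_tokens": int(row["cache_hit_tokens_total"]),
        })
    return rows


def lowest_hit_rate_first(rows: list[dict]) -> list[dict]:
    """Sort models so the ones with the lowest average hit rate come first."""
    return sorted(rows, key=lambda r: r["hit_rate_avg"])


rows = lowest_hit_rate_first(load_export(SAMPLE_EXPORT))
for r in rows:
    print(f'{r["model"]}: avg hit rate {r["hit_rate_avg"]:.0%}, {r["hit_tokens"]} cached tokens')
```

Sorting by average hit rate is one way to find the models whose prompts are worth reviewing first.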

Understanding Cache Hit Rate

Cache Hit Rate is the most important signal for cost efficiency:

High hit rate — a large fraction of your input tokens are being served from cache at $0.00, meaning your prompts share consistent prefixes across requests.
Low hit rate — most tokens are being recomputed on every request. Common causes: variable system prompts, dynamic content inserted early in the prompt, or short sessions that don’t benefit from cross-session caching.

See Cache Savings for strategies to improve hit rate.
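
To make the relationship between shared prefixes and hit rate concrete, here is a small worked example. It assumes the hit rate is cached input tokens divided by total input tokens, and that a shared prefix is fully cached after the first request; both are simplifying assumptions for illustration, and the function name is hypothetical.

```python
def cache_hit_rate(hit_tokens: int, total_input_tokens: int) -> float:
    """Fraction of input tokens served from cache (0.0 when there is no traffic)."""
    if total_input_tokens <= 0:
        return 0.0
    return hit_tokens / total_input_tokens


# Illustrative workload: 10 requests of 2048 input tokens each,
# all starting with the same 1536-token system prompt (6 chunks of 256 tokens).
requests = 10
tokens_per_request = 2048
shared_prefix_tokens = 1536

# Assumption: the first request populates the cache and the other 9 reuse the prefix.
hit_tokens = (requests - 1) * shared_prefix_tokens
total_tokens = requests * tokens_per_request

rate = cache_hit_rate(hit_tokens, total_tokens)
print(f"hit rate: {rate:.1%}")  # 13824 / 20480 = 67.5%
```

If dynamic content were inserted at the start of each prompt instead, the shared prefix would shrink toward zero and so would the hit rate.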

L2 (External Storage) Metrics

The Storage to CPU, Storage Write, and Storage Evict metrics appear only if you have an active External Storage subscription. They show how your L2 bucket is being used:

Storage to CPU — tokens being promoted from your persistent bucket into active CPU cache. High values here mean your cross-session cache is being hit.
Storage Write — tokens being persisted to your bucket for future sessions.
Storage Evict — tokens being dropped from your bucket (bucket full or TTL expired).

If Storage Evict is consistently high relative to Storage Write, your bucket may be undersized for your workload. Consider upgrading your External Storage plan.
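
The check described above can be sketched as a ratio of Storage Evict Tokens to Storage Write Tokens per interval. The threshold values and function names below are hypothetical; the documentation does not define what counts as "consistently high", so tune them for your workload.

```python
def storage_evict_ratio(evict_tokens: int, write_tokens: int) -> float:
    """Evicted tokens per written token for one interval."""
    if write_tokens == 0:
        return float("inf") if evict_tokens > 0 else 0.0
    return evict_tokens / write_tokens


def bucket_looks_undersized(
    intervals: list[tuple[int, int]],
    ratio_threshold: float = 0.8,   # assumed cutoff for a "high" evict/write ratio
    min_fraction: float = 0.75,     # assumed share of intervals that must be high
) -> bool:
    """intervals holds (storage_write_tokens, storage_evict_tokens) pairs."""
    if not intervals:
        return False
    high = sum(
        1 for write, evict in intervals
        if storage_evict_ratio(evict, write) >= ratio_threshold
    )
    return high / len(intervals) >= min_fraction


healthy = [(10000, 500), (12000, 0), (8000, 1000)]
churning = [(10000, 9500), (12000, 12000), (8000, 7000), (9000, 2000)]

print(bucket_looks_undersized(healthy))   # False
print(bucket_looks_undersized(churning))  # True
```

Requiring the ratio to be high across most intervals, rather than in a single one, avoids flagging a one-off burst of evictions.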

Related

Serverless Usage
Cache Savings