Welcome to Tensormesh Inference - Tensormesh User Documentation

What You Can Do

Reduce API Costs

Avoid paying to process the same context again and again. Tensormesh reuses cached context across repeated requests — cached tokens are always $0.

Speed Up Agent Responses

Skip redundant prefill work so agents, copilots, and RAG apps can respond faster. Cached prefixes return instantly with no recomputation.

Build With Less Infrastructure

Start with serverless APIs for instant access with no setup. Move to reserved GPU clusters when you need dedicated capacity or compliance requirements.

Core Technology: KV Caching

Tensormesh is built on KV caching. Instead of recomputing repeated prompts, documents, tools, and conversation history, Tensormesh stores reusable model state and serves it again when similar context appears. That means less GPU waste, faster responses, and more efficient AI applications at scale.

	What happens	What you pay
First request	Input tokens are computed and stored in the KV cache	Standard input token rate
Subsequent requests	Matching tokens are served from cache, no recomputation	$0.00 per cached token

The longer and more consistent your shared context, the lower your effective cost. A 2,000-token system prompt reused across 10,000 daily requests saves the cost of 20 million input tokens.

Coding Agents

Tensormesh powers some of the most capable open-weight coding agents available. Use any serverless model as the backend for Claude Code or Codex CLI:

Claude Code

Run Claude Code backed by Tensormesh serverless inference. Point it at any supported open-weight model and start building — no changes to your existing Claude Code workflow.

Codex CLI

Drive open-weight models with OpenAI Codex CLI using Tensormesh as the inference provider. OpenAI-compatible API means zero configuration changes.

Explore the Docs

Serverless Inference

Model catalog, per-token pricing, and quickstart examples.

Reserved Deployments

When to use dedicated capacity and how to request a cluster.

External Storage

Plan tiers, usage monitoring, and how persistent caching affects your costs.

Pricing Overview

How token pricing works and how to structure prompts to maximize cache hits.

Serverless Usage

Per-model token counts, cache hit rates, and cost breakdown.

Cache Savings

Dollar value of your KV cache hits over time.

​What You Can Do