# Welcome to Tensormesh Inference

> The AI inference platform for application and agent developers.

Tensormesh helps developers build faster, lower-cost AI applications by caching and reusing repeated context across requests. In agents, RAG systems, copilots, and multi-turn assistants, the same instructions, documents, tools, and conversation history are often processed again and again. Tensormesh uses KV caching to reuse that work automatically, which reduces redundant computation and improves response latency.

***

## What You Can Do

- Avoid paying to process the same context again and again. Tensormesh reuses cached context across repeated requests, and cached tokens are always \$0.
- Skip redundant prefill work so agents, copilots, and RAG apps can respond faster. Cached prefixes are served from cache with no recomputation.
- Start with serverless APIs for instant access with no setup. Move to reserved GPU clusters when you need dedicated capacity or have compliance requirements.

***

## Core Technology: KV Caching

Tensormesh is built on KV caching. Instead of recomputing repeated prompts, documents, tools, and conversation history, Tensormesh stores reusable model state and serves it again when similar context appears. The result is less wasted GPU compute, faster responses, and more efficient AI applications at scale.

|                         | What happens                                                  | What you pay              |
| ----------------------- | ------------------------------------------------------------- | ------------------------- |
| **First request**       | Input tokens are computed and stored in the KV cache          | Standard input token rate |
| **Subsequent requests** | Matching tokens are served from cache, with no recomputation  | \$0.00 per cached token   |

The longer and more consistent your shared context, the lower your effective cost.
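The following minimal sketch works through the cost arithmetic behind the table. It uses a simplified model that follows the two table rows. The shared context is computed and billed once at the standard input rate on the first request. Every later request reads that context from cache at \$0, and each request's unique tokens are always billed. `Workload`, `billed_input_tokens`, `dollars`, the 200-token per-request size, and the \$0.50-per-million rate are hypothetical placeholders. They are not Tensormesh API fields or prices.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    shared_tokens: int  # tokens repeated on every request (system prompt, tools, docs)
    unique_tokens: int  # tokens that differ per request (e.g. the new user message)
    requests: int       # number of requests in the period


def billed_input_tokens(w: Workload, cached: bool) -> int:
    """Input tokens billed at the standard rate under a SIMPLIFIED cache model.

    Assumption for illustration: the shared context is computed (and billed) once,
    on the first request; every later request reads it from cache at $0.
    Unique tokens are billed on every request.
    """
    if w.requests == 0:
        return 0
    if not cached:
        return (w.shared_tokens + w.unique_tokens) * w.requests
    first_request = w.shared_tokens + w.unique_tokens
    later_requests = w.unique_tokens * (w.requests - 1)
    return first_request + later_requests


def dollars(tokens: int, rate_per_million: float) -> float:
    """Convert a token count to dollars at a per-million-token rate."""
    return tokens * rate_per_million / 1_000_000


if __name__ == "__main__":
    # 2,000-token system prompt, 200 unique tokens per request, 10,000 requests.
    w = Workload(shared_tokens=2_000, unique_tokens=200, requests=10_000)
    no_cache = billed_input_tokens(w, cached=False)
    with_cache = billed_input_tokens(w, cached=True)
    print(f"billed without cache: {no_cache:,}")                # 22,000,000
    print(f"billed with cache:    {with_cache:,}")              # 2,002,000
    print(f"tokens saved:         {no_cache - with_cache:,}")   # 19,998,000
    # HYPOTHETICAL rate of $0.50 per million input tokens, not a Tensormesh price.
    print(f"cost: ${dollars(no_cache, 0.50):.2f} -> ${dollars(with_cache, 0.50):.2f}")  # $11.00 -> $1.00
```

In this model the only shared-context cost is the first request. The savings therefore grow with both the size of the shared context and the number of requests that reuse it.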
A 2,000-token system prompt reused across 10,000 daily requests saves the cost of roughly 20 million input tokens per day (2,000 × 10,000).

***

## Coding Agents

Tensormesh powers some of the most capable open-weight coding agents available. Use any serverless model as the backend for Claude Code or Codex CLI:

- **Claude Code**: Run Claude Code backed by Tensormesh serverless inference. Point it at any supported open-weight model and start building, with no changes to your existing Claude Code workflow.
- **Codex CLI**: Drive open-weight models with OpenAI Codex CLI using Tensormesh as the inference provider. The OpenAI-compatible API means zero configuration changes.

***

## Explore the Docs

- Model catalog, per-token pricing, and quickstart examples.
- When to use dedicated capacity and how to request a cluster.
- Plan tiers, usage monitoring, and how persistent caching affects your costs.
- How token pricing works and how to structure prompts to maximize cache hits.
- Per-model token counts, cache hit rates, and cost breakdown.
- Dollar value of your KV cache hits over time.
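The advice to structure prompts to maximize cache hits can be illustrated with a toy model. This sketch assumes a cache that reuses the longest token prefix two requests share. That is a common simplification and not a description of how Tensormesh actually matches context. Whitespace splitting stands in for the model's real tokenizer. The system prompt, the tool names (`read_file`, `write_file`, `run_tests`), and both layout functions are hypothetical. The sketch compares a layout that puts per-request data (a timestamp) before the stable instructions with one that keeps the stable parts first.

```python
def tokenize(text: str) -> list[str]:
    """Crude stand-in for a real tokenizer: split on whitespace."""
    return text.split()


def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading tokens two requests have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# HYPOTHETICAL stable context shared by every request.
SYSTEM = "You are a coding assistant. Follow the repository style guide."
TOOLS = "Tools: read_file(path) write_file(path, text) run_tests()"


def volatile_first(timestamp: str, question: str) -> str:
    # Per-request data at the top breaks the shared prefix almost immediately.
    return f"Time: {timestamp}\n{SYSTEM}\n{TOOLS}\nUser: {question}"


def stable_first(timestamp: str, question: str) -> str:
    # Stable instructions and tools first; per-request data last.
    return f"{SYSTEM}\n{TOOLS}\nTime: {timestamp}\nUser: {question}"


if __name__ == "__main__":
    for layout in (volatile_first, stable_first):
        r1 = tokenize(layout("09:00", "Fix the failing test."))
        r2 = tokenize(layout("09:05", "Add a docstring."))
        print(layout.__name__, shared_prefix_len(r1, r2), "of", len(r2))
    # prints:
    # volatile_first 1 of 21
    # stable_first 16 of 21
```

Both layouts contain the same tokens. Only the ordering changes how much of each request a prefix-based cache could reuse.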