# Welcome to Tensormesh Inference

> The AI inference platform for application and agent developers.

Tensormesh helps developers build faster, lower-cost AI applications by caching and reusing repeated context across requests. In agents, RAG systems, copilots, and multi-turn assistants, the same instructions, documents, tools, and conversation history are often reprocessed on every request. Tensormesh uses KV caching to reuse that work automatically, reducing redundant computation and improving response latency.

***

## What You Can Do

<CardGroup cols={3}>
  <Card title="Reduce API Costs" icon="circle-dollar-to-slot">
    Avoid paying to process the same context on every request. Tensormesh reuses cached context across repeated requests, and cached tokens are always \$0.
  </Card>

  <Card title="Speed Up Agent Responses" icon="bolt">
    Skip redundant prefill work so agents, copilots, and RAG apps can respond faster. Cached prefixes are served from the cache with no recomputation.
  </Card>

  <Card title="Build With Less Infrastructure" icon="cloud">
    Start with serverless APIs for instant access with no setup. Move to reserved GPU clusters when you need dedicated capacity or have compliance requirements.
  </Card>
</CardGroup>

***

## Core Technology: KV Caching

Tensormesh is built on KV caching. Instead of recomputing repeated prompts, documents, tools, and conversation history, Tensormesh stores reusable model state and serves it again when the same context appears. That means less GPU waste, faster responses, and more efficient AI applications at scale.
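
As a rough illustration of the idea above, the sketch below models context reuse as longest-prefix matching over token sequences. It is a toy under stated assumptions, not Tensormesh's implementation: `ToyPrefixCache` and `process` are hypothetical names, tokens are plain strings, and the "cache" stores token sequences rather than real KV tensors.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def _shared_prefix_len(a: tuple[str, ...], b: tuple[str, ...]) -> int:
    """Number of leading tokens that two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


@dataclass
class ToyPrefixCache:
    """Hypothetical toy cache: remembers earlier requests, not real model state."""

    seen: list[tuple[str, ...]] = field(default_factory=list)

    def process(self, tokens: list[str]) -> tuple[int, int]:
        """Return (cached, computed) token counts for one request."""
        request = tuple(tokens)
        # Assumption: reuse is limited to the longest prefix shared with an earlier request.
        cached = max((_shared_prefix_len(request, old) for old in self.seen), default=0)
        self.seen.append(request)
        return cached, len(request) - cached


cache = ToyPrefixCache()
system = ["You", "are", "a", "helpful", "assistant", "."]

first = cache.process(system + ["What", "is", "KV", "caching", "?"])
second = cache.process(system + ["Summarize", "this", "file", "."])

print(first)   # (0, 11): nothing cached yet, all 11 tokens computed
print(second)  # (6, 4): the 6 system-prompt tokens are reused, 4 are new
```

In this toy model only a shared prefix counts as a match, which is why keeping stable content such as system prompts and tool definitions at the start of a prompt leads to more reuse.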

|                         | What happens                                            | What you pay              |
| ----------------------- | ------------------------------------------------------- | ------------------------- |
| **First request**       | Input tokens are computed and stored in the KV cache    | Standard input token rate |
| **Subsequent requests** | Matching tokens are served from cache, no recomputation | \$0.00 per cached token   |

The longer and more consistent your shared context, the lower your effective cost. For example, a 2,000-token system prompt reused across 10,000 daily requests saves the cost of roughly 20 million input tokens per day (2,000 × 10,000, less the first request that populates the cache).
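
The arithmetic in the example above can be checked with a short calculation. This is a minimal sketch that assumes the cached prefix stays available all day and that only the first request misses the cache. The function names are hypothetical, and the price of \$1.00 per million input tokens is a placeholder, not a Tensormesh rate.

```python
def daily_cached_tokens(prefix_tokens: int, requests_per_day: int, cold_requests: int = 1) -> int:
    """Input tokens served from cache per day, assuming only `cold_requests` miss the cache."""
    warm_requests = max(requests_per_day - cold_requests, 0)
    return prefix_tokens * warm_requests


def daily_savings_usd(cached_tokens: int, input_price_per_million_usd: float) -> float:
    """Value of cached tokens at a given input price, since cached tokens cost 0."""
    return cached_tokens / 1_000_000 * input_price_per_million_usd


tokens = daily_cached_tokens(prefix_tokens=2_000, requests_per_day=10_000)
print(tokens)  # 19998000, i.e. roughly 20 million tokens

# Placeholder price of 1.00 USD per million input tokens (an assumption for illustration).
print(f"{daily_savings_usd(tokens, 1.00):.2f}")  # 20.00
```

Substitute the input rate of the model you actually use to estimate your own savings.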

***

## Coding Agents

Tensormesh powers some of the most capable open-weight coding agents available. Use any serverless model as the backend for Claude Code or Codex CLI:

<CardGroup cols={2}>
  <Card title="Claude Code" icon="terminal" href="/claude-code">
    Run Claude Code backed by Tensormesh serverless inference. Point it at any supported open-weight model and start building, with no changes to your existing Claude Code workflow.
  </Card>

  <Card title="Codex CLI" icon="code" href="/codex">
    Drive open-weight models with OpenAI Codex CLI using Tensormesh as the inference provider. The OpenAI-compatible API means zero configuration changes.
  </Card>
</CardGroup>

***

## Explore the Docs

<CardGroup cols={3}>
  <Card title="Serverless Inference" icon="cloud" href="/serverless-inference">
    Model catalog, per-token pricing, and quickstart examples.
  </Card>

  <Card title="Reserved Deployments" icon="server" href="/reserved-deployments">
    When to use dedicated capacity and how to request a cluster.
  </Card>

  <Card title="External Storage" icon="database" href="/external-storage">
    Plan tiers, usage monitoring, and how persistent caching affects your costs.
  </Card>

  <Card title="Pricing Overview" icon="dollar-sign" href="/pricing-overview">
    How token pricing works and how to structure prompts to maximize cache hits.
  </Card>

  <Card title="Serverless Usage" icon="chart-bar" href="/serverless-usage">
    Per-model token counts, cache hit rates, and cost breakdown.
  </Card>

  <Card title="Cache Savings" icon="piggy-bank" href="/cache-savings">
    Dollar value of your KV cache hits over time.
  </Card>
</CardGroup>
