Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.tensormesh.ai/llms.txt

Use this file to discover all available pages before exploring further.

Tensormesh helps developers build faster, lower-cost AI applications by caching and reusing repeated context across requests. For agents, RAG systems, copilots, and multi-turn assistants, the same instructions, documents, tools, and conversation history often get processed again and again. Tensormesh uses KV caching to reuse that work automatically, reducing redundant computation and improving response latency.

What You Can Do

Reduce API Costs

Avoid paying to process the same context again and again. Tensormesh reuses cached context across repeated requests — cached tokens are always $0.

Speed Up Agent Responses

Skip redundant prefill work so agents, copilots, and RAG apps can respond faster. Cached prefixes return instantly with no recomputation.

Build With Less Infrastructure

Start with serverless APIs for instant access with no setup. Move to reserved GPU clusters when you need dedicated capacity or compliance requirements.

Core Technology: KV Caching

Tensormesh is built on KV caching. Instead of recomputing repeated prompts, documents, tools, and conversation history, Tensormesh stores reusable model state and serves it again when similar context appears. That means less GPU waste, faster responses, and more efficient AI applications at scale.
What happensWhat you pay
First requestInput tokens are computed and stored in the KV cacheStandard input token rate
Subsequent requestsMatching tokens are served from cache, no recomputation$0.00 per cached token
The longer and more consistent your shared context, the lower your effective cost. A 2,000-token system prompt reused across 10,000 daily requests saves the cost of 20 million input tokens.

Coding Agents

Tensormesh powers some of the most capable open-weight coding agents available. Use any serverless model as the backend for Claude Code or Codex CLI:

Claude Code

Run Claude Code backed by Tensormesh serverless inference. Point it at any supported open-weight model and start building — no changes to your existing Claude Code workflow.

Codex CLI

Drive open-weight models with OpenAI Codex CLI using Tensormesh as the inference provider. OpenAI-compatible API means zero configuration changes.

Explore the Docs

Serverless Inference

Model catalog, per-token pricing, and quickstart examples.

Reserved Deployments

When to use dedicated capacity and how to request a cluster.

External Storage

Plan tiers, usage monitoring, and how persistent caching affects your costs.

Pricing Overview

How token pricing works and how to structure prompts to maximize cache hits.

Serverless Usage

Per-model token counts, cache hit rates, and cost breakdown.

Cache Savings

Dollar value of your KV cache hits over time.