
How It Works
Browse Models
View all available serverless models in the model catalog. Each card shows the model name, family, context window, and per-token pricing.
Filter & Select
Use the search bar, capability filters (Coding, Reasoning, Agentic, Tool Use, Chat), or use case filters (Production APIs, Automation, Low-Latency, Research, etc.) to find the right model for your workload.
Review Details
Select a model to view its full specifications — parameter count, architecture, context window, capabilities, and detailed pricing breakdown.
Pricing
Serverless models use pay-per-token pricing, displayed as a rate per 1M tokens on each model card. You are charged for input and output tokens at per-model rates. Cached tokens are $0.00 — when Tensormesh serves a token from its KV cache, you are not charged for it. Current pricing for each model is displayed on the Deploy → Serverless page. For a full breakdown, see Pricing Overview.Choosing the Right Model
Reasoning & Agents
Look for large MoE models with long context windows. Higher parameter counts generally mean stronger reasoning and tool use for complex agentic workflows.
Coding Agents
Coding-specialized models with large context windows are best for multi-file tasks, code review, and agentic workflows. Smaller coding models offer a faster, cheaper alternative for simpler tasks.
Low-Latency & High-Throughput
Smaller, compact models are the fastest and most cost-effective — ideal for real-time assistants, chatbots, and high-volume API services where speed matters more than raw capability.
Long-Context & Document Processing
For workflows that process large documents, codebases, or long conversation histories, prioritize models with the largest context windows available in the catalog.
Quick Start
- cURL
- Python
- SDK
The serverless API is OpenAI-compatible. You can use any OpenAI SDK or client library by pointing the base URL to
https://serverless.tensormesh.ai and using your Tensormesh API key.

