Serverless Models

Serverless inference lets you call any supported model through a simple API without provisioning or managing GPU infrastructure. Navigate to Create → Serverless from the sidebar to get started.

How It Works

1. Browse Models

   View all available serverless models in the model catalog. Each card shows the model name, family, context window, and per-token pricing.

2. Filter & Select

   Use the search bar, capability filters (Coding, Reasoning, Agentic, Tool Use, Chat), or use case filters (Production APIs, Automation, Low-Latency, Research, etc.) to find the right model for your workload.

3. Review Details

   Select a model to view its full specifications: parameter count, architecture, context window, capabilities, and a detailed pricing breakdown.

4. Call the API

   Use the provided code examples (cURL, Python, or CLI) with your API key to start sending requests immediately. No deployment step is required.

Pricing

Serverless models use pay-per-token pricing: you are charged for the input and output tokens each request processes. Cached input tokens are currently billed at zero cost. Current per-model pricing is displayed on the Serverless page. For a full breakdown of token usage, see Pricing Overview.
Track your serverless token usage and costs at any time under Management → Billing → Serverless Usage. See Billing & Cost Savings for details on the per-model usage breakdown.
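To make the billing model concrete, here is a minimal cost sketch. The per-million-token rates below are hypothetical placeholders for illustration only, not actual Tensormesh prices (those are shown on the Serverless page):

```python
# Hypothetical per-million-token rates for illustration only;
# real rates are listed per model on the Serverless page.
INPUT_RATE = 0.40    # $ per 1M input tokens
OUTPUT_RATE = 1.60   # $ per 1M output tokens
CACHED_RATE = 0.00   # cached input tokens are currently free

def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Dollar cost of a single request under pay-per-token pricing."""
    billable_input = input_tokens - cached_tokens
    return (billable_input * INPUT_RATE
            + output_tokens * OUTPUT_RATE
            + cached_tokens * CACHED_RATE) / 1_000_000

# e.g. 10K input tokens (2K of them cached) and 1K output tokens
print(round(request_cost(10_000, 1_000, cached_tokens=2_000), 6))  # → 0.0048
```

Because cached input is free, prompts that reuse a long shared prefix (system prompts, few-shot examples) cost noticeably less than their raw token counts suggest.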

Available Models

| Model | Parameters | Architecture | Context | Capabilities |
| --- | --- | --- | --- | --- |
| Qwen3-Coder-480B-A35B | 480B | MoE · 35B active | 262K | Coding, Agentic, Tool Use |
| Qwen3-Coder-30B-A3B | 30.5B | MoE · 3.3B active | 262K | Coding, Agentic, Tool Use |
| Qwen3-235B-A22B | 235B | MoE · 22B active | 131K | Reasoning, Coding, Chat, Tool Use |
| MiniMax-M2.5 | 228B | MoE | 196K | Coding, Agentic, Reasoning, Tool Use |
| Devstral-2-123B | 123B | Dense | 256K | Coding, Agentic, Tool Use |
| gpt-oss-120b | 116B | MoE | 131K | Reasoning, Coding, Agentic, Tool Use |
| Qwen3-30B-A3B | 30.5B | MoE · 3.3B active | 131K | Reasoning, Chat, Coding, Tool Use |
| gpt-oss-20b | 20B | MoE | 131K | Reasoning, Coding, Agentic, Tool Use |

Choosing the Right Model

Coding Agents & SWE

Qwen3-Coder-480B for maximum capability, Qwen3-Coder-30B for a faster and cheaper option, or Devstral-2-123B for multi-file editing across large repos.

General Reasoning & Chat

Qwen3-235B for frontier-level reasoning, or Qwen3-30B for a balanced, cost-effective general-purpose option.

Low-Latency & High-Throughput

gpt-oss-20b is the most compact and fastest model — ideal for real-time assistants and high-volume API services.

Long-Context & Automation

MiniMax-M2.5 (196K context) or Qwen3-Coder-480B (262K context) for workflows that need to process large amounts of information in a single request.

Quick Start

Get your API key from Profile → API Key in the dashboard. Then call the endpoint:
curl --request POST \
  --url https://serverless.tensormesh.ai/v1/chat/completions \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "MiniMaxAI/MiniMax-M2.5",
  "max_tokens": 16384,
  "temperature": 0.6,
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "top_p": 1,
  "top_k": 40,
  "presence_penalty": 0,
  "frequency_penalty": 0
}
'
The serverless API is OpenAI-compatible. You can use any OpenAI SDK or client library by pointing its base URL to https://serverless.tensormesh.ai/v1 and supplying your Tensormesh API key.
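The cURL request above can also be reproduced with nothing but the Python standard library. This is a sketch: `TENSORMESH_API_KEY` is an assumed environment-variable name, and the send step is wrapped in a function so the request can be inspected before anything goes over the network:

```python
import json
import os
import urllib.request

# Assumed env var name; store the key from Profile → API Key there.
API_KEY = os.environ.get("TENSORMESH_API_KEY", "YOUR_API_KEY")

# Same body as the cURL example above.
payload = {
    "model": "MiniMaxAI/MiniMax-M2.5",
    "max_tokens": 16384,
    "temperature": 0.6,
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
}

request = urllib.request.Request(
    "https://serverless.tensormesh.ai/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

def send(req: urllib.request.Request) -> str:
    """POST the request and return the assistant's reply text."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# reply = send(request)  # requires a valid API key
```

Any OpenAI-compatible client follows the same shape; only the base URL and credentials change.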