Use this section when you call Tensormesh inference directly over HTTP rather than through the Python SDK or the tm CLI. Direct inference callers should expect HTTP 429 rate-limit responses on busy surfaces, honor the Retry-After header when present, and avoid automatic retries around non-idempotent writes unless duplicate effects are acceptable.
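The retry guidance above can be sketched in Python using only the standard library. This is a minimal, non-authoritative sketch: the backoff constants are arbitrary choices, and it handles only the numeric (delay-seconds) form of Retry-After, falling back to exponential backoff otherwise.

```python
import time
import urllib.request
import urllib.error

def retry_delay(retry_after, attempt, base=1.0):
    """Seconds to wait before the next attempt: honor a numeric
    Retry-After header when present, else exponential backoff."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form of Retry-After; fall back to backoff
    return base * (2 ** attempt)

def get_with_retries(request, max_attempts=3):
    """Retry only 429 responses. Intended for idempotent GETs such as
    /v1/models; do not wrap non-idempotent writes unless duplicate
    effects are acceptable."""
    for attempt in range(max_attempts):
        try:
            return urllib.request.urlopen(request)
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(retry_delay(err.headers.get("Retry-After"), attempt))
```

Keeping the delay computation in its own function makes the rate-limit policy easy to unit-test without a live endpoint.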

Choose A Surface

  • Serverless: OpenAI-compatible chat completions plus verified models, completions, responses, tokenize, detokenize, health, and version endpoints on the public serverless host.
  • On-Demand: Tensormesh-hosted routed inference endpoints for a specific deployment. This path requires a provider-specific external host, X-User-Id, and a served gateway model name.
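The header requirements of the two surfaces can be captured in small helpers. A minimal sketch, assuming Python; the function names are illustrative, not part of any Tensormesh SDK.

```python
def serverless_headers(api_key):
    """Serverless POST routes authenticate with a bearer API key."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

def on_demand_headers(api_key, user_id):
    """On-demand routed inference additionally requires X-User-Id."""
    headers = serverless_headers(api_key)
    headers["X-User-Id"] = user_id
    return headers
```

Building the on-demand headers on top of the serverless ones keeps the only difference, the X-User-Id routing header, in one obvious place.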

Quick Comparison

| Surface | Best for | Auth | Host selection | Extra routing |
| --- | --- | --- | --- | --- |
| Serverless | Fastest OpenAI-style requests | Authorization: Bearer <API_KEY> for POST routes; GET /v1/models, /health, and /version also work on the public host without auth | Public serverless host | None |
| On-Demand | Requests against a specific deployed model | Authorization: Bearer <API_KEY> | Provider-specific external host for the deployment | X-User-Id: <uuid> |
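A serverless request per the comparison above can be assembled as follows. This sketch builds the request without sending it; the host in SERVERLESS_HOST is a placeholder, not the real public serverless host, and the model name is whatever the models endpoint reports for your account.

```python
import json
import urllib.request

# Placeholder host; substitute the actual public serverless host.
SERVERLESS_HOST = "https://serverless.example.com"

def chat_completions_request(api_key, model, messages):
    """Build (but do not send) an OpenAI-style chat completions POST."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{SERVERLESS_HOST}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The same shape works for the on-demand surface by swapping in the deployment's provider-specific external host, the served gateway model name, and adding the X-User-Id header.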

Start Here

If you need management APIs for users, models, billing, support, logs, or metrics, use the Control Plane API tab instead of this section.