tm CLI.
Direct inference callers should expect 429 rate limits on busy surfaces, honor Retry-After when present, and avoid automatic retries around non-idempotent writes unless duplicate effects are acceptable.
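That guidance can be sketched as follows. This is a minimal illustration, not a Tensormesh client: `retry_delay` and `get_with_retry` are hypothetical helper names, and the exponential-backoff fallback is an assumption, not a documented default.

```python
import time
import urllib.error
import urllib.request

def retry_delay(headers, attempt):
    """Seconds to wait before retrying: honor Retry-After when present,
    else fall back to exponential backoff (1s, 2s, 4s, ...)."""
    retry_after = headers.get("Retry-After")
    return float(retry_after) if retry_after is not None else float(2 ** attempt)

def get_with_retry(url, api_key, max_attempts=3):
    """Retry an idempotent GET on 429. POSTs (non-idempotent writes) are
    deliberately not retried here, since duplicate effects may not be
    acceptable."""
    for attempt in range(max_attempts):
        req = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {api_key}"})
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            if e.code == 429 and attempt < max_attempts - 1:
                time.sleep(retry_delay(e.headers, attempt))
            else:
                raise
```

Keeping the delay calculation in its own function makes the Retry-After handling easy to unit-test without a live endpoint.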
## Choose A Surface
- Serverless: OpenAI-compatible chat completions plus verified `models`, `completions`, `responses`, `tokenize`, `detokenize`, `health`, and `version` endpoints on the public serverless host.
- On-Demand: Tensormesh-hosted routed inference endpoints for a specific deployment. This path requires a provider-specific external host, an `X-User-Id` header, and a served gateway model name.
## Quick Comparison
| Surface | Best for | Auth | Host selection | Extra routing |
|---|---|---|---|---|
| Serverless | Fastest OpenAI-style request | Authorization: Bearer <API_KEY> for POST routes; GET /v1/models, /health, and /version also work on the public host without auth | Public serverless host | None |
| On-Demand | Requests against a specific deployed model | Authorization: Bearer <API_KEY> | Provider-specific external host for the deployment | X-User-Id: <uuid> |
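Per the table, the two surfaces share bearer-token auth and differ mainly in host selection and the extra `X-User-Id` routing header. A minimal sketch of assembling per-surface request headers; `build_headers` is a hypothetical helper, and the `Content-Type` header is a standard assumption for JSON POST bodies rather than something stated above:

```python
def build_headers(surface, api_key, user_id=None):
    """Headers for a POST to either surface. Serverless needs only the
    bearer token; On-Demand additionally requires X-User-Id."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if surface == "on-demand":
        if user_id is None:
            raise ValueError("On-Demand requests require an X-User-Id UUID")
        headers["X-User-Id"] = user_id
    return headers
```

For example, `build_headers("on-demand", key, user_id=my_uuid)` yields the bearer token plus the routing header, while the serverless form omits `X-User-Id` entirely.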
## Start Here
- Serverless Chat Completions
- Serverless Models
- Serverless Responses
- On-Demand Chat Completions
- On-Demand Models
- On-Demand Responses
- API Quickstart
- Choose A Serverless Model Name

