- serverless chat completions
- serverless Responses client
- on-demand chat completions plus routed models, completions, responses, tokenize, detokenize, health, and version
- no embeddings client
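Since the serverless surface lives under a deeper namespace than OpenAI-style clients, a thin shim can keep existing call sites unchanged during migration. This is a minimal sketch: the stub client below only imitates the namespace layout described in this document, and the model name is a hypothetical placeholder, so swap in the real SDK client and a valid serverless model name.

```python
from types import SimpleNamespace


class _ServerlessChatCompletions:
    """Stand-in for client.inference.serverless.chat.completions."""

    def create(self, *, model, messages, response_format=None):
        # Echo the routing details so the shim's mapping is visible.
        return {"model": model, "path": "inference.serverless.chat.completions"}


def make_stub_client():
    # Mirrors the namespace layout described above; the real SDK
    # client would be used here instead.
    return SimpleNamespace(
        inference=SimpleNamespace(
            serverless=SimpleNamespace(
                chat=SimpleNamespace(completions=_ServerlessChatCompletions())
            )
        )
    )


class OpenAIStyleShim:
    """Re-exposes the serverless surface at client.chat.completions
    so existing OpenAI-style call sites need no edits."""

    def __init__(self, tm_client):
        self.chat = SimpleNamespace(
            completions=tm_client.inference.serverless.chat.completions
        )


client = OpenAIStyleShim(make_stub_client())
result = client.chat.completions.create(
    model="example-serverless-model",  # hypothetical model name
    messages=[{"role": "user", "content": "hello"}],
)
print(result["path"])  # prints "inference.serverless.chat.completions"
```

The shim only forwards attribute lookups, so it adds no behavior of its own; once call sites are migrated to the serverless namespace directly, it can be deleted.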
Fastest Serverless Migration
If your existing app already uses chat completions, serverless is the closest fit.

What Changes From OpenAI Or Fireworks
- Serverless uses `client.inference.serverless.chat.completions.create(...)`, not `client.chat.completions.create(...)`.
- Serverless also exposes `client.inference.serverless.responses.create(...)` when you want the verified responses surface.
- On-demand is not drop-in compatible. You must configure both `on_demand_base_url` and `on_demand_user_id`, and you must send the served gateway model name instead of the Control Plane `modelId` UUID.
- This SDK does not currently expose embeddings.
- Message content is text-oriented in this SDK surface; multimodal content-part request shapes are not modeled here.
- Structured output is limited to `response_format={"type": "json_object"}` or `ResponseFormat(type="json_object")`; JSON Schema-style `json_schema` response formats are not supported on this surface.
- CLI login state is not read automatically by the Python SDK. Application code must pass credentials explicitly or via the documented SDK environment variables.
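Because only the `json_object` response format is supported on this surface, it can help to validate requests before sending them. This is an illustrative sketch, not part of the SDK: the guard accepts both the dict form and an object with a `.type` attribute, and rejects JSON Schema-style formats.

```python
# Supported response_format types on this surface (assumption based
# on the constraint described above).
SUPPORTED_TYPES = {"json_object"}


def check_response_format(response_format):
    """Accept {'type': 'json_object'} or an object with .type == 'json_object';
    reject JSON Schema-style formats, which this surface lacks."""
    if response_format is None:
        return None  # plain text response, nothing to validate
    fmt_type = (
        response_format.get("type")
        if isinstance(response_format, dict)
        else getattr(response_format, "type", None)
    )
    if fmt_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported response_format type: {fmt_type!r}")
    return fmt_type


check_response_format({"type": "json_object"})  # accepted
# check_response_format({"type": "json_schema"})  # would raise ValueError
```

Failing fast at the call site gives a clearer error than a rejected request from the service.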
On-Demand Differences
Use on-demand only when you already have the Tensormesh routing inputs for that deployment.

Recommended Decision
- Choose serverless when you want the closest OpenAI-style flow on this SDK surface.
- Choose on-demand only when you need Tensormesh-specific routing and already know the served gateway model name for that deployment.
- If you do not already know a valid serverless model name, start with Choose A Serverless Model Name before copying the serverless example.
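The on-demand caveats above (both routing inputs required, served gateway model name rather than a Control Plane `modelId` UUID) can be sketched as a small pre-flight check. The URL, user ID, and model name below are hypothetical placeholders, and this helper is illustrative, not part of the SDK.

```python
import re
from dataclasses import dataclass

# Matches a Control Plane modelId-style UUID, which must NOT be sent
# as the on-demand model name.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
)


@dataclass
class OnDemandConfig:
    on_demand_base_url: str
    on_demand_user_id: str
    model: str  # must be the served gateway model name

    def validate(self):
        if not self.on_demand_base_url or not self.on_demand_user_id:
            raise ValueError(
                "on-demand requires both on_demand_base_url and on_demand_user_id"
            )
        if UUID_RE.match(self.model):
            raise ValueError(
                "pass the served gateway model name, not the Control Plane modelId UUID"
            )


cfg = OnDemandConfig(
    on_demand_base_url="https://example.invalid/gateway",  # hypothetical URL
    on_demand_user_id="user-123",                          # hypothetical ID
    model="served-gateway-model-name",                     # hypothetical name
)
cfg.validate()  # passes

bad = OnDemandConfig(
    on_demand_base_url="https://example.invalid/gateway",
    on_demand_user_id="user-123",
    model="123e4567-e89b-12d3-a456-426614174000",  # a modelId UUID
)
# bad.validate() would raise ValueError
```

Running this check before constructing the client surfaces misconfiguration locally instead of as a routing failure at the gateway.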

