chat.completions, models, completions, responses, tokenize, detokenize, health, and version on both surfaces.
1. Install The Package
Prerequisite: Python 3.12 or newer.
For a published release:
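The published release presumably installs from a package index; the distribution name `tensormesh` below is an assumption, so substitute the name your release notes give:

```shell
# Package name is assumed; replace with the published distribution name.
pip install tensormesh
```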
2. Pick A Surface
- Serverless inference: use `client.inference.serverless`
- On-Demand inference: use `client.inference.on_demand`
- Control Plane: use `client.control_plane`
`models`, `health`, and `version` also work without an inference API key. Control Plane uses a bearer token.
For a first successful SDK request, start with serverless inference. Use on-demand only if you already have the deployment routing values for your environment.
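To make that "routing values" distinction concrete, here is an illustrative sketch of the inputs each surface needs; the class and field names are mine, not SDK types:

```python
from dataclasses import dataclass


@dataclass
class ServerlessInputs:
    # Minimal inputs for a first serverless request (field names illustrative).
    inference_api_key: str
    model: str  # a serverless model name valid for the selected host


@dataclass
class OnDemandInputs:
    # On-demand additionally needs deployment routing values for your environment.
    inference_api_key: str
    model: str  # the served gateway model name, not the Control Plane modelId UUID
    deployment_routing: dict[str, str]  # environment-specific routing values
```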
Model naming depends on the inference surface:
- serverless examples expect a serverless model name
- on-demand examples expect the served gateway model name, not the Control Plane `modelId` UUID

`gateway_model_id` remains a config compatibility key used by the CLI flow; its value is the served gateway model name string you send as `model`.
`gateway_api_key` is the stored inference API key used by the SDK as `inference_api_key`.
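Putting the two config keys together, here is a minimal sketch of pulling both values out of `tm --output json config show` output; the flat JSON shape shown is an assumption inferred from the key names, so check your actual output:

```python
import json

# Example output shape assumed from the key names above; inspect the real
# `tm --output json config show` output in your environment for the layout.
raw = '{"gateway_model_id": "my-served-model", "gateway_api_key": "sk-example"}'

config = json.loads(raw)
model = config["gateway_model_id"]             # send this string as `model`
inference_api_key = config["gateway_api_key"]  # used by the SDK as inference_api_key
```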
3. Get Credentials
- For a Control Plane bearer token, use the browser login flow in CLI Authentication, then use `tm auth print-token --yes-i-know` only in a controlled shell when you need to pass that token into SDK code.
- For an inference API key, either use the key your Tensormesh environment already issued to you, or create one through the authenticated Control Plane flow:
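Once a key exists, one common pattern is to read it from the environment before constructing the client; the variable name below is my assumption, not an SDK contract:

```python
import os


def load_inference_api_key() -> str:
    # TENSORMESH_INFERENCE_API_KEY is an illustrative variable name; use
    # whatever your deployment standardizes on.
    key = os.environ.get("TENSORMESH_INFERENCE_API_KEY", "")
    if not key:
        raise RuntimeError(
            "No inference API key found; export TENSORMESH_INFERENCE_API_KEY "
            "or pass the key to the client explicitly."
        )
    return key
```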
4. Choose A Model Name
- For serverless, pass a serverless model name that is valid for the selected serverless host.
  - If you have Control Plane access for the same Tensormesh environment, discover published serverless models with `tm billing pricing serverless list`.
  - Use the returned `pricing[].model` value as the `model` argument.
  - If you only have inference credentials, or you are targeting a different serverless host override, ask your operator or admin for the exact serverless `model` string for that host before sending the request.
- For on-demand, pass the served gateway model name, not the Control Plane `modelId` UUID.
  - If you use the local operator flow, run `tm init --sync` and inspect `gateway_model_id` in `tm --output json config show`. The stored `gateway_model_id` value is the served gateway model name string to pass as `model`.
- Otherwise, recover the name by running `tm billing pricing serverless list` for the same Tensormesh environment, or by asking your operator or admin for the exact serverless model string.
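To go from the pricing output to a `model` argument, here is a sketch of extracting `pricing[].model`; the response shape is inferred from that path alone and may carry more fields in practice:

```python
# Shape assumed from the `pricing[].model` path mentioned above; the real
# output of `tm billing pricing serverless list` may include more fields.
pricing_response = {
    "pricing": [
        {"model": "serverless-model-a"},
        {"model": "serverless-model-b"},
    ]
}

# Each entry's "model" value is a candidate `model` argument.
serverless_models = [entry["model"] for entry in pricing_response["pricing"]]
```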
5. First Sync Request
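As a stand-in until you wire up the SDK client, the request can be sketched with the standard library, assuming the serverless surface speaks an OpenAI-compatible `/v1/chat/completions` route; the base URL, route, key, and model name below are all placeholders:

```python
import json
import urllib.request

# Base URL and route are placeholders; substitute the serverless host and
# path your Tensormesh environment documents.
BASE_URL = "https://serverless.example.invalid"

payload = {
    "model": "your-serverless-model-name",
    "messages": [{"role": "user", "content": "Say hello."}],
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer your-inference-api-key",
        "Content-Type": "application/json",
    },
    method="POST",
)
# To actually send it: response = urllib.request.urlopen(req); json.load(response)
```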
6. First Async Request
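The async surface pays off when you overlap several requests. The sketch below stubs the network call with a local coroutine so it runs anywhere; replace `chat_completion` with the real awaitable once you have the async client:

```python
import asyncio


async def chat_completion(model: str, content: str) -> dict:
    # Stand-in for the SDK's async chat-completions call; the awaited sleep
    # simulates the network round-trip.
    await asyncio.sleep(0)
    return {"model": model, "choices": [{"message": {"content": f"echo: {content}"}}]}


async def main() -> list[dict]:
    # Overlap two requests; gather preserves argument order in its results.
    return await asyncio.gather(
        chat_completion("your-serverless-model-name", "Say hello."),
        chat_completion("your-serverless-model-name", "Say goodbye."),
    )


results = asyncio.run(main())
```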
7. First Control-Plane Request
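A Control Plane call differs from inference mainly in auth: it sends the bearer token from section 3. The host and `/models` route below are placeholders for whatever your environment documents:

```python
import urllib.request

# Host and route are placeholders; Control Plane auth uses a bearer token
# (see Get Credentials), passed in the Authorization header.
CONTROL_PLANE_URL = "https://control-plane.example.invalid"
token = "paste-token-from-tm-auth-print-token"

req = urllib.request.Request(
    f"{CONTROL_PLANE_URL}/models",
    headers={"Authorization": f"Bearer {token}"},
    method="GET",
)
# To actually send it: urllib.request.urlopen(req)
```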
Next Steps
- If you are deciding which credentials and base URLs to use, continue with Auth And Config.
- If you want chat completions plus the other verified serverless and on-demand endpoints, continue with Inference.
- If you are migrating an existing OpenAI or Fireworks chat integration, continue with Migration From OpenAI And Fireworks.
- If you want models, billing, users, or support examples, continue with Control Plane.
- If you want the CLI operator path for Control Plane tasks, continue with Control Plane Workflows.

