Both inference surfaces expose `chat.completions`, `models`, `completions`, `responses`, `tokenize`, `detokenize`, `health`, and `version`.
The SDK resolves configuration in this order:
- constructor arguments
- environment variables
- built-in defaults
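That precedence can be sketched as a small resolver; the helper below is illustrative and not the SDK's actual internals, but it shows the order: an explicit constructor argument wins, then the environment variable, then the built-in default.

```python
import os

def resolve(ctor_value, env_var, default):
    """Illustrative precedence: constructor arg > env var > default."""
    if ctor_value is not None:
        return ctor_value
    env_value = os.environ.get(env_var)
    if env_value is not None:
        return env_value
    return default

# Example: no constructor argument supplied, so the env var wins.
os.environ["TENSORMESH_TIMEOUT_SECONDS"] = "30"
timeout = float(resolve(None, "TENSORMESH_TIMEOUT_SECONDS", "60"))
```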
## Surface Boundaries

- Control Plane
  - auth: bearer token
  - client namespace: `client.control_plane`
- Serverless inference
  - auth: inference API key for POST routes; the default public host also serves `models`, `health`, and `version` without one
  - client namespace: `client.inference.serverless`
  - extra namespaces: `models`, `completions`, `responses`, `tokenize`, `detokenize`, `health`, and `version`
  - model value: serverless model name
- On-Demand inference
  - auth: inference API key
  - extra header: `X-User-Id`
  - client namespace: `client.inference.on_demand`
  - extra namespaces: `models`, `completions`, `responses`, `tokenize`, `detokenize`, `health`, and `version`
  - model value: served gateway model name, not the Control Plane `modelId` UUID
  - config naming note: the CLI stores this served model name under `gateway_model_id`, which remains a compatibility key; that string is the value you pass as `model`
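The auth differences above can be sketched as plain request headers. The functions below are an illustration only: `Authorization: Bearer` is a standard HTTP convention, but whether the inference API key travels that way on the wire is an assumption, not the SDK's documented format.

```python
def control_plane_headers(bearer_token: str) -> dict:
    # Control Plane: bearer token auth.
    return {"Authorization": f"Bearer {bearer_token}"}

def serverless_headers(inference_api_key: str) -> dict:
    # Serverless: inference API key on POST routes.
    return {"Authorization": f"Bearer {inference_api_key}"}

def on_demand_headers(inference_api_key: str, user_id: str) -> dict:
    # On-Demand: inference API key plus the extra X-User-Id header.
    return {
        "Authorization": f"Bearer {inference_api_key}",
        "X-User-Id": user_id,
    }
```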
## Get Credentials

- For a Control Plane bearer token, use CLI Authentication. The CLI browser flow stores the token locally, and `tm auth print-token --yes-i-know` can print it for a controlled SDK setup when you need it outside the CLI.
- For an inference API key, either use the key your Tensormesh environment already issued to you, or create one through the authenticated workflow.
- If you only have inference credentials, you can still use the serverless SDK surface without Control Plane login.

`gateway_api_key` is the stored inference API key, used by the SDK as `inference_api_key`. `gateway_model_id` remains a config compatibility key; its value is the served model name string you pass as `model`.
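That key mapping can be sketched as a small translation step. The config dict below is invented for illustration; only the key names (`gateway_api_key`, `gateway_model_id`) come from the text above.

```python
def from_cli_config(cli_config: dict) -> dict:
    """Map CLI-stored compatibility keys to the names the SDK uses."""
    return {
        # gateway_api_key holds the inference API key.
        "inference_api_key": cli_config.get("gateway_api_key"),
        # gateway_model_id holds the served model name you pass as `model`.
        "model": cli_config.get("gateway_model_id"),
    }

settings = from_cli_config({
    "gateway_api_key": "tm-key-example",    # illustrative value
    "gateway_model_id": "my-served-model",  # served model name, not a UUID
})
```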
## Environment Variables

The SDK supports these environment variables:

- `TENSORMESH_CONTROL_PLANE_TOKEN`
- `TENSORMESH_CONTROL_PLANE_BASE_URL`
- `TENSORMESH_INFERENCE_API_KEY`
- `TENSORMESH_SERVERLESS_BASE_URL`
- `TENSORMESH_ON_DEMAND_BASE_URL`
- `TENSORMESH_ON_DEMAND_USER_ID`
- `TENSORMESH_TIMEOUT_SECONDS`
- `TENSORMESH_MAX_RETRIES`
- `TENSORMESH_CA_BUNDLE`
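Environment variables are always strings, so the numeric ones (`TENSORMESH_TIMEOUT_SECONDS`, `TENSORMESH_MAX_RETRIES`) need conversion. A hedged sketch of reading them, where the fallback defaults are placeholders rather than the SDK's documented defaults:

```python
import os

def read_env_settings() -> dict:
    """Read the documented variables; the defaults here are placeholders."""
    env = os.environ
    return {
        "control_plane_token": env.get("TENSORMESH_CONTROL_PLANE_TOKEN"),
        "inference_api_key": env.get("TENSORMESH_INFERENCE_API_KEY"),
        "serverless_base_url": env.get("TENSORMESH_SERVERLESS_BASE_URL"),
        "on_demand_base_url": env.get("TENSORMESH_ON_DEMAND_BASE_URL"),
        "on_demand_user_id": env.get("TENSORMESH_ON_DEMAND_USER_ID"),
        # Numeric values arrive as strings and must be converted.
        "timeout_seconds": float(env.get("TENSORMESH_TIMEOUT_SECONDS", "60")),
        "max_retries": int(env.get("TENSORMESH_MAX_RETRIES", "2")),
        "ca_bundle": env.get("TENSORMESH_CA_BUNDLE"),
    }
```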
## Constructor-Based Configuration

`max_retries` applies to idempotent HTTP methods. The main inference calls on this SDK surface are POST requests such as `/v1/chat/completions`, `/v1/completions`, `/v1/responses`, `/tokenize`, and `/detokenize`, so those requests are not retried automatically.
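The stated retry policy amounts to a gate on the HTTP method. The sketch below illustrates that policy with the idempotent method set from RFC 7231; it is not the SDK's actual code.

```python
# Methods RFC 7231 defines as idempotent and therefore safe to retry.
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS"}

def retries_allowed(method: str, max_retries: int) -> int:
    """POST routes like /v1/chat/completions get no automatic retries."""
    return max_retries if method.upper() in IDEMPOTENT_METHODS else 0
```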
## Environment-Based Configuration
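A minimal shell sketch of configuring the SDK purely through the environment; the values are placeholders, and only the variable names come from the list above.

```shell
# Placeholder values; the variable names are the ones the SDK reads.
export TENSORMESH_INFERENCE_API_KEY="tm-key-example"
export TENSORMESH_SERVERLESS_BASE_URL="https://inference.example.com"
export TENSORMESH_TIMEOUT_SECONDS="30"
export TENSORMESH_MAX_RETRIES="2"

# The SDK picks these up when no constructor arguments override them.
echo "configured: ${TENSORMESH_INFERENCE_API_KEY:+yes}"
```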
## When To Use CLI Login

The SDK does not require `tm auth login`.

- Production deployments and CI environments: supply credentials through environment variables (`TENSORMESH_INFERENCE_API_KEY`, `TENSORMESH_CONTROL_PLANE_TOKEN`, etc.). No browser interaction is required.
- Local development: use `tm auth login` for the browser-based Control Plane auth flow when you want the CLI to store and manage the token locally.
Hostnames such as `external.nebius.tensormesh.ai` are provider-specific examples, not universal defaults.
## Common Mistakes

- trying to use a Control Plane bearer token for inference
- forgetting that on-demand inference requires both `on_demand_base_url` and `on_demand_user_id`
- using a Control Plane `modelId` UUID where the gateway expects a served model name
- assuming the SDK reads `~/.config/tensormesh/` automatically
- mixing serverless and on-demand base URLs
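Several of these mistakes can be caught before any request is made. The pre-flight check below is an illustrative sketch, not an SDK feature; the settings dict shape is assumed.

```python
import re

UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)

def validate_on_demand(settings: dict) -> list:
    """Illustrative pre-flight validation for the pitfalls listed above."""
    problems = []
    # On-demand needs both the base URL and the user id.
    if not settings.get("on_demand_base_url"):
        problems.append("missing on_demand_base_url")
    if not settings.get("on_demand_user_id"):
        problems.append("missing on_demand_user_id")
    # A UUID-shaped model value suggests a Control Plane modelId,
    # not the served gateway model name the gateway expects.
    if UUID_RE.fullmatch(settings.get("model", "")):
        problems.append("model looks like a Control Plane modelId UUID")
    return problems
```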

