This is the shortest SDK-first path to a working request. The public inference surface exposes chat.completions, models, completions, responses, tokenize, detokenize, health, and version on both inference surfaces.

1. Install The Package

Prerequisite: Python 3.12 or newer. For a published release:
pip install tensormesh
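Before installing, you can confirm the interpreter meets the stated prerequisite. This is a minimal sketch; the helper name is illustrative, not part of the SDK:

```python
import sys

def meets_prerequisite(version_info, minimum=(3, 12)):
    """Return True when the interpreter satisfies the Python 3.12+ prerequisite."""
    return tuple(version_info)[:2] >= minimum

print(meets_prerequisite(sys.version_info))
```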

2. Pick A Surface

  • Serverless inference: use client.inference.serverless
  • On-Demand inference: use client.inference.on_demand
  • Control Plane: use client.control_plane
Inference usually uses an inference API key; on the default public serverless host, models, health, and version also work without one. Control Plane uses a bearer token. For a first successful SDK request, start with serverless inference, and use on-demand only if you already have the deployment routing values for your environment.

Model naming depends on the inference surface:
  • serverless examples expect a serverless model name
  • on-demand examples expect the served gateway model name, not the Control Plane modelId UUID
  • gateway_model_id remains a config compatibility key used by the CLI flow; its value is the served gateway model name string you send as model
If you are coming from the CLI-managed flow, gateway_api_key is the stored inference API key used by the SDK as inference_api_key.
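The key renames above can be sketched as a plain mapping. The values below are placeholders; only the key names (gateway_api_key, gateway_model_id, inference_api_key, model) come from the text:

```python
# Placeholder stand-in for what the CLI-managed flow stores.
cli_config = {
    "gateway_api_key": "YOUR_INFERENCE_API_KEY",   # stored inference API key
    "gateway_model_id": "your-served-model-name",  # served gateway model name, not a UUID
}

# The SDK consumes gateway_api_key as inference_api_key, and the
# gateway_model_id value is the string you pass as the model argument.
sdk_kwargs = {"inference_api_key": cli_config["gateway_api_key"]}
model = cli_config["gateway_model_id"]
print(sdk_kwargs["inference_api_key"], model)
```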

3. Get Credentials

  • For a Control Plane bearer token, use the browser login flow in CLI Authentication, then use tm auth print-token --yes-i-know only in a controlled shell when you need to pass that token into SDK code.
  • For an inference API key, either use the key your Tensormesh environment already issued to you, or create one through the authenticated Control Plane flow:
tm auth login
USER_ID="$(tm --output json auth whoami | python3 -c 'import json,sys; print(json.load(sys.stdin)["user"]["id"])')"
tm users api-keys create --user-id "$USER_ID" --name sdk-key --yes
If your environment does not expose self-serve API key creation, ask your operator or admin for the exact inference API key to use.
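Rather than pasting the key into source, you can load it from the environment. This is a sketch; TENSORMESH_INFERENCE_API_KEY is an illustrative variable name, not one the SDK is documented to read on its own:

```python
import os

def load_inference_api_key(env=None, var="TENSORMESH_INFERENCE_API_KEY"):
    """Fetch the inference API key from the environment, failing loudly if unset."""
    env = os.environ if env is None else env
    key = env.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before running the SDK examples.")
    return key
```

Pass the returned string as inference_api_key when constructing the client.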

4. Choose A Model Name

  • For serverless, pass a serverless model name that is valid for the selected serverless host.
  • If you have Control Plane access for the same Tensormesh environment, discover published serverless models with tm billing pricing serverless list.
  • Use the returned pricing[].model value as the model argument.
  • If you only have inference credentials, or you are targeting a different serverless host override, ask your operator or admin for the exact serverless model string for that host before sending the request.
  • For on-demand, pass the served gateway model name, not the Control Plane modelId UUID.
  • If you use the local operator flow, run tm init --sync and inspect gateway_model_id in tm --output json config show. The stored gateway_model_id value is the served gateway model name string to pass as model.
If you do not already have a valid serverless model name, resolve it before using the serverless examples below, either with tm billing pricing serverless list in the same Tensormesh environment or by asking your operator or admin for the exact serverless model string.
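Scripting that lookup might look like the sketch below. The JSON here is fabricated example data; the only assumed shape is the pricing[].model field mentioned above, and in practice raw would come from tm billing pricing serverless list with the same --output json global flag the other CLI examples use:

```python
import json

# Fabricated example of the described JSON shape: a top-level "pricing"
# array whose entries carry the serverless model name in "model".
raw = json.dumps({
    "pricing": [
        {"model": "example/serverless-model-a"},
        {"model": "example/serverless-model-b"},
    ]
})

# Extract the strings to pass as the model argument.
models = [entry["model"] for entry in json.loads(raw)["pricing"]]
print(models)
```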

5. First Sync Request

from tensormesh import Tensormesh
from tensormesh.types import ChatMessage

with Tensormesh(
    inference_api_key="YOUR_INFERENCE_API_KEY",
) as client:
    serverless_model_name = "YOUR_SERVERLESS_MODEL_NAME"
    completion = client.inference.serverless.chat.completions.create(
        model=serverless_model_name,
        messages=[ChatMessage(role="user", content="Say hello.")],
    )

print(completion.choices[0].message.content)

6. First Async Request

import asyncio

from tensormesh import AsyncTensormesh
from tensormesh.types import ChatMessage


async def main() -> None:
    async with AsyncTensormesh(
        inference_api_key="YOUR_INFERENCE_API_KEY",
    ) as client:
        serverless_model_name = "YOUR_SERVERLESS_MODEL_NAME"
        completion = await client.inference.serverless.chat.completions.create(
            model=serverless_model_name,
            messages=[ChatMessage(role="user", content="Say hello.")],
        )
        print(completion.choices[0].message.content)


asyncio.run(main())

7. First Control-Plane Request

from tensormesh import Tensormesh

with Tensormesh(control_plane_token="YOUR_CONTROL_PLANE_TOKEN") as client:
    profile = client.control_plane.users.get_auth_profile()

print(profile.display_name)

Next Steps

  • If you are deciding which credentials and base URLs to use, continue with Auth And Config.
  • If you want chat completions plus the other verified serverless and on-demand endpoints, continue with Inference.
  • If you are migrating an existing OpenAI or Fireworks chat integration, continue with Migration From OpenAI And Fireworks.
  • If you want models, billing, users, or support examples, continue with Control Plane.
  • If you want the CLI operator path for Control Plane tasks, continue with Control Plane Workflows.
Use the Control Plane API tab in the docs navigation for the generated Control Plane reference.