The current Tensormesh SDK supports only a narrow migration path on the inference side:
  • serverless chat completions
  • serverless Responses client
  • on-demand chat completions plus routed models, completions, responses, tokenize, detokenize, health, and version
  • no embeddings client

Fastest Serverless Migration

If your existing app already uses chat completions, serverless is the closest fit.
from tensormesh import Tensormesh
from tensormesh.types import ChatMessage

client = Tensormesh(inference_api_key="YOUR_INFERENCE_API_KEY")

serverless_model_name = "YOUR_SERVERLESS_MODEL_NAME"
completion = client.inference.serverless.chat.completions.create(
    model=serverless_model_name,
    messages=[ChatMessage(role="user", content="Say hello.")],
)

print(completion.choices[0].message.content)
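The completion above follows the familiar choices[0].message.content shape. A small defensive accessor (an illustrative helper, not part of the SDK) avoids an IndexError when a response comes back with no choices:

```python
def first_message_content(completion):
    """Return the assistant text from the first choice, or None if there are none."""
    if not getattr(completion, "choices", None):
        return None
    return completion.choices[0].message.content
```

This works with any response object that exposes the choices[0].message.content shape, so the same helper covers both the serverless and on-demand chat surfaces.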

What Changes From OpenAI Or Fireworks

  • Serverless uses client.inference.serverless.chat.completions.create(...), not client.chat.completions.create(...).
  • Serverless also exposes client.inference.serverless.responses.create(...) when you want the verified responses surface.
  • On-demand is not drop-in compatible. You must configure both on_demand_base_url and on_demand_user_id, and you must send the served gateway model name instead of the Control Plane modelId UUID.
  • This SDK does not currently expose embeddings.
  • Message content is text-oriented in this SDK surface; multimodal content-part request shapes are not modeled here.
  • Structured output is limited to response_format={"type": "json_object"} or ResponseFormat(type="json_object"). JSON Schema-style json_schema response formats are not supported on this surface.
  • CLI login state is not read automatically by the Python SDK. Application code must pass credentials explicitly or via the documented SDK environment variables.
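The constraints above can be checked before you start porting code. The sketch below is an illustrative helper, not part of the SDK; it flags request features that this SDK surface does not model:

```python
def migration_blockers(uses_embeddings=False,
                       response_format_type=None,
                       has_multimodal_content=False):
    """Return a list of human-readable blockers for moving a request
    onto the current Tensormesh SDK surface."""
    blockers = []
    if uses_embeddings:
        blockers.append("embeddings are not exposed by this SDK")
    if response_format_type not in (None, "json_object"):
        blockers.append(
            f"response_format type {response_format_type!r} is unsupported; "
            "only 'json_object' is available on this surface"
        )
    if has_multimodal_content:
        blockers.append("multimodal content parts are not modeled; "
                        "message content is text-oriented")
    return blockers
```

For example, an app that relies on a json_schema response format gets one blocker back, while a plain text chat request gets an empty list and can migrate as-is.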

On-Demand Differences

Use on-demand only when you already have the Tensormesh routing inputs for that deployment.
from tensormesh import Tensormesh
from tensormesh.types import ChatMessage

client = Tensormesh(
    inference_api_key="YOUR_INFERENCE_API_KEY",
    on_demand_base_url="https://YOUR_ON_DEMAND_BASE_URL",
    on_demand_user_id="00000000-0000-0000-0000-000000000000",
)

served_gateway_model_name = "YOUR_SERVED_GATEWAY_MODEL_NAME"
completion = client.inference.on_demand.chat.completions.create(
    model=served_gateway_model_name,
    messages=[ChatMessage(role="user", content="Say hello.")],
)

print(completion.choices[0].message.content)
  • Choose serverless when you want the closest OpenAI-style flow on this SDK surface.
  • Choose on-demand only when you need Tensormesh-specific routing and already know the served gateway model name for that deployment.
  • If you do not already know a valid serverless model name, start with Choose A Serverless Model Name before copying the serverless example.
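Because on-demand requires the served gateway model name rather than the Control Plane modelId UUID, a pre-flight check can catch the most common mistake before any request is sent. This is an illustrative sketch, not an SDK feature:

```python
import uuid

def check_on_demand_inputs(base_url, user_id, model):
    """Validate Tensormesh on-demand routing inputs; raise ValueError on misuse."""
    if not base_url or not base_url.startswith("https://"):
        raise ValueError("on_demand_base_url must be an https URL")
    try:
        uuid.UUID(user_id)  # on_demand_user_id is a UUID
    except (ValueError, TypeError):
        raise ValueError("on_demand_user_id must be a UUID")
    try:
        uuid.UUID(model)
    except (ValueError, TypeError):
        return  # good: a served gateway model name, not a UUID
    raise ValueError("model looks like a Control Plane modelId UUID; "
                     "pass the served gateway model name instead")
```

Calling this with your configured on_demand_base_url, on_demand_user_id, and model name before constructing the client turns a confusing gateway error into an immediate, descriptive one.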