This page covers the standard On-Demand first-request flow. On-Demand requests need a gateway API key and an X-User-Id header. The standard setup is tm auth login followed by tm init --sync, which derives the gateway user id and stores it, together with the rest of the managed gateway state, under [managed] in the config.toml of the active state root. When TM_CONFIG_HOME is unset, that file is ~/.config/tensormesh/config.toml. Use tm auth whoami when you need a live Control Plane token check.

The commands below assume tm is on your PATH. If it is not, install the CLI first by following Installation. If you are running from this repo checkout without activating a shell that already exposes tm, use ./.venv/bin/tm.

If you want the shortest serverless request instead, use:
tm infer chat \
  --surface serverless \
  --api-key YOUR_INFERENCE_API_KEY \
  --model YOUR_SERVERLESS_MODEL_NAME \
  --json '[{"role":"user","content":"Say hello."}]'
If you do not already know a valid serverless model name, use tm billing pricing serverless list before using that shortcut. If you are not sure what is already configured locally, run tm init first. Use tm infer doctor when you specifically want to check whether the On-Demand path is ready for a direct request or for --model @latest.
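The config location described above can be resolved in a script before inspecting or backing up the file. This is a minimal sketch, not part of the CLI: tm_config_path is a hypothetical helper name, and it assumes that TM_CONFIG_HOME, when set, names the directory that directly contains config.toml. This page only states the default path for the unset case.

```shell
# Hypothetical helper: print the path of the active config.toml.
# Assumption: TM_CONFIG_HOME, when set, is the directory holding config.toml.
# When it is unset (or empty), fall back to the documented default directory.
tm_config_path() {
  printf '%s/config.toml\n' "${TM_CONFIG_HOME:-$HOME/.config/tensormesh}"
}
```

For example, ls -l "$(tm_config_path)" shows whether a config file already exists at that location.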

Set Gateway Credentials

tm auth login
tm init --sync
tm init --sync syncs the available gateway settings from the Control Plane. When the command runs with --controlplane-base, it also persists that controlplane_base into the active config.toml so that later @latest requests stay on the same environment.

When a served deployment already exists, the sync includes the served gateway model name. Use that served gateway model name for On-Demand requests, not the Control Plane modelId UUID. If no served model exists yet, tm init --sync can still sync the API key and user id, but On-Demand chat cannot succeed until you deploy a model or pass an explicit served model name.
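To confirm what the sync wrote, one option is to print the [managed] table from the active config.toml. The helper below is a rough sketch under stated assumptions: print_toml_table is a hypothetical name, it does a simple line-by-line scan rather than real TOML parsing, and it makes no assumption about which keys the CLI stores. The output may include the stored API key, so avoid pasting it into logs or tickets.

```shell
# Hypothetical helper: print one top-level TOML table (header plus body).
# This is a line-based scan, not a full TOML parser: it starts printing at
# the exact header line and stops at the next line that begins with "[".
print_toml_table() {
  # $1 = path to the TOML file, $2 = table name, for example: managed
  awk -v header="[$2]" '
    $0 == header { in_table = 1; print; next }
    /^\[/        { in_table = 0 }
    in_table     { print }
  ' "$1"
}
```

With TM_CONFIG_HOME unset, print_toml_table "$HOME/.config/tensormesh/config.toml" managed prints the synced state, and empty output suggests that the sync has not written a [managed] table yet.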

Send A Chat Request

tm infer chat --json '[{"role":"user","content":"Say hello."}]'
For streaming output, pipe the messages on stdin and add --stream:
echo '[{"role":"user","content":"Stream tokens."}]' \
  | tm infer chat --stream
Streaming requests use a bounded idle read timeout. If you expect long quiet gaps between SSE events, raise it explicitly with --stream-idle-timeout:
echo '[{"role":"user","content":"Stream tokens."}]' \
  | tm infer chat --stream --stream-idle-timeout 600

Using @latest

@latest asks the CLI to resolve the served gateway model from Control Plane inventory before sending the gateway request.
tm auth login
tm init --sync
tm infer chat --model @latest --json '[{"role":"user","content":"Say hello."}]'
You still need a valid gateway API key and X-User-Id even when @latest is used, so keep the normal tm init --sync step in place before this shortcut.
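The readiness check and the @latest request can be combined in a small wrapper. This is a sketch only: chat_latest is a hypothetical function name, and it assumes that tm infer doctor exits with a non-zero status when the On-Demand path is not ready, which this page does not confirm. It uses only the commands shown above.

```shell
# Hypothetical wrapper: run the readiness check, then send an @latest request.
# Assumption: "tm infer doctor" returns non-zero when the path is not ready.
chat_latest() {
  # $1 = JSON array of chat messages
  tm infer doctor || {
    echo "On-Demand path not ready; run tm auth login and tm init --sync first." >&2
    return 1
  }
  tm infer chat --model @latest --json "$1"
}
```

Calling chat_latest '[{"role":"user","content":"Say hello."}]' then sends the request only after the check passes.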