## When To Use This
Use `tm infer chat` when you want to send an actual inference request through the CLI.
- Choose `--surface serverless` when you already know the serverless model name you want.
- Use the default On-Demand flow after `tm auth login` and `tm init --sync` when you want the CLI to reuse the synced managed gateway settings.
## Usage
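The general shape, assuming conventional CLI option syntax (all options listed below are optional):

```shell
tm infer chat [OPTIONS]
```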
## Examples
Send a non-streaming On-Demand chat request after `tm auth login` and `tm init --sync`.
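A sketch of that flow, assuming sync has already populated the inference API key, `X-User-Id`, and served model name (the message payload here is illustrative):

```shell
tm auth login
tm init --sync

# On-Demand is the default surface; the synced settings supply
# the API key, X-User-Id header, and model name.
tm infer chat --json '[{"role": "user", "content": "Hello!"}]'
```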
Send a non-streaming Serverless request with an explicit inference API key.
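For example, with an explicit key and model name (both the model placeholder and the `TM_INFERENCE_API_KEY` environment variable are illustrative):

```shell
tm infer chat \
  --surface serverless \
  --model "<serverless-model-name>" \
  --api-key "$TM_INFERENCE_API_KEY" \
  --json '[{"role": "user", "content": "Hello!"}]'
```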
Stream tokens from the selected surface after the usual On-Demand setup, and fail a stalled stream after 30 seconds of upstream silence.
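A sketch of the streaming variant, again with an illustrative payload (assumes `tm auth login` and `tm init --sync` have already run):

```shell
# --stream-idle-timeout overrides the 300 s default idle read timeout
tm infer chat \
  --stream \
  --stream-idle-timeout 30 \
  --json '[{"role": "user", "content": "Tell me a story."}]'
```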
## Options
| Name | Type | Required | Default | Details |
|---|---|---|---|---|
| `--surface` | `choice[on-demand \| serverless]` | no | `"on-demand"` | Inference surface to use. |
| `--model` | text | no | | Model name to use. |
| `--user-id` | text | no | | `X-User-Id` header (UUID). Only used for `--surface on-demand`. |
| `--api-key` | text | no | | Inference API key (`Authorization: Bearer …`). |
| `--base-url` | text | no | | Override the base URL for the selected surface. |
| `--stream` | boolean | no | `false` | Stream tokens via SSE. |
| `--stream-idle-timeout` | float | no | `300.0` | Maximum idle read timeout in seconds for `--stream` responses. |
| `--json` | text | no | | JSON payload or `@file.json` (object or messages array). When omitted, reads piped stdin if available. |
| `--file` | path | no | | Read JSON payload/messages from file. |
| `--timeout` | float | no | | HTTP connect timeout in seconds for the inference request. |
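Because `--json` falls back to piped stdin when omitted, a payload can also be supplied through a pipe (the payload and file name here are illustrative):

```shell
# Object form of the payload, piped on stdin
echo '{"messages": [{"role": "user", "content": "ping"}]}' | tm infer chat

# Equivalent, reading the same payload from a file
tm infer chat --json @payload.json
```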
## Inherited Global Options
| Name | Type | Required | Default | Details |
|---|---|---|---|---|
| `--version`, `-V` | boolean | no | `false` | Show the version and exit. |
| `--config` | path | no | `"~/.config/tensormesh/config.toml"` | Path to config TOML file. |
| `--output` | `choice[text \| json \| yaml \| raw \| table]` | no | `"text"` | Output format (`text` is human-readable; `json` is machine-friendly). |
| `--quiet` | boolean | no | `false` | Suppress non-essential output. |
| `--debug` | boolean | no | `false` | Print debug logs to stderr (secrets redacted). |
| `--ca-bundle` | path | no | | Path to a PEM CA bundle for TLS verification (overrides `TENSORMESH_CA_BUNDLE`). |
| `--max-retries` | integer | no | | Max retries for idempotent HTTP requests on transient errors (overrides `TENSORMESH_MAX_RETRIES`; subcommands may override). |
| `--controlplane-base` | text | no | | Override the Control Plane base URL. |
| `--gateway-provider` | text | no | | Inference Gateway provider for built-in host selection (`nebius`, `lambda`, `yotta`). |
## Auth Scope

- `inference-api-key`
## Prerequisites

- For the default On-Demand flow, run `tm auth login` and `tm init --sync` first so the CLI has the synced inference API key, `X-User-Id`, and served model name.
- Provide `--model` when using `--surface serverless`.
- If you have Control Plane access for the same Tensormesh environment, discover published serverless model names with `tm billing pricing serverless list`, then pass the returned `pricing[].model` value with `--model`.
- If you only have inference credentials, or you are targeting a different serverless host override, get the model name from your Tensormesh environment before using `--surface serverless`.
## Caveats

- `--surface on-demand` requires an inference API key and `X-User-Id`; the standard way to populate both is `tm auth login` followed by `tm init --sync`.
- `--surface serverless` does not send `X-User-Id`, defaults to `https://serverless.tensormesh.ai`, and reuses `gateway_api_key` as the shared inference API key when `--api-key` is omitted.
- `gateway_api_key` is the stored inference API key used by the SDK as `inference_api_key`.
- `tm billing pricing serverless list` helps discover published serverless model names for the current Tensormesh Control Plane environment. If you are targeting a different serverless host override, confirm the model name for that host separately.
- `--stream` currently supports only `--output text`.
- `--model @latest` is only supported for `--surface on-demand`.

