Documentation Index
Fetch the complete documentation index at: https://docs.tensormesh.ai/llms.txt
Use this file to discover all available pages before exploring further.
When To Use This
Usetm infer chat when you want to send an actual inference request through the CLI.
- Choose
--surface serverlesswhen you already know the serverless model name you want. - Use the default On-Demand flow after
tm auth loginandtm init --syncwhen you want the CLI to reuse the synced managed gateway settings.
Usage
Examples
Send a non-streaming On-Demand chat request after tm auth login and tm init --sync.
Send a non-streaming Serverless request with an explicit inference API key.
Stream tokens from the selected surface after the usual On-Demand setup, and fail a stalled stream after 30 seconds of upstream silence.
Options
| Name | Type | Required | Default | Details | |
|---|---|---|---|---|---|
--surface | `choice[on-demand | serverless]` | no | "on-demand" | Inference surface to use. |
--model | text | no | Model name to use. | ||
--user-id | text | no | X-User-Id header (UUID). Only used for —surface on-demand. | ||
--api-key | text | no | Inference API key (Authorization: Bearer …). | ||
--base-url | text | no | Override the base URL for the selected surface. | ||
--stream | boolean | no | false | Stream tokens via SSE. Boolean flag. | |
--stream-idle-timeout | float | no | 300.0 | Maximum idle read timeout in seconds for —stream responses. | |
--json | text | no | JSON payload or @file.json (object or messages array). When omitted, reads piped stdin if available. | ||
--file | path | no | Read JSON payload/messages from file. | ||
--timeout | float | no | HTTP connect timeout in seconds for the inference request. |
Inherited Global Options
| Name | Type | Required | Default | Details | ||||
|---|---|---|---|---|---|---|---|---|
--version, -V | boolean | no | false | Show the version and exit. Boolean flag. | ||||
--config | path | no | "~/.config/tensormesh/config.toml" | Path to config TOML file | ||||
--output | `choice[text | json | yaml | raw | table]` | no | "text" | Output format (text is human-readable; json is machine-friendly). |
--quiet | boolean | no | false | Suppress non-essential output. Boolean flag. | ||||
--debug | boolean | no | false | Print debug logs to stderr (secrets redacted). Boolean flag. | ||||
--ca-bundle | path | no | Path to a PEM CA bundle for TLS verification (overrides TENSORMESH_CA_BUNDLE). | |||||
--max-retries | integer | no | Max retries for idempotent HTTP requests on transient errors (overrides TENSORMESH_MAX_RETRIES; subcommands may override). | |||||
--controlplane-base | text | no | Override the Control Plane base URL. | |||||
--gateway-provider | text | no | Inference Gateway provider for built-in host selection (nebius, lambda, yotta). |
Auth Scope
- inference-api-key
Prerequisites
- For the default On-Demand flow, run
tm auth loginandtm init --syncfirst so the CLI has the synced inference API key, X-User-Id, and served model name. - Provide
--modelwhen using--surface serverless. - If you have Control Plane access for the same Tensormesh environment, discover published serverless model names with
tm billing pricing serverless list, then pass the returnedpricing[].modelvalue with--model. - If you only have inference credentials, or you are targeting a different serverless host override, get the model name from your Tensormesh environment before using
--surface serverless.
Caveats
--surface on-demandrequires an inference API key and X-User-Id; the standard way to populate both istm auth loginfollowed bytm init --sync.--surface serverlessdoes not send X-User-Id, defaults to https://serverless.tensormesh.ai, and reusesgateway_api_keyas the shared inference API key when--api-keyis omitted.gateway_api_keyis the stored inference API key used by the SDK asinference_api_key.tm billing pricing serverless listhelps discover published serverless model names for the current Tensormesh Control Plane environment. If you are targeting a different serverless host override, confirm the model name for that host separately.--streamcurrently supports only--output text.--model @latestis only supported for--surface on-demand.

