When To Use This

Use tm infer chat when you want to send an actual inference request through the CLI.
  • Choose --surface serverless when you already know the serverless model name you want.
  • Use the default On-Demand flow after tm auth login and tm init --sync when you want the CLI to reuse the synced managed gateway settings.

Usage

tm infer chat [OPTIONS]

Examples

Send a non-streaming On-Demand chat request after tm auth login and tm init --sync.

tm infer chat --json '[{"role":"user","content":"Say hello."}]'

Send a non-streaming Serverless request with an explicit inference API key.

tm infer chat --surface serverless --api-key YOUR_INFERENCE_API_KEY --model YOUR_SERVERLESS_MODEL_NAME --json '[{"role":"user","content":"Say hello."}]'

Stream tokens from the default On-Demand surface (after the usual tm auth login and tm init --sync setup), reading the messages from piped stdin, and fail a stalled stream after 30 seconds of upstream silence.

echo '[{"role":"user","content":"Stream tokens."}]' | tm infer chat --stream --stream-idle-timeout 30
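For longer prompts it can be easier to keep the payload in a file than in an inline string. The sketch below writes a messages array to a temporary file and passes it with --file. The file path and message text are placeholders, and the final call is skipped when tm is not on PATH.

```shell
#!/bin/sh
# Write a messages array to a temporary file (the path is illustrative).
payload="${TMPDIR:-/tmp}/tm-chat-messages.json"
cat > "$payload" <<'EOF'
[
  {"role": "user", "content": "Say hello."}
]
EOF

# Send the file as the request payload on the default On-Demand surface.
# Skipped when the tm CLI is not installed.
if command -v tm >/dev/null 2>&1; then
  tm infer chat --file "$payload"
fi
```

Per the Options table, --json also accepts an @file.json reference, so passing --json "@$payload" should be an equivalent way to point at the same file.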

Options

| Name | Type | Required | Default | Details |
| --- | --- | --- | --- | --- |
| --surface | `choice[on-demand, serverless]` | no | "on-demand" | Inference surface to use. |
| --model | text | no | | Model name to use. |
| --user-id | text | no | | X-User-Id header (UUID). Only used for --surface on-demand. |
| --api-key | text | no | | Inference API key (Authorization: Bearer …). |
| --base-url | text | no | | Override the base URL for the selected surface. |
| --stream | boolean | no | false | Stream tokens via SSE. Boolean flag. |
| --stream-idle-timeout | float | no | 300.0 | Maximum idle read timeout in seconds for --stream responses. |
| --json | text | no | | JSON payload or @file.json (object or messages array). When omitted, reads piped stdin if available. |
| --file | path | no | | Read JSON payload/messages from file. |
| --timeout | float | no | | HTTP connect timeout in seconds for the inference request. |

Inherited Global Options

| Name | Type | Required | Default | Details |
| --- | --- | --- | --- | --- |
| --version, -V | boolean | no | false | Show the version and exit. Boolean flag. |
| --config | path | no | "~/.config/tensormesh/config.toml" | Path to config TOML file. |
| --output | `choice[text, json, yaml, raw, table]` | no | "text" | Output format (text is human-readable; json is machine-friendly). |
| --quiet | boolean | no | false | Suppress non-essential output. Boolean flag. |
| --debug | boolean | no | false | Print debug logs to stderr (secrets redacted). Boolean flag. |
| --ca-bundle | path | no | | Path to a PEM CA bundle for TLS verification (overrides TENSORMESH_CA_BUNDLE). |
| --max-retries | integer | no | | Max retries for idempotent HTTP requests on transient errors (overrides TENSORMESH_MAX_RETRIES; subcommands may override). |
| --controlplane-base | text | no | | Override the Control Plane base URL. |
| --gateway-provider | text | no | | Inference Gateway provider for built-in host selection (nebius, lambda, yotta). |

Auth Scope

  • inference-api-key

Prerequisites

  • For the default On-Demand flow, run tm auth login and tm init --sync first so the CLI has the synced inference API key, X-User-Id, and served model name.
  • Provide --model when using --surface serverless.
  • If you have Control Plane access for the same Tensormesh environment, discover published serverless model names with tm billing pricing serverless list, then pass the returned pricing[].model value with --model.
  • If you only have inference credentials, or you are targeting a different serverless host override, get the model name from your Tensormesh environment before using --surface serverless.
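The model-discovery step above can be scripted. This is a minimal sketch, not a confirmed workflow. It assumes that jq is installed, that the global --output json option can be placed before the subcommand, and that the JSON output exposes the documented pricing[].model values under a top-level pricing array. The helper name extract_model_names is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read pricing JSON on stdin, print one model name per line.
# Only the pricing[].model path comes from the documentation above.
extract_model_names() {
  jq -r '.pricing[].model'
}

# Assumption: "tm --output json ..." emits the pricing list as JSON.
# The whole step is skipped when tm or jq is not installed.
if command -v tm >/dev/null 2>&1 && command -v jq >/dev/null 2>&1; then
  model=$(tm --output json billing pricing serverless list | extract_model_names | head -n 1)
  if [ -n "$model" ]; then
    tm infer chat --surface serverless --model "$model" \
      --json '[{"role":"user","content":"Say hello."}]'
  fi
fi
```

Taking the first listed model is only for illustration; in practice you would pick the model you actually intend to call.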

Caveats

  • --surface on-demand requires an inference API key and X-User-Id; the standard way to populate both is tm auth login followed by tm init --sync.
  • --surface serverless does not send X-User-Id, defaults to https://serverless.tensormesh.ai, and reuses gateway_api_key as the shared inference API key when --api-key is omitted.
  • gateway_api_key is the stored inference API key used by the SDK as inference_api_key.
  • tm billing pricing serverless list helps discover published serverless model names for the current Tensormesh Control Plane environment. If you are targeting a different serverless host override, confirm the model name for that host separately.
  • --stream currently supports only --output text.
  • --model @latest is only supported for --surface on-demand.
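Several of these caveats are flag combinations that can be checked before a request is sent. The function below is a hypothetical pre-flight helper for wrapper scripts, not part of the tm CLI. It encodes only three rules stated on this page: serverless needs --model, @latest is On-Demand only, and --stream works only with text output.

```shell
#!/bin/sh
# check_chat_args SURFACE MODEL STREAM OUTPUT
# Hypothetical helper (not a tm command). Returns 0 when the combination
# respects the caveats above; otherwise prints a reason to stderr and returns 1.
# STREAM is "yes" or "no"; MODEL may be empty.
check_chat_args() {
  surface=$1
  model=$2
  stream=$3
  output=$4

  if [ "$surface" = "serverless" ] && [ -z "$model" ]; then
    echo "--surface serverless requires --model" >&2
    return 1
  fi
  if [ "$surface" = "serverless" ] && [ "$model" = "@latest" ]; then
    echo "--model @latest is only supported for --surface on-demand" >&2
    return 1
  fi
  if [ "$stream" = "yes" ] && [ "$output" != "text" ]; then
    echo "--stream currently supports only --output text" >&2
    return 1
  fi
  return 0
}
```

A wrapper script could call check_chat_args on-demand "@latest" yes text before invoking tm infer chat and stop early when the function returns a non-zero status.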

Parent Command