When To Use This

Use tm infer chat when you want to send an actual serverless inference request through the CLI.

Usage

tm infer chat [OPTIONS]

Examples

Send a non-streaming serverless request with an explicit inference API key.

tm infer chat --api-key YOUR_INFERENCE_API_KEY --model YOUR_SERVERLESS_MODEL_NAME --json '[{"role":"user","content":"Say hello."}]'

Stream tokens over SSE and fail a stalled stream after 30 seconds of upstream silence.

echo '[{"role":"user","content":"Stream tokens."}]' | tm infer chat --api-key YOUR_INFERENCE_API_KEY --model YOUR_SERVERLESS_MODEL_NAME --stream --stream-idle-timeout 30

Options

| Name | Type | Required | Default | Details |
| --- | --- | --- | --- | --- |
| --surface | choice[serverless] | no | "serverless" | Inference surface to use. |
| --model | text | no | | Model name to use. |
| --api-key | text | no | | Inference API key (Authorization: Bearer …). |
| --base-url | text | no | | Override the base URL for the selected surface. |
| --stream | boolean | no | false | Stream tokens via SSE. Boolean flag. |
| --stream-idle-timeout | float | no | 300.0 | Maximum idle read timeout in seconds for --stream responses. |
| --json | text | no | | JSON payload or @file.json (object or messages array). When omitted, reads piped stdin if available. |
| --file | path | no | | Read JSON payload/messages from file. |
| --timeout | float | no | | HTTP connect timeout in seconds for the inference request. |
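
The --json and --file options can both read a payload stored on disk. The following is a minimal sketch rather than an authoritative recipe: the file name payload.json is arbitrary, and the object layout (a top-level model plus a messages array) is an assumption modeled on common chat request bodies. This page only states that the payload may be an object or a messages array and that model may be included in the request body.

```shell
# Hypothetical payload file; the "messages" key is an assumed object layout.
payload_file="payload.json"

cat > "$payload_file" <<'EOF'
{
  "model": "YOUR_SERVERLESS_MODEL_NAME",
  "messages": [
    {"role": "user", "content": "Say hello."}
  ]
}
EOF

# Send the request only when the tm CLI is installed.
if command -v tm >/dev/null 2>&1; then
  # Form 1: --json with the @file prefix.
  tm infer chat --api-key YOUR_INFERENCE_API_KEY --json "@$payload_file"
  # Form 2: --file with a plain path.
  tm infer chat --api-key YOUR_INFERENCE_API_KEY --file "$payload_file"
fi
```

Because the model name is carried in the payload here, --model is left off the command line; pass --model instead if your payload is a bare messages array.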

Inherited Global Options

| Name | Type | Required | Default | Details |
| --- | --- | --- | --- | --- |
| --version, -V | boolean | no | false | Show the version and exit. Boolean flag. |
| --config | path | no | "~/.config/tensormesh/config.toml" | Path to config TOML file. |
| --output | `choice[text\|json\|yaml\|raw\|table]` | no | "text" | Output format (text is human-readable; json is machine-friendly). |
| --quiet | boolean | no | false | Suppress non-essential output. Boolean flag. |
| --debug | boolean | no | false | Print debug logs to stderr (secrets redacted). Boolean flag. |
| --ca-bundle | path | no | | Path to a PEM CA bundle for TLS verification (overrides TENSORMESH_CA_BUNDLE). |
| --max-retries | integer | no | | Max retries for idempotent HTTP requests on transient errors (overrides TENSORMESH_MAX_RETRIES; subcommands may override). |
| --controlplane-base | text | no | | Override the Control Plane base URL. |

Auth Scope

  • inference-api-key

Prerequisites

  • Provide --model explicitly or include model in the JSON request body.
  • If you have Control Plane access for the same Tensormesh environment, discover published serverless model names with tm billing pricing serverless list, then pass the returned pricing[].model value with --model.
  • If you only have inference credentials, or you are targeting a different serverless host override, get the model name from your Tensormesh environment before sending the request.
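
The discovery step in the second prerequisite could be scripted roughly as follows. This is a sketch under stated assumptions: it relies on the third-party jq tool, it assumes the global --output option is accepted before the subcommand, and it assumes the JSON output exposes the pricing[].model path mentioned above at the top level. The helper name list_models is hypothetical.

```shell
# Print one model name per line from a pricing JSON document on stdin.
# The pricing[].model path comes from the prerequisite above; requires jq.
list_models() {
  jq -r '.pricing[].model'
}

if command -v tm >/dev/null 2>&1; then
  # Assumption: --output json may be placed before the subcommand.
  model=$(tm --output json billing pricing serverless list | list_models | head -n 1)
  # No --api-key here, so the key stored in config.toml is used.
  tm infer chat --model "$model" --json '[{"role":"user","content":"Say hello."}]'
fi
```

Picking the first listed model is only for illustration; in practice, choose the model name that matches your workload.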

Caveats

  • When --api-key is omitted, the command reuses gateway_api_key from config.toml as the inference API key.
  • gateway_api_key is the stored inference API key used by the SDK as inference_api_key.
  • tm billing pricing serverless list helps discover published serverless model names for the current Tensormesh Control Plane environment. If you are targeting a different serverless host override, confirm the model name for that host separately.
  • --stream currently supports only --output text.
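
The key fallback described in the first caveat can be sketched as a small wrapper that passes --api-key only when a key is supplied and otherwise leaves the lookup to config.toml. The names stream_chat, TM_DEMO_API_KEY, and TM_BIN are hypothetical and exist only in this sketch; they are not settings read by the tm CLI.

```shell
# Stream a chat request, passing --api-key only when a key is supplied.
# TM_DEMO_API_KEY and TM_BIN are hypothetical variables local to this sketch.
stream_chat() {
  model="$1"
  set -- infer chat --model "$model" --stream
  if [ -n "${TM_DEMO_API_KEY:-}" ]; then
    # Explicit key: used instead of gateway_api_key from config.toml.
    set -- "$@" --api-key "$TM_DEMO_API_KEY"
  fi
  # Without --api-key, tm falls back to gateway_api_key from config.toml.
  "${TM_BIN:-tm}" "$@"
}

# Usage: the messages array is piped on stdin, as in the streaming example.
# echo '[{"role":"user","content":"Stream tokens."}]' | stream_chat YOUR_SERVERLESS_MODEL_NAME
```

The wrapper does not pass --output, so the default text format applies, which is the only format --stream currently supports.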

Parent Command