Skip to main content

When To Use This

Use tm infer responses when you want the /v1/responses endpoint from the CLI on the selected surface. Provide the request body as JSON and either include model in the body or pass --model to override it.

Usage

tm infer responses [OPTIONS]

Examples

Send a serverless responses request.

tm infer responses --api-key YOUR_INFERENCE_API_KEY --model YOUR_SERVERLESS_MODEL_NAME --json '{"input":"Say hello."}'

Send a routed On-Demand responses request.

tm infer responses --surface on-demand --api-key YOUR_INFERENCE_API_KEY --user-id YOUR_TENSORMESH_USER_ID --base-url https://external.nebius.tensormesh.ai --model YOUR_SERVED_GATEWAY_MODEL_NAME --json '{"input":"Say hello."}'

Stream response output text over SSE and fail a stalled stream after 30 seconds of upstream silence.

tm infer responses --stream --stream-idle-timeout 30 --api-key YOUR_INFERENCE_API_KEY --model YOUR_SERVERLESS_MODEL_NAME --json '{"input":"Stream a short reply."}'

Options

NameTypeRequiredDefaultDetails
--surface`choice[on-demandserverless]`no"serverless"Inference surface to target.
--modeltextnoModel name to use. Overrides model in the JSON request body.
--user-idtextnoX-User-Id header to send. Only used for —surface on-demand.
--api-keytextnoInference API key (Authorization: Bearer …).
--base-urltextnoOverride the base URL for the selected surface.
--streambooleannofalseStream tokens via SSE. Boolean flag.
--stream-idle-timeoutfloatno300.0Maximum idle read timeout in seconds for —stream responses.
--jsontextnoJSON request body or @file.json. When omitted, reads piped stdin if available.
--filepathnoRead JSON request body from file.
--timeoutfloatnoHTTP connect timeout in seconds for the inference request.

Inherited Global Options

NameTypeRequiredDefaultDetails
--version, -VbooleannofalseShow the version and exit. Boolean flag.
--configpathno"~/.config/tensormesh/config.toml"Path to config TOML file
--output`choice[textjsonyamlrawtable]`no"text"Output format (text is human-readable; json is machine-friendly).
--quietbooleannofalseSuppress non-essential output. Boolean flag.
--debugbooleannofalsePrint debug logs to stderr (secrets redacted). Boolean flag.
--ca-bundlepathnoPath to a PEM CA bundle for TLS verification (overrides TENSORMESH_CA_BUNDLE).
--max-retriesintegernoMax retries for idempotent HTTP requests on transient errors (overrides TENSORMESH_MAX_RETRIES; subcommands may override).
--controlplane-basetextnoOverride the Control Plane base URL.
--gateway-providertextnoInference Gateway provider for built-in host selection (nebius, lambda, yotta).

Auth Scope

  • inference-api-key

Caveats

  • On-Demand also requires --user-id and a routed base URL.
  • --model overrides the model field inside the JSON request body.
  • --stream currently supports only --output text.

Parent Command