When To Use This
Usetm infer responses when you want the /v1/responses endpoint from the CLI on the selected surface.
Provide the request body as JSON and either include model in the body or pass --model to override it.
Usage
Examples
Send a serverless responses request.
Send a routed On-Demand responses request.
Stream response output text over SSE and fail a stalled stream after 30 seconds of upstream silence.
Options
| Name | Type | Required | Default | Details | |
|---|---|---|---|---|---|
--surface | `choice[on-demand | serverless]` | no | "serverless" | Inference surface to target. |
--model | text | no | Model name to use. Overrides model in the JSON request body. | ||
--user-id | text | no | X-User-Id header to send. Only used for —surface on-demand. | ||
--api-key | text | no | Inference API key (Authorization: Bearer …). | ||
--base-url | text | no | Override the base URL for the selected surface. | ||
--stream | boolean | no | false | Stream tokens via SSE. Boolean flag. | |
--stream-idle-timeout | float | no | 300.0 | Maximum idle read timeout in seconds for —stream responses. | |
--json | text | no | JSON request body or @file.json. When omitted, reads piped stdin if available. | ||
--file | path | no | Read JSON request body from file. | ||
--timeout | float | no | HTTP connect timeout in seconds for the inference request. |
Inherited Global Options
| Name | Type | Required | Default | Details | ||||
|---|---|---|---|---|---|---|---|---|
--version, -V | boolean | no | false | Show the version and exit. Boolean flag. | ||||
--config | path | no | "~/.config/tensormesh/config.toml" | Path to config TOML file | ||||
--output | `choice[text | json | yaml | raw | table]` | no | "text" | Output format (text is human-readable; json is machine-friendly). |
--quiet | boolean | no | false | Suppress non-essential output. Boolean flag. | ||||
--debug | boolean | no | false | Print debug logs to stderr (secrets redacted). Boolean flag. | ||||
--ca-bundle | path | no | Path to a PEM CA bundle for TLS verification (overrides TENSORMESH_CA_BUNDLE). | |||||
--max-retries | integer | no | Max retries for idempotent HTTP requests on transient errors (overrides TENSORMESH_MAX_RETRIES; subcommands may override). | |||||
--controlplane-base | text | no | Override the Control Plane base URL. | |||||
--gateway-provider | text | no | Inference Gateway provider for built-in host selection (nebius, lambda, yotta). |
Auth Scope
- inference-api-key
Caveats
- On-Demand also requires
--user-idand a routed base URL. --modeloverrides themodelfield inside the JSON request body.--streamcurrently supports only--output text.

