This page covers the standard first-request flow against the serverless inference surface.

The commands below assume tm is on your PATH. If it is not, install the CLI first (see Installation). If you are running from this repo checkout in a shell that does not already expose tm, use ./.venv/bin/tm instead.

The shortest serverless request:
tm infer chat \
  --api-key YOUR_INFERENCE_API_KEY \
  --model YOUR_SERVERLESS_MODEL_NAME \
  --json '[{"role":"user","content":"Say hello."}]'
If you do not already know a valid serverless model name, run tm billing pricing serverless list first to find one. To check that the local config and credentials are wired up correctly before sending the request, run tm infer doctor.
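
The PATH fallback described above can be scripted. The following is a minimal sketch, assuming a POSIX shell; resolve_tm is a hypothetical helper name and not part of the CLI. It prefers a tm found on PATH and otherwise falls back to the repo-local ./.venv/bin/tm.

```shell
# Hypothetical helper (not part of the tm CLI): print the tm executable to use.
# Prefers a tm on PATH, then the repo-local virtualenv copy.
resolve_tm() {
  if command -v tm >/dev/null 2>&1; then
    command -v tm
  elif [ -x ./.venv/bin/tm ]; then
    printf '%s\n' ./.venv/bin/tm
  else
    echo "tm not found: install the CLI or run from the repo checkout" >&2
    return 1
  fi
}

# Usage (not run here):
#   TM="$(resolve_tm)" && "$TM" infer doctor
```

Storing the result in a variable such as TM keeps the rest of a script identical whether the CLI is installed globally or only in the checkout.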

Get An API Key

If your environment exposes self-serve key creation, use the authenticated Control Plane flow:
tm auth login
USER_ID="$(tm --output json auth whoami | python3 -c 'import json,sys; print(json.load(sys.stdin)["user"]["id"])')"
tm users api-keys create --user-id "$USER_ID" --name cli-key --yes
Otherwise, ask your operator or admin for the exact inference API key to use.
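
The inline python3 one-liner above raises a traceback if the whoami output does not have the expected shape. As a sketch only, the same lookup of user.id can be wrapped in a function that fails with a short message instead. extract_user_id is a hypothetical name; the JSON path is the one used in the command above.

```shell
# Hypothetical helper: read `tm --output json auth whoami` output on stdin
# and print the value at user.id. Exits non-zero with a short message if the
# input is not JSON or the field is missing.
extract_user_id() {
  python3 -c '
import json, sys
try:
    print(json.load(sys.stdin)["user"]["id"])
except (ValueError, KeyError, TypeError) as exc:
    sys.exit(f"could not read user.id from whoami output: {exc!r}")
'
}

# Usage (not run here):
#   USER_ID="$(tm --output json auth whoami | extract_user_id)" || exit 1
#   tm users api-keys create --user-id "$USER_ID" --name cli-key --yes
```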

Send A Chat Request

tm infer chat \
  --api-key YOUR_INFERENCE_API_KEY \
  --model YOUR_SERVERLESS_MODEL_NAME \
  --json '[{"role":"user","content":"Say hello."}]'
For streaming output:
echo '[{"role":"user","content":"Stream tokens."}]' \
  | tm infer chat --api-key YOUR_INFERENCE_API_KEY --model YOUR_SERVERLESS_MODEL_NAME --stream
Streaming requests use a bounded idle read timeout. If you expect long quiet gaps between SSE events, raise it explicitly:
echo '[{"role":"user","content":"Stream tokens."}]' \
  | tm infer chat --api-key YOUR_INFERENCE_API_KEY --model YOUR_SERVERLESS_MODEL_NAME --stream --stream-idle-timeout 600
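
Both the --json form and the stdin form above take a JSON array of messages. Hand-quoting that array gets awkward once the prompt contains quotes or newlines. A minimal sketch, assuming python3 is available (it is already used on this page); build_messages is a hypothetical helper, not a tm command:

```shell
# Hypothetical helper: print a single-user-message chat payload with the
# prompt text safely JSON-escaped.
build_messages() {
  python3 -c 'import json, sys; print(json.dumps([{"role": "user", "content": sys.argv[1]}]))' "$1"
}

# Usage (not run here; placeholders as elsewhere on this page):
#   tm infer chat --api-key YOUR_INFERENCE_API_KEY --model YOUR_SERVERLESS_MODEL_NAME \
#     --json "$(build_messages 'Say "hello" twice.')"
#   build_messages 'Stream tokens.' \
#     | tm infer chat --api-key YOUR_INFERENCE_API_KEY --model YOUR_SERVERLESS_MODEL_NAME --stream
```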