OpenAI-compatible chat completions endpoint on the public Tensormesh Serverless host.
Base URL: https://serverless.tensormesh.ai
Authorization: Bearer <API_KEY>

If you have Control Plane access for the same Tensormesh environment, discover published serverless models with tm billing pricing serverless list and use the returned pricing[].model value in the request body. If you are targeting a different serverless host override, or you only have inference credentials, get the model name from your Tensormesh environment before sending the request. Start with Choose A Serverless Model Name if you do not already know the model name you need.
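A minimal request sketch using only the Python standard library. The `/v1/chat/completions` path is an assumption based on the OpenAI-compatible convention (this page does not state the path), and the API key and model name are placeholders:

```python
import json
import urllib.request

BASE_URL = "https://serverless.tensormesh.ai"
API_KEY = "<API_KEY>"  # placeholder: your serverless API key

payload = {
    "model": "MiniMaxAI/MiniMax-M2.5",  # example model name from this page
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 128,
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",  # assumed OpenAI-compatible path
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# response = urllib.request.urlopen(req)  # uncomment with real credentials
```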
Bearer authentication using your serverless API key. Format: Bearer <API_KEY>
Serverless model name to use.
If you have Control Plane access for the same Tensormesh environment, discover published serverless models with tm billing pricing serverless list and use the returned pricing[].model value here. If you are targeting a different serverless host override, or you only have inference credentials, get the model name from your Tensormesh environment before sending requests.
Example: "MiniMaxAI/MiniMax-M2.5"
A list of messages comprising the conversation so far.
Sampling temperature. Higher values make the output more random.
Note: temperature=0 is greedy sampling.
Nucleus sampling. We generally recommend altering this or temperature but not both.
The maximum number of tokens to generate in the completion.
If set too low, the model may hit finish_reason="length" before producing useful message.content.
Alternative name for max_tokens (this inference surface accepts both; if both are set, behavior is runtime-dependent).
How many choices to generate.
Note: when using greedy sampling (temperature=0), n must be 1 (otherwise a 400 error).
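The greedy-sampling constraint above can be checked client-side before sending a request. This is a hypothetical helper, not part of the API; it only restates the rule that temperature=0 with n > 1 produces a 400 error:

```python
def validate_sampling(params: dict) -> None:
    """Sketch of the documented constraint: temperature=0 selects greedy
    (deterministic) decoding, so requesting n > 1 choices is rejected."""
    if params.get("temperature") == 0 and params.get("n", 1) != 1:
        raise ValueError("temperature=0 (greedy) requires n=1")

validate_sampling({"temperature": 0.7, "n": 3})  # ok: sampling, multiple choices
validate_sampling({"temperature": 0, "n": 1})    # ok: greedy, single choice
```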
Stop sequence(s) where the API will stop generating further tokens.
Top-k sampling. Filters candidates to the K most likely tokens at each step.
Minimum probability threshold for token selection (alternative to top_p / top_k).
Typical-p sampling parameter.
Random seed for best-effort deterministic sampling (model/runtime dependent).
Applies a penalty to repeated tokens to discourage repetition.
Target perplexity for Mirostat sampling (if supported).
Learning rate for Mirostat sampling (if supported).
Whether to stream back partial progress. If set, tokens will be sent as data-only server-sent events (SSE) as they become available, with the stream terminated by a data: [DONE] message.
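A sketch of consuming the data-only SSE stream described above. The chunk shape (`choices[].delta.content`) is assumed from the OpenAI streaming convention; the sample lines are illustrative, not captured output:

```python
import json

def iter_sse_chunks(lines):
    """Yield decoded JSON chunks from 'data:' SSE lines, stopping at the
    'data: [DONE]' sentinel that terminates the stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)

# Illustrative stream (chunk shape assumed, not taken from this page):
stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(
    c["choices"][0]["delta"].get("content", "") for c in iter_sse_chunks(stream)
)
# text == "Hello"
```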
Streaming options. Only valid when stream=true.
Forces the model to produce a specific output format.
Supported values:
{ "type": "json_object" } (JSON mode)
{ "type": "text" }
Note: extra keys inside response_format are rejected by this inference surface.
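A request body using JSON mode, keeping response_format to exactly the documented shape (extra keys are rejected). Model name and prompt are placeholders:

```python
payload = {
    "model": "MiniMaxAI/MiniMax-M2.5",  # placeholder model name
    "messages": [{"role": "user", "content": "List three primes as JSON."}],
    # Only the documented shapes are accepted; adding extra keys inside
    # response_format would be rejected by this inference surface.
    "response_format": {"type": "json_object"},
}
```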
A list of tools the model may call. Currently, only functions are supported as a tool.
Controls which (if any) tool is called by the model.
none: the model will not call any tool and instead generates a message.
auto: the model can pick between generating a message or calling tools.
required: the model is instructed to call one or more tools (model/runtime dependent).
Allowed values: auto, none, required

Enable parallel tool/function calling (if supported).
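A request body declaring one function tool in the OpenAI-compatible shape (the only tool type this page says is supported). The function name and parameter schema are illustrative, not part of the API:

```python
payload = {
    "model": "MiniMaxAI/MiniMax-M2.5",  # placeholder model name
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [
        {
            "type": "function",  # currently the only supported tool type
            "function": {
                "name": "get_weather",  # hypothetical function
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # or "none" / "required"
}
```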
Penalizes new tokens based on whether they appear in the text so far.
Penalizes new tokens based on their existing frequency in the text so far.
Include per-token log probabilities in the response (when supported by the model/runtime).
Number of most likely tokens to return at each position (requires logprobs=true).
Modify the likelihood of specified tokens appearing in the completion.
Maps token id (string) to bias (number).
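A logit_bias sketch matching the documented mapping of token id (string) to bias (number). The token IDs below are made up for illustration; real IDs are tokenizer-specific:

```python
payload = {
    "model": "MiniMaxAI/MiniMax-M2.5",  # placeholder model name
    "messages": [{"role": "user", "content": "Pick a color."}],
    "logit_bias": {
        "1234": 5.0,     # hypothetical token id: make it more likely
        "5678": -100.0,  # hypothetical token id: effectively ban it
    },
}
```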
A unique identifier representing your end-user.
Additional metadata to store with the request for tracing.
Truncate chat prompts (in tokens) to this length by evicting older messages first.
What to do when prompt plus max tokens exceeds context window.
Allowed values: truncate, error

Return token IDs alongside text (populates choices[].token_ids).
Isolation key for prompt caching to separate cache entries (if supported).
Return raw output from the model.
Note: support is model/runtime dependent. Some deployments may return an error when enabled.
Whether to include performance metrics in the response body.
Note: this inference surface may accept the field but not include any extra metrics in the response body.
Echo back the prompt in addition to the completion.
Note: support is model/runtime dependent. Some deployments may return an error when enabled.
Echo back the last N tokens of the prompt (if supported).
Whether the model should ignore the EOS token.
Note: support is model/runtime dependent. Some deployments may return an error when enabled.
Speculative decoding prompt or token IDs (if supported).
OpenAI-compatible predicted output for speculative decoding (if supported).
Controls reasoning behavior for supported models (model/runtime dependent).
Allowed values: low, medium, high, none

Controls how historical assistant reasoning content is included in the prompt (if supported).
Allowed values: disabled, interleaved, preserved

Alternative Anthropic-compatible config for reasoning (if supported).
Deprecated (OpenAI). Use tools instead.
Deprecated (OpenAI). Use tool_choice instead.
Allowed values: auto, none

Successful Response
A unique identifier of the response.
The Unix time in seconds when the response was generated.
The model used for the chat completion.
The list of chat completion choices.
The object type, which is always "chat.completion".
Optional performance metrics (if enabled/supported).
Optional prompt token ids (when enabled/supported).
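An illustrative response body built from the fields documented above, and how a client would pull out the generated text. All values are placeholders; the choices[].message shape is assumed from the OpenAI-compatible convention:

```python
response = {
    "id": "chatcmpl-abc123",       # unique identifier (placeholder)
    "created": 1700000000,         # Unix time in seconds (placeholder)
    "model": "MiniMaxAI/MiniMax-M2.5",
    "object": "chat.completion",   # always "chat.completion"
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
}

assert response["object"] == "chat.completion"
text = response["choices"][0]["message"]["content"]
```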