- call serverless and on-demand inference endpoints from Python, including chat completions, models, completions, responses, tokenize, detokenize, health, and version
- work with Tensormesh control-plane resources such as models, users, billing, and support
- choose between synchronous and asynchronous clients without changing the overall API shape
The SDK is part of the tensormesh Python distribution. The tm CLI ships in the same distribution, but the SDK is the better starting point when you are building application code or services.
Both inference surfaces (serverless and on-demand) expose chat.completions, models, completions, responses, tokenize, detokenize, health, and version. On the default public serverless host, models, health, and version also work without an inference API key. Embeddings and audio endpoints are not currently exposed on this SDK surface.
For the shortest first-success path, start with serverless inference and an inference API key. Use on-demand inference once you already have the routing values for your deployment.
Prerequisites
Python 3.12 or newer.
Install
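Assuming the distribution is published under the same name as the package described above (the exact index name is not confirmed here), a typical install looks like:

```shell
# Install the tensormesh Python distribution (package name assumed from the
# distribution name above); this also provides the tm CLI.
pip install tensormesh
```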
Main Client Surfaces
- Tensormesh: synchronous client for scripts, notebooks, and request/response applications
- AsyncTensormesh: asynchronous client for services and async application stacks
- client.inference.serverless
- client.inference.on_demand
- client.control_plane
Both inference surfaces expose chat.completions, models, completions, responses, tokenize, detokenize, health, and version.
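As a rough sketch of what a chat.completions call carries, the request body below follows the widely used OpenAI-compatible schema; the model id and the TENSORMESH_API_KEY environment-variable name are placeholders, not confirmed by these docs.

```python
import json
import os

# Read the inference API key from the environment.
# TENSORMESH_API_KEY is an assumed variable name, used here for illustration.
api_key = os.environ.get("TENSORMESH_API_KEY", "<your-inference-api-key>")

# Hypothetical chat.completions request body, assuming an OpenAI-compatible
# schema. Use the models endpoint to list the real model ids.
payload = {
    "model": "example-model",  # placeholder model id
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 64,
}

# The SDK would serialize and send this for you; shown here only to make
# the shape of the request concrete.
body = json.dumps(payload)
print(body)
```

The same body shape applies on both the serverless and on-demand surfaces; only the routing values and credentials differ.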

