Deploy Model


1. Create a Cluster

Note

Provisioning resources can take a while: Lambda Labs may need up to 20 minutes to provide GPUs.

  • In the left sidebar, click Cluster, then hit + Create Cluster.

  • Fill in the Cluster Configuration form:

    • Name (e.g. test)

    • Cloud Provider (e.g. Lambda Labs)

    • Region (e.g. us-south-1)

    • GPU Type & Count (e.g. 8 × H100)

    • Hugging Face Token (paste your HF access token)

  • Click Create Cluster at the bottom right.

  • An info card shows the cluster's status as it progresses: Pending → init → wait_k8s → Active. Wait until the status shows Active.
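If you prefer to script the wait rather than watch the info card, it can be expressed as a polling loop. This is a minimal sketch, not part of the product: it assumes a hypothetical fetch_status() callable that returns the status string shown on the card. The poll interval is arbitrary, and the 20-minute default timeout comes from the note above.

```python
import time
from typing import Callable, List

# Status strings as they appear on the cluster info card (exact spelling assumed).
IN_PROGRESS = ("Pending", "init", "wait_k8s")
ACTIVE = "Active"
FAILED = "Fail to create"


def wait_until_active(
    fetch_status: Callable[[], str],  # hypothetical: returns the card's current status
    timeout_s: float = 20 * 60,
    poll_interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Poll until the cluster is Active; return the distinct statuses seen, in order."""
    deadline = clock() + timeout_s
    seen: List[str] = []
    while True:
        status = fetch_status()
        if not seen or seen[-1] != status:
            seen.append(status)  # record each status change once
        if status == ACTIVE:
            return seen
        if status == FAILED:
            raise RuntimeError("instance unavailable: delete the cluster and retry, e.g. in another region")
        if status not in IN_PROGRESS:
            raise ValueError(f"unexpected status: {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"not Active after {timeout_s:.0f}s (last status: {status})")
        sleep(poll_interval_s)


# Usage with a canned status sequence instead of a real cluster:
fake = iter(["Pending", "Pending", "init", "wait_k8s", "Active"])
print(wait_until_active(lambda: next(fake), sleep=lambda _: None))
# ['Pending', 'init', 'wait_k8s', 'Active']
```

Injecting sleep and clock keeps the loop testable without real waiting. A real client would supply fetch_status from whatever status source it has access to.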

Note

If the status changes to Fail to create, the requested instance was not available.

  • Delete the cluster in the web interface (no need to delete the instance from the Lambda Labs dashboard).

  • Change the configuration and try again. Most often, switching to a different region helps.
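The retry advice can be sketched as a loop over regions. This is an illustration only. ClusterConfig, try_create, and delete are hypothetical stand-ins for the Cluster Configuration form and the Create/Delete actions in the web interface. The region us-west-1 is a made-up alternative used purely for the demo.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical mirror of the Cluster Configuration form fields."""
    name: str
    cloud_provider: str
    region: str
    gpu_type: str
    gpu_count: int
    hf_token: str


def create_with_region_fallback(
    base: ClusterConfig,
    regions: Iterable[str],
    try_create: Callable[[ClusterConfig], bool],  # hypothetical: True once Active
    delete: Callable[[ClusterConfig], None],      # hypothetical: delete in the web UI
) -> ClusterConfig:
    """Try each region in order; delete each failed cluster before moving on."""
    tried: List[str] = []
    for region in regions:
        cfg = replace(base, region=region)  # same config, different region
        tried.append(region)
        if try_create(cfg):
            return cfg
        # "Fail to create": only the cluster entry is deleted here;
        # the guide says the Lambda Labs dashboard needs no cleanup.
        delete(cfg)
    raise RuntimeError(f"no capacity in any region tried: {', '.join(tried)}")


# Usage with stubbed create/delete actions:
base = ClusterConfig("test", "Lambda Labs", "us-south-1", "H100", 8, "<your HF token>")
available = {"us-west-1"}  # pretend only this (hypothetical) region has free GPUs
deleted: List[str] = []
cfg = create_with_region_fallback(
    base,
    ["us-south-1", "us-west-1"],
    try_create=lambda c: c.region in available,
    delete=lambda c: deleted.append(c.region),
)
print(cfg.region, deleted)  # us-west-1 ['us-south-1']
```

Only the region changes between attempts, because that is the change the guide reports helps most often. The other fields could be varied the same way with dataclasses.replace.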

2. Create Deployments

Note

The deployment itself can take up to 10 minutes to complete.

  • In the left sidebar, click Deployments, then hit + Create Deployment.

  • Search for or select the model you want to deploy from the existing model cards (e.g. meta-llama/Llama-3.1-8B-Instruct).

  • Configure the basics:

    • Deployment Name: give it a descriptive name (e.g. llama-8b-test).

    • Target Cluster: select one of your Active clusters.

    • The UI auto-detects the available GPUs and memory in that cluster.

  • Skip or configure the Advanced settings:

    • For a quick start, click Create Deployment now.

    • For finer control, click Next: Advanced. Advanced settings are grouped in three tabs:

      • 🧠 LM Cache

        • CPU/Disk Offloading Buffer Size, P/D Disaggregation, CacheBlend, etc.

      • 🤖 Model

        • Max Model Length, Max Number of Sequences, Dtype, etc.

      • ⚡️ vLLM

        • TP Size, GPU Memory Utilization, Enable Chunked Prefill, etc.

  • Launch!

    • Once you click Create Deployment, an info card shows your deployment's status as it progresses.

    • If it fails, check the deployment's logs.
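The Model and vLLM tab settings share their names with vLLM engine arguments: max model length, max number of sequences, dtype, tensor-parallel size, GPU memory utilization, and chunked prefill. The sketch below assumes a one-to-one mapping onto vllm serve flags, which this guide does not confirm. The Deployment class and its validation rules are hypothetical, the numeric values are examples only, and the LM Cache tab is left out because its mapping is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical mirror of the Create Deployment form (basics + Model/vLLM tabs)."""
    name: str
    model: str
    target_cluster_status: str
    cluster_gpus: int  # as auto-detected by the UI
    # Advanced settings; None/False means "leave vLLM's default".
    max_model_len: Optional[int] = None
    max_num_seqs: Optional[int] = None
    dtype: str = "auto"
    tensor_parallel_size: int = 1
    gpu_memory_utilization: Optional[float] = None
    enable_chunked_prefill: bool = False

    def validate(self) -> None:
        if self.target_cluster_status != "Active":
            raise ValueError("target cluster must be Active")
        if not 1 <= self.tensor_parallel_size <= self.cluster_gpus:
            raise ValueError("TP size must be between 1 and the cluster's GPU count")
        util = self.gpu_memory_utilization
        if util is not None and not 0 < util <= 1:
            raise ValueError("GPU memory utilization must be a fraction in (0, 1]")

    def vllm_serve_args(self) -> List[str]:
        """Build an equivalent `vllm serve` command, assuming a 1:1 field mapping."""
        self.validate()
        args = ["vllm", "serve", self.model,
                "--dtype", self.dtype,
                "--tensor-parallel-size", str(self.tensor_parallel_size)]
        if self.max_model_len is not None:
            args += ["--max-model-len", str(self.max_model_len)]
        if self.max_num_seqs is not None:
            args += ["--max-num-seqs", str(self.max_num_seqs)]
        if self.gpu_memory_utilization is not None:
            args += ["--gpu-memory-utilization", str(self.gpu_memory_utilization)]
        if self.enable_chunked_prefill:
            args.append("--enable-chunked-prefill")
        return args


# Usage with the example model from this guide:
d = Deployment(
    name="llama-8b-test",
    model="meta-llama/Llama-3.1-8B-Instruct",
    target_cluster_status="Active",
    cluster_gpus=8,
    max_model_len=8192,
    gpu_memory_utilization=0.9,
    enable_chunked_prefill=True,
)
print(" ".join(d.vllm_serve_args()))
```

The checks encode two assumed preconditions: the target must be an Active cluster, and the TP size cannot exceed the GPUs detected in it. A failed check surfaces before launch instead of in the deployment logs.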

Tips