Deploy Model
============
📹 See demo video
1. Create a Cluster
-------------------
.. note::
Provisioning resources can take time. You may need to wait up to **20 minutes** for Lambda Labs to provision the GPUs.
* In the left sidebar, click **Cluster**, then hit **+ Create Cluster**.
* Fill in Cluster Configuration:
* Name (e.g. test)
* Cloud Provider (e.g. Lambda Labs)
* Region (e.g. us-south-1)
* GPU Type & Count (e.g. 8 × H100)
* Hugging Face Token (paste your HF access token; an optional way to verify it is sketched after this list)
* Click **Create Cluster** at the bottom right.
* You will see an info card showing the status progression: **Pending → init → wait_k8s → Active**. Wait until the status shows **Active**.
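The Hugging Face token you supply here is typically used to download model weights, so it helps to confirm beforehand that it is valid and can reach the gated model you plan to deploy. Below is a minimal, optional sketch using the ``huggingface_hub`` package; the token value and model ID are placeholders.

.. code-block:: python

   # Optional local sanity check for the Hugging Face token you paste into the UI.
   # Requires `pip install huggingface_hub`; the token and model ID are placeholders.
   from huggingface_hub import HfApi

   api = HfApi(token="hf_xxxxxxxxxxxxxxxx")

   # Confirms the token is valid and shows which account it belongs to.
   print(api.whoami()["name"])

   # Raises an error if the token cannot access the gated repository,
   # which would also make the deployment fail when pulling weights.
   api.model_info("meta-llama/Llama-3.1-8B-Instruct")
   print("Token can access the model.")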
.. note::
If the status turns to **Fail to create**, the requested instance was not available.
* Delete the cluster in the web interface (no need to delete the instance from the Lambda Labs dashboard).
* Change the configuration and try again. Most often, switching to a different region helps; a quick way to check which regions currently have capacity is sketched below.
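If clusters repeatedly fail to create, you can check which regions currently have capacity for your GPU type before retrying. The sketch below assumes Lambda's public Cloud API (a ``LAMBDA_API_KEY`` environment variable, the ``/instance-types`` endpoint, and its response fields); double-check these against Lambda's API documentation.

.. code-block:: python

   # Hedged sketch: list Lambda Cloud instance types that currently have capacity,
   # grouped by region. Assumes a Lambda Cloud API key in LAMBDA_API_KEY and the
   # public instance-types endpoint; verify both against Lambda's API docs.
   import os
   import requests

   resp = requests.get(
       "https://cloud.lambdalabs.com/api/v1/instance-types",
       auth=(os.environ["LAMBDA_API_KEY"], ""),  # API key as username, empty password
       timeout=30,
   )
   resp.raise_for_status()

   for name, info in resp.json()["data"].items():
       regions = [r["name"] for r in info.get("regions_with_capacity_available", [])]
       if regions:
           print(f"{name}: {', '.join(regions)}")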
2. Create Deployments
---------------------
.. note::
The deployment process itself could take up to **10 minutes** to complete.
* In the left sidebar, click **Deployments**, then hit **+ Create Deployment**.
* Search or browse the existing model cards and pick the model you want to deploy (e.g. meta-llama/Llama-3.1-8B-Instruct).
* Configure basics
* Deployment Name: give it a descriptive name (e.g. llama-8b-test).
* Target Cluster: select one of your Active clusters.
* The UI will auto-detect available GPUs and memory in that cluster.
* Skip or dive into the Advanced settings
* For a quick start, click **Create Deployment** now.
* For finer control, click **Next: Advanced**. The Advanced settings are grouped into three tabs (many of the Model and vLLM options map onto standard vLLM engine arguments; see the sketch after this list):
* 🧠 LM Cache
* CPU/Disk Offloading Buffer Size, P/D Disaggregation, CacheBlend, etc.
* 🤖 Model
* Max Model Length, Max Number of Sequences, Dtype, etc.
* ⚡️ vLLM
* TP Size, GPU Memory Utilization, Enable Chunked Prefill, etc.
* Launch!
* Once you click **Create Deployment**, you'll see an info card for your deployment showing its status progression.
* If the deployment fails, check its logs for details. Once it is running, you can verify that it serves requests with the smoke test sketched below.
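For context, many of the **Model** and **vLLM** options in the Advanced tabs correspond to standard vLLM engine arguments. The sketch below shows roughly equivalent settings when using vLLM's Python API directly; the values are placeholders, and the UI may expose options that have no direct counterpart here.

.. code-block:: python

   # Rough correspondence between the Advanced UI options and vLLM engine arguments.
   # Values are placeholders; adjust them to your cluster and model.
   from vllm import LLM

   llm = LLM(
       model="meta-llama/Llama-3.1-8B-Instruct",
       tensor_parallel_size=8,        # "TP Size"
       gpu_memory_utilization=0.90,   # "GPU Memory Utilization"
       enable_chunked_prefill=True,   # "Enable Chunked Prefill"
       max_model_len=8192,            # "Max Model Length"
       max_num_seqs=256,              # "Max Number of Sequences"
       dtype="bfloat16",              # "Dtype"
   )

Once the deployment is running, a quick way to confirm it is serving requests is to call its OpenAI-compatible API (which vLLM exposes). The base URL and API key below are placeholders; use the actual endpoint shown for your deployment.

.. code-block:: python

   # Minimal smoke test against the deployment's OpenAI-compatible chat endpoint.
   # BASE_URL and the API key are placeholders; take the real values from your deployment.
   import requests

   BASE_URL = "http://<your-deployment-endpoint>/v1"

   resp = requests.post(
       f"{BASE_URL}/chat/completions",
       headers={"Authorization": "Bearer <your-api-key>"},  # omit if no key is required
       json={
           "model": "meta-llama/Llama-3.1-8B-Instruct",
           "messages": [{"role": "user", "content": "Say hello in one sentence."}],
           "max_tokens": 32,
       },
       timeout=60,
   )
   resp.raise_for_status()
   print(resp.json()["choices"][0]["message"]["content"])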
Tips
----