Overview
The Model Chatbot interface allows you to interact with your deployed models in real time and run comprehensive performance tests.

Getting Started
- Navigate to the Model Chatbot interface in Quick Actions
- Select your active deployment from the dropdown menu
- Start chatting with your model or configure performance tests

Interface Tabs
The Model Chatbot interface includes four main tabs:
- Chat - Interactive conversation with your deployed model
- Performance - Load testing and performance metrics (BETA)
- Test Results - View historical test results (BETA)
- Metrics - Detailed performance analytics
Chat Interface
Connection Information
The right panel displays essential deployment details:
| Field | Description |
|---|---|
| Model Name | The identifier for your deployed model |
| Model Path | Path to the model in your repository |
| Endpoint | The API endpoint URL for the deployment |
| Deployment ID | Unique identifier for the deployment instance |
| Status | Current deployment status (Active/Inactive) |
Starting a Conversation
- Ensure your deployment status shows as Active (green indicator)
- Type your message in the input field at the bottom
- Click the send button or press Enter
- View the model’s response in the chat area
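If you prefer to talk to the deployment programmatically rather than through the chat panel, the same endpoint can be called over HTTP. The sketch below assumes an OpenAI-style chat-completions payload; the endpoint URL and model name are hypothetical placeholders, so check the Connection Information panel and your deployment's API reference for the exact schema before using it.

```python
import json
import urllib.request

def build_chat_request(endpoint, model_name, message):
    """Build an HTTP request for a deployment endpoint.

    NOTE: the OpenAI-style chat payload below is an assumption;
    your deployment may expect a different schema.
    """
    body = json.dumps({
        "model": model_name,
        "messages": [{"role": "user", "content": message}],
    }).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical endpoint and model name, for illustration only.
req = build_chat_request(
    "https://example.com/v1/chat/completions",
    "my-model",
    "Hello!",
)
print(req.get_method(), req.full_url)
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) returns the model's response; the chat tab is doing the equivalent of this round trip for you.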
Performance Testing
BETA FEATURE: Performance testing is currently in beta. Features and metrics may be subject to change based on user feedback and ongoing improvements.

Test Configuration Parameters
Test Name (Optional)
- Provide a descriptive name for your test
- Auto-generated if left empty
- Examples: “Baseline test”, “Load test - 100 users”
Concurrent Users
- Range: 1 - 500 users
- Simulates parallel requests to your model
- Default: 100 concurrent users
Test Duration
- Range: 10 - 150 seconds
- Duration the test will run continuously
- Default: 20 seconds
Advanced Settings
- Configure additional model-specific settings
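To make the documented ranges and defaults concrete, here is a small validation sketch. The dictionary layout and key names are illustrative, not an official configuration format; only the numeric ranges and defaults come from the parameter descriptions above.

```python
# Defaults as documented: 100 concurrent users, 20-second duration.
DEFAULTS = {"test_name": None, "concurrent_users": 100, "duration_seconds": 20}

def validate_config(config):
    """Apply defaults and enforce the documented parameter ranges."""
    cfg = {**DEFAULTS, **config}
    if not 1 <= cfg["concurrent_users"] <= 500:
        raise ValueError("concurrent_users must be between 1 and 500")
    if not 10 <= cfg["duration_seconds"] <= 150:
        raise ValueError("duration_seconds must be between 10 and 150")
    return cfg

print(validate_config({"concurrent_users": 50}))
```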
Running a Performance Test
- Navigate to the Performance tab
- Configure your test parameters
- Click Start Performance Test (green button)
- Monitor live metrics in the right panel
Tests run continuously for the configured duration while maintaining the specified concurrency level. Total requests will vary based on your model’s response time.
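The behavior described above is a closed-loop load test: each simulated user issues requests back-to-back until the deadline, so concurrency stays fixed while the total request count depends on per-request latency. A minimal sketch of that loop, with a stub standing in for the real endpoint call:

```python
import threading
import time

def run_load_test(send_request, concurrency, duration_seconds):
    """Closed-loop load generator: `concurrency` workers each issue
    requests back-to-back until the deadline, so total requests vary
    with the latency of `send_request`."""
    deadline = time.monotonic() + duration_seconds
    counts = [0] * concurrency

    def worker(i):
        while time.monotonic() < deadline:
            send_request()
            counts[i] += 1

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(concurrency)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts)

# Stub request with 10 ms simulated latency (stand-in for the real endpoint).
total = run_load_test(lambda: time.sleep(0.01), concurrency=5, duration_seconds=0.5)
print("total requests:", total)
```

With a faster model (lower latency per request), the same configuration yields a higher total request count, which is why the interface reports totals rather than fixing them in advance.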
Understanding Live Metrics
During test execution, the Live Metrics panel displays:
- TTFT (Time To First Token) - Latency until the first token is generated
- ITL (Inter-Token Latency) - Time between subsequent tokens
- Cache Hit Rate - Percentage of requests served from cache
- Throughput - Requests processed per second
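As a rough sketch of how these figures can be derived from raw timings (the panel's exact definitions may differ): TTFT is the gap between sending a request and receiving its first token, ITL is the average gap between consecutive tokens, and throughput is completed requests divided by elapsed time. Cache hit rate is simply cache hits over total requests and is omitted here.

```python
def summarize_metrics(request_start, token_times, total_requests, elapsed_seconds):
    """Compute TTFT, ITL, and throughput from raw timings.

    token_times: monotonic timestamps of each generated token for one request.
    """
    ttft = token_times[0] - request_start                # Time To First Token
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    itl = sum(gaps) / len(gaps) if gaps else 0.0         # mean Inter-Token Latency
    throughput = total_requests / elapsed_seconds        # requests per second
    return {"ttft": ttft, "itl": itl, "throughput": throughput}

# Synthetic timings: first token after 120 ms, then one token every 30 ms.
m = summarize_metrics(0.0, [0.12, 0.15, 0.18, 0.21],
                      total_requests=400, elapsed_seconds=20)
print(m)
```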
Test Results
BETA FEATURE: Test Results tracking is currently in beta. The format and available metrics may evolve as we refine this feature.
- Compare performance across different test runs
- Track improvements or regressions
- Analyze trends over time

