Overview

The Model Chatbot interface allows you to interact with your deployed models in real-time and run comprehensive performance tests.

Getting Started

  1. Navigate to the Model Chatbot interface in Quick Actions
  2. Select your active deployment from the dropdown menu
  3. Start chatting with your model or configure performance tests

Interface Tabs

The Model Chatbot interface includes four main tabs:
  • Chat - Interactive conversation with your deployed model
  • Performance - Load testing and performance metrics (BETA)
  • Test Results - View historical test results (BETA)
  • Metrics - Detailed performance analytics

Chat Interface

Older models do not always support the “chat” function. Before using the chatbot, verify that your LLM supports “chat”.
The right panel displays essential deployment details:
  • Model Name - The identifier for your deployed model
  • Model Path - Path to the model in your repository
  • Endpoint - The API endpoint URL for the deployment
  • Deployment ID - Unique identifier for the deployment instance
  • Status - Current deployment status (Active/Inactive)
To send a message:
  1. Ensure your deployment status shows as Active (green indicator)
  2. Type your message in the input field at the bottom
  3. Click the send button or press Enter
  4. View the model’s response in the chat area
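Outside the UI, you can also send a chat message directly to the deployment's API endpoint (shown in the deployment details panel). The sketch below is a minimal illustration, assuming an HTTP endpoint that accepts a JSON body of chat messages; the URL and field names are hypothetical and may differ from your deployment's actual API.

```python
import json
import urllib.request

def build_chat_payload(message, history=None):
    """Assemble a chat-style request body (field names are illustrative)."""
    messages = list(history or [])
    messages.append({"role": "user", "content": message})
    return {"messages": messages}

def send_chat(endpoint, message, history=None, timeout=30):
    """POST the message to the deployment endpoint and return the parsed JSON reply."""
    body = json.dumps(build_chat_payload(message, history)).encode("utf-8")
    req = urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())

# Example (substitute the Endpoint value from the deployment details panel):
# reply = send_chat("https://<your-endpoint>", "Hello!")
```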

Performance Testing

BETA FEATURE: Performance testing is currently in beta. Features and metrics may be subject to change based on user feedback and ongoing improvements.
Configure the following test parameters:
Test Name (Optional)
  • Provide a descriptive name for your test
  • Auto-generated if left empty
  • Examples: “Baseline test”, “Load test - 100 users”
Concurrent Users
  • Range: 1 - 500 users
  • Simulates parallel requests to your model
  • Default: 100 concurrent users
Test Duration
  • Range: 10 - 150 seconds
  • How long the test runs continuously
  • Default: 20 seconds
Model Parameters (Collapsible)
  • Configure additional model-specific settings
To run a performance test:
  1. Navigate to the Performance tab
  2. Configure your test parameters
  3. Click Start Performance Test (green button)
  4. Monitor live metrics in the right panel
Tests run continuously for the configured duration while maintaining the specified concurrency level. Total requests will vary based on your model’s response time.
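The fixed-duration, fixed-concurrency behavior described above can be sketched as follows. This is an illustrative stand-in, not the product's implementation: `send_request` represents one call to your deployed model, and each worker keeps issuing requests until the deadline, so the total request count depends on response time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_load_test(send_request, concurrent_users=100, duration_s=20.0):
    """Keep `concurrent_users` workers issuing requests until `duration_s` elapses."""
    deadline = time.monotonic() + duration_s
    latencies = []  # list.append is thread-safe in CPython

    def worker():
        count = 0
        while time.monotonic() < deadline:
            start = time.monotonic()
            send_request()
            latencies.append(time.monotonic() - start)
            count += 1
        return count

    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        per_worker_counts = list(pool.map(lambda _: worker(), range(concurrent_users)))

    total = sum(per_worker_counts)
    return {
        "total_requests": total,
        "throughput_rps": total / duration_s,
        "avg_latency_s": sum(latencies) / len(latencies) if latencies else 0.0,
    }
```

Because each worker loops for the full duration, a faster model yields more total requests at the same concurrency level, which matches how the reported totals vary from run to run.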
During test execution, the Live Metrics panel displays:
  • TTFT (Time To First Token) - Latency until the first token is generated
  • ITL (Inter-Token Latency) - Time between subsequent tokens
  • Cache Hit Rate - Percentage of requests served from cache
  • Throughput - Requests processed per second
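The latency metrics above can be derived from per-token arrival timestamps: TTFT is the gap between sending the request and the first token, and ITL averages the gaps between consecutive tokens. A small sketch (the function name and return keys are illustrative):

```python
def token_latency_metrics(request_start, token_times):
    """Compute TTFT and mean ITL from token arrival timestamps (seconds).

    request_start: time the request was sent
    token_times:   arrival time of each generated token, in order
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_start
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else 0.0
    return {"ttft_s": ttft, "mean_itl_s": mean_itl}

# Tokens arriving 0.25 s, 0.30 s, and 0.36 s after the request:
# TTFT is 0.25 s; mean ITL is the average of the 0.05 s and 0.06 s gaps.
```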
Performance testing simulates parallel streaming requests to measure these key metrics, supporting up to 500 concurrent users.

Test Results

BETA FEATURE: Test Results tracking is currently in beta. The format and available metrics may evolve as we refine this feature.
The Test Results tab stores historical performance test data, allowing you to:
  • Compare performance across different test runs
  • Track improvements or regressions
  • Analyze trends over time