> **TIP**
> Explore the key features of the Monitoring & Ops module designed for Inference Services. This overview introduces the core capabilities that help users efficiently monitor, analyze, and optimize AI service operations.
## Features Overview
### Logging
- **Real-time Pod Logs**
  Stream logs from the Replica pods behind an inference service in real time. Debug issues instantly and track service behavior across deployments (see the sketch below).
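As a minimal sketch of how such a stream can be consumed programmatically (the module provides its own log viewer; this only illustrates the underlying mechanism), the following uses the official Kubernetes Python client to follow a pod's log. The pod and namespace names are hypothetical placeholders.

```python
from kubernetes import client, config, watch

# Assumes a kubeconfig with access to the cluster; when running inside
# a pod, use config.load_incluster_config() instead.
config.load_kube_config()
v1 = client.CoreV1Api()

# Hypothetical names for an inference-service replica pod.
POD_NAME = "my-inference-svc-replica-0"
NAMESPACE = "inference"

# watch.Watch().stream() yields log lines as they are produced,
# mirroring `kubectl logs -f`.
w = watch.Watch()
for line in w.stream(v1.read_namespaced_pod_log,
                     name=POD_NAME, namespace=NAMESPACE,
                     follow=True, tail_lines=100):
    print(line)
```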
### Monitoring
#### Resource Monitor
- **CPU/Memory Utilization**
  Track CPU and memory usage metrics for inference services to optimize resource allocation and prevent bottlenecks (see the sketch below).
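The docs don't specify the metrics backend, but if the cluster exposes cAdvisor metrics through Prometheus (a common setup), a sketch like the following could pull comparable CPU and memory figures over the Prometheus HTTP API. The endpoint URL and pod label pattern are assumptions.

```python
import requests

# Hypothetical Prometheus endpoint; adjust to your cluster (assumption).
PROM_URL = "http://prometheus.monitoring:9090/api/v1/query"

def instant_query(promql: str) -> list:
    """Run a PromQL instant query and return the result vector."""
    resp = requests.get(PROM_URL, params={"query": promql}, timeout=10)
    resp.raise_for_status()
    return resp.json()["data"]["result"]

# Standard cAdvisor metric names; the pod label regex is a placeholder.
cpu = instant_query(
    'rate(container_cpu_usage_seconds_total{pod=~"my-inference-svc-.*"}[5m])')
mem = instant_query(
    'container_memory_working_set_bytes{pod=~"my-inference-svc-.*"}')

for s in cpu:
    print(s["metric"].get("pod"), "CPU (cores):", s["value"][1])
for s in mem:
    print(s["metric"].get("pod"), "memory (bytes):", s["value"][1])
```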
#### Computing Monitor
- **GPU Metrics & VRAM**
  Monitor GPU compute utilization and video memory (VRAM) consumption to ensure efficient hardware usage for accelerated workloads (see the sketch below).
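For a sense of where these numbers come from, here is a minimal node-local sketch using NVIDIA's NVML Python bindings (`pip install nvidia-ml-py`). The module itself presumably aggregates similar data cluster-wide, so treat this as illustration only.

```python
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu is % busy
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes: .total/.used/.free
        print(f"GPU {i}: compute {util.gpu}% | "
              f"VRAM {mem.used / 2**30:.1f} / {mem.total / 2**30:.1f} GiB")
finally:
    pynvml.nvmlShutdown()
```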
#### Other Monitor
- **Token Throughput**
  Measure token processing rates to evaluate model performance and scalability.
- **Request Traffic Analytics**
  Analyze request volume and latency, and track successful and failed requests per second (QPS) to maintain service reliability and meet SLAs (see the sketch after this list).
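To make these two metrics concrete, below is a small in-process sketch that derives token throughput and success/failure QPS from a sliding window of request events. All names are hypothetical; a production service would typically export counters and let the monitoring backend compute the rates.

```python
import time
from collections import deque
from dataclasses import dataclass

@dataclass
class Event:
    t: float      # completion timestamp (seconds)
    tokens: int   # tokens generated by the request
    ok: bool      # whether the request succeeded

class TrafficWindow:
    """Sliding-window token throughput and QPS tracker (illustrative)."""

    def __init__(self, window_s: float = 60.0):
        self.window_s = window_s
        self.events: deque[Event] = deque()

    def record(self, tokens: int, ok: bool) -> None:
        self.events.append(Event(time.monotonic(), tokens, ok))

    def snapshot(self) -> dict:
        now = time.monotonic()
        # Drop events that have fallen out of the window.
        while self.events and now - self.events[0].t > self.window_s:
            self.events.popleft()
        toks = sum(e.tokens for e in self.events)
        ok = sum(e.ok for e in self.events)
        fail = len(self.events) - ok
        return {
            "tokens_per_s": toks / self.window_s,
            "success_qps": ok / self.window_s,
            "failure_qps": fail / self.window_s,
        }

# Example: record a few synthetic requests, then read the gauges.
win = TrafficWindow(window_s=10.0)
win.record(tokens=128, ok=True)
win.record(tokens=256, ok=True)
win.record(tokens=0, ok=False)
print(win.snapshot())
```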