NVIDIA Triton Inference Server Documentation, NVIDIA Corporation, 2023 - Official documentation for NVIDIA Triton Inference Server, detailing its architecture, features, and configuration for high-performance model deployment.
Accelerate Inference with Dynamic Batching on NVIDIA Triton Inference Server, Andrew P. Kim, 2021 (NVIDIA Developer Blog) - This blog post from NVIDIA explains how dynamic batching functions within Triton to optimize GPU utilization and throughput for inference workloads, especially beneficial for LLMs.