## Overview
vLLM is a high-performance, open-source framework for LLM inference and self-hosted model serving. It improves throughput and memory efficiency through mechanisms such as PagedAttention, which manages the KV cache in fixed-size blocks rather than contiguous buffers.
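As a minimal sketch of offline batch inference with vLLM's Python API (the model ID below is only an example; any Hugging Face model that vLLM supports works the same way):

```python
from vllm import LLM, SamplingParams

# Example model; substitute any supported Hugging Face model ID.
llm = LLM(model="facebook/opt-125m")
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = [
    "The capital of France is",
    "In one sentence, explain PagedAttention:",
]

# generate() batches the prompts internally and returns one
# RequestOutput per prompt, preserving input order.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```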
## Features
- PagedAttention for efficient KV-cache memory management
- OpenAI-compatible API server (see the sketch after this list)
- Optimized batched inference via continuous batching of incoming requests
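A minimal sketch of the OpenAI-compatible API: launch the server, then query it with the standard `openai` client. The host, port, model ID, and `"EMPTY"` API key below are assumptions based on vLLM's local-serving defaults:

```python
from openai import OpenAI

# Assumes a vLLM server was started locally with, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
# Port 8000 is vLLM's default; no real API key is required for local use.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="facebook/opt-125m",
    prompt="The capital of France is",
    max_tokens=32,
)
print(response.choices[0].text)
```

Because the server speaks the OpenAI wire protocol, existing tooling built against the OpenAI API can usually be pointed at a vLLM deployment by changing only the base URL.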