
vLLM

High-throughput LLM inference and serving framework

LLM Inference · Model Serving · High Throughput

Overview

vLLM is a high-performance, open-source framework for LLM inference and self-hosted model serving. It improves throughput and memory efficiency through mechanisms such as PagedAttention, which manages the attention key-value cache in fixed-size blocks to reduce memory waste.
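To make the inference side concrete, the sketch below uses vLLM's offline Python API to load a model and generate completions for a small batch of prompts. The model name is only an example placeholder; any supported Hugging Face checkpoint can be substituted.

    # Minimal offline-inference sketch with vLLM.
    from vllm import LLM, SamplingParams

    # Sampling settings shared by all prompts in the batch.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # "facebook/opt-125m" is just a small example checkpoint (assumption).
    llm = LLM(model="facebook/opt-125m")

    prompts = [
        "The capital of France is",
        "In machine learning, attention is",
    ]

    # generate() batches the prompts together and returns one result per prompt.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        print(output.prompt, "->", output.outputs[0].text)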

Features

  • PagedAttention
  • OpenAI-compatible API (see the usage sketch after this list)
  • Continuous batching for optimized inference throughput
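Because vLLM exposes an OpenAI-compatible API when run as a server, an existing OpenAI client can be pointed at a self-hosted endpoint. The sketch below is an assumption-laden example: it presumes the server was started locally (e.g. with `vllm serve facebook/opt-125m`) on the default port 8000 without authentication, and the model name is a placeholder.

    # Sketch: query a self-hosted vLLM server through its OpenAI-compatible API.
    # Assumes the server is already running, e.g.:  vllm serve facebook/opt-125m
    from openai import OpenAI

    # Base URL and api_key are assumptions for a local, unauthenticated server.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    completion = client.completions.create(
        model="facebook/opt-125m",        # must match the model the server was launched with
        prompt="The capital of France is",
        max_tokens=32,
    )
    print(completion.choices[0].text)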

Related Companies

vLLM Project