
vllm

A high-throughput and memory-efficient inference and serving engine for LLMs


GitHub Stats

Stars: 75.8k
Forks: 15.3k

Details

Language: Python
License: Apache-2.0
Deployment: both
Status: Active
Last push: 4/9/2026