
vllm

A high-throughput and memory-efficient inference and serving engine for LLMs


GitHub Stats

Stars: 75.8k
Forks: 15.3k

Details

Language: Python
License: Apache-2.0
Deployment: both
Status: Active
Last push: 4/9/2026