Feature Support

The feature support principle of vLLM Ascend is to stay aligned with vLLM. We are also actively collaborating with the community to accelerate feature support.

You can check the support status of the vLLM V1 Engine. Below is the feature support status of vLLM Ascend:

| Feature | vLLM V0 Engine | vLLM V1 Engine | Next Step |
|---|---|---|---|
| Chunked Prefill | 🟢 Functional | 🟢 Functional | Functional, see detail note: Chunked Prefill |
| Automatic Prefix Caching | 🟢 Functional | 🟢 Functional | Functional, see detail note: vllm-ascend#732 |
| LoRA | 🟢 Functional | 🟢 Functional | vllm-ascend#396, vllm-ascend#893 |
| Prompt adapter | 🔴 No plan | 🔴 No plan | This feature has been deprecated by vLLM. |
| Speculative decoding | 🟢 Functional | 🟢 Functional | Basic support |
| Pooling | 🟢 Functional | 🟡 Planned | CI needed and adapting more models; V1 support relies on vLLM support |
| Enc-dec | 🔴 No plan | 🟡 Planned | Planned for 2025.06.30 |
| Multi Modality | 🟢 Functional | 🟢 Functional | Tutorial, optimizing and adapting more models |
| LogProbs | 🟢 Functional | 🟢 Functional | CI needed |
| Prompt logProbs | 🟢 Functional | 🟢 Functional | CI needed |
| Async output | 🟢 Functional | 🟢 Functional | CI needed |
| Multi step scheduler | 🟢 Functional | 🔴 Deprecated | vllm#8779, replaced by vLLM V1 Scheduler |
| Best of | 🟢 Functional | 🔴 Deprecated | vllm#13361, CI needed |
| Beam search | 🟢 Functional | 🟢 Functional | CI needed |
| Guided Decoding | 🟢 Functional | 🟢 Functional | vllm-ascend#177 |
| Tensor Parallel | 🟢 Functional | 🟢 Functional | CI needed |
| Pipeline Parallel | 🟢 Functional | 🟢 Functional | CI needed |
| Expert Parallel | 🔴 No plan | 🟢 Functional | CI needed; no plan for V0 support |
| Data Parallel | 🔴 No plan | 🟢 Functional | CI needed; no plan for V0 support |
| Prefill Decode Disaggregation | 🟢 Functional | 🟢 Functional | 1P1D available, working on xPyD and V1 support |
| Quantization | 🟢 Functional | 🟢 Functional | W8A8 available, CI needed; working on more quantization method support |
| Graph Mode | 🔴 No plan | 🔵 Experimental | Experimental, see detail note: vllm-ascend#767 |
| Sleep Mode | 🟢 Functional | 🟢 Functional | level=1 available, CI needed, working on V1 support |

  • 🟢 Functional: Fully operational, with ongoing optimizations.

  • 🔵 Experimental: Experimental support, interfaces and functions may change.

  • 🚧 WIP: Under active development, will be supported soon.

  • 🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).

  • 🔴 No plan / Deprecated: No plan for V0, or deprecated by vLLM V1.
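
Since vLLM Ascend stays aligned with vLLM, the features above are generally enabled through the standard vLLM entry points rather than Ascend-specific flags. The snippet below is a minimal, illustrative sketch (the model name and tensor parallel size are placeholders, not recommendations) showing how a few of the 🟢 features map to the usual vLLM arguments; the `VLLM_USE_V1` environment variable selects between the V0 and V1 engines.

```python
# Minimal sketch: exercising a few of the 🟢 features through the standard
# vLLM Python API. The model name and tensor_parallel_size are placeholders.
import os

# Select the V1 engine explicitly (set to "0" to fall back to V0).
os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder model
    tensor_parallel_size=2,            # Tensor Parallel
    enable_prefix_caching=True,        # Automatic Prefix Caching
    enable_chunked_prefill=True,       # Chunked Prefill
)

# Requesting per-token logprobs exercises the LogProbs feature listed above.
params = SamplingParams(temperature=0.8, max_tokens=64, logprobs=5)
for out in llm.generate(["What does vLLM Ascend support?"], params):
    print(out.outputs[0].text)
```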