## Overview
@chenchaolong/plugin-vllm1 provides a model adapter for connecting vLLM inference services to the XpertAI platform. The plugin communicates with vLLM clusters via an OpenAI-compatible API, enabling agents to invoke conversational, embedding, vision-enhanced, and reranking models within a unified XpertAI agentic workflow.
- Provides the VLLMPlugin NestJS module, which automatically registers model providers, lifecycle logging, and configuration validation logic (a registration sketch follows this list).
- Wraps vLLM's conversational/inference capabilities as XpertAI's LargeLanguageModel via VLLMLargeLanguageModel, supporting function calling, streaming output, and agent token statistics.
- Exposes VLLMTextEmbeddingModel, reusing LangChain's OpenAIEmbeddings to generate vector representations for knowledge base retrieval.
- Integrates VLLMRerankModel, leveraging the OpenAI-compatible rerank API to improve retrieval result ranking.
- Supports declaring capabilities such as vision, function calling, and streaming mode in plugin metadata, allowing flexible configuration of different vLLM deployments in the console.
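If you want to see where the module sits in a host application, the sketch below registers it in a NestJS root module by hand. Treat it as illustrative only: it assumes VLLMPlugin is exported from the package root, and in a typical XpertAI deployment the plugin is loaded via the PLUGINS environment variable instead (see the quick-start steps below).

```typescript
// Illustrative only: wiring the plugin into a NestJS host module manually.
// A standard XpertAI setup discovers the plugin from the PLUGINS environment
// variable rather than an explicit import.
import { Module } from '@nestjs/common';
import { VLLMPlugin } from '@chenchaolong/plugin-vllm1';

@Module({
  imports: [
    // Registering the module makes the vLLM providers (chat, embedding, rerank)
    // visible to the XpertAI model registry and enables lifecycle logging.
    VLLMPlugin,
  ],
})
export class AppModule {}
```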
## Installation

```bash
npm install @chenchaolong/plugin-vllm1
```
> Peer Dependencies: The host project must also provide libraries such as @xpert-ai/plugin-sdk, @nestjs/common, @metad/contracts, @langchain/openai, lodash-es, chalk, and zod. Please refer to package.json for version requirements.
## Quick Start

1. Add the plugin package to your project's dependencies and ensure it is resolvable by Node.js.
2. Before starting the service, declare the plugin in your environment variables:

   ```bash
   PLUGINS=@chenchaolong/plugin-vllm1
   ```

3. Add a new model provider in the XpertAI admin interface or configuration file, and select the vllm provider.
## Configuration

The form fields defined in vllm.yaml cover common deployment scenarios:

| Field | Description |
| --- | --- |
| api_key | vLLM service access token (leave blank if the service does not require authentication). |
| endpoint_url | Required. The base URL of the vLLM OpenAI-compatible API, e.g., https://vllm.example.com/v1. |
| endpoint_model_name | Specify explicitly if the model name on the server differs from the logical model name in XpertAI. |
| mode | Choose between chat and completion inference modes. |
| context_size / max_tokens_to_sample | Control the context window and generation length. |
| agent_though_support, function_calling_type, stream_function_calling, vision_support | Indicate whether the model supports agent thought exposure, function/tool calling, streaming function calling, and multimodal input, to inform UI capability hints. |
| stream_mode_delimiter | Customize the paragraph delimiter for streaming output. |
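For orientation, a credential payload matching these fields might look like the object below. Every value is a placeholder for a hypothetical self-hosted deployment, and the enum values assumed for mode and function_calling_type should be checked against the actual vllm.yaml.

```typescript
// Hypothetical example values for the vllm.yaml fields above (placeholders only).
const vllmCredentials = {
  endpoint_url: 'https://vllm.example.com/v1', // required: OpenAI-compatible base URL
  api_key: '',                                 // leave empty when the service needs no auth
  endpoint_model_name: 'my-served-model',      // only if it differs from the XpertAI model name
  mode: 'chat',                                // assumed enum: 'chat' | 'completion'
  context_size: 32768,
  max_tokens_to_sample: 4096,
  agent_though_support: true,
  function_calling_type: 'tool_call',          // assumed enum value
  stream_function_calling: true,
  vision_support: false,
  stream_mode_delimiter: '\n\n',
};
```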
After saving the configuration, the plugin will call the validateCredentials method in the background, making a minimal request to the vLLM service to ensure the credentials are valid.
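As a rough illustration of what such a minimal check can look like against an OpenAI-compatible server (the plugin's actual validateCredentials logic may differ), the sketch below simply queries the standard /models route:

```typescript
// Sketch only: a minimal reachability/credential check against a vLLM endpoint.
// It assumes the server exposes the standard OpenAI-compatible GET /models route.
async function checkVllmEndpoint(endpointUrl: string, apiKey?: string): Promise<void> {
  const base = endpointUrl.replace(/\/+$/, '');
  const res = await fetch(`${base}/models`, {
    headers: apiKey ? { Authorization: `Bearer ${apiKey}` } : {},
  });
  if (!res.ok) {
    throw new Error(`vLLM endpoint rejected the request: ${res.status} ${res.statusText}`);
  }
}

// Example usage (placeholder values):
// await checkVllmEndpoint('https://vllm.example.com/v1', process.env.VLLM_API_KEY);
```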
## Supported Model Types

- Conversational Models: Uses ChatOAICompatReasoningModel to proxy the vLLM OpenAI API, supporting message history, function calling, and streaming output.
- Embedding Models: Relies on LangChain's OpenAIEmbeddings for knowledge base vectorization and retrieval-augmented generation (see the sketch after this list).
- Reranking Models: Wraps OpenAICompatibleReranker to semantically rerank recall results.
- Vision Models: If the vLLM inference service supports multimodal (text + image) input, enable vision_support in the configuration to declare multimodal capabilities to the frontend.
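Because the embedding provider reuses LangChain's OpenAIEmbeddings, you can point the same client at a vLLM server yourself to sanity-check vectorization. The snippet below is a standalone sketch; the base URL, dummy API key, and model name are placeholders for your own deployment, not values shipped with the plugin.

```typescript
import { OpenAIEmbeddings } from '@langchain/openai';

// Point LangChain's OpenAI embeddings client at a vLLM OpenAI-compatible server.
const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.VLLM_API_KEY ?? 'EMPTY',               // a dummy key usually suffices when auth is disabled
  model: 'my-embedding-model',                               // placeholder: the embedding model served by vLLM
  configuration: { baseURL: 'https://vllm.example.com/v1' }, // placeholder endpoint
});

// Vectorize a few documents as a quick smoke test of knowledge-base embedding.
const vectors = await embeddings.embedDocuments([
  'vLLM exposes an OpenAI-compatible API.',
  'XpertAI plugins register model providers.',
]);
console.log(`embedded ${vectors.length} documents, dimension ${vectors[0].length}`);
```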
## Development

From the repository root, enter the xpertai/ directory and use Nx commands to build and test:

```bash
npx nx build @chenchaolong/plugin-vllm1
npx nx test @chenchaolong/plugin-vllm1
```

Build artifacts are output to dist/ by default. Unit tests are written and run with Jest; the configuration is in jest.config.ts.
## License

This project follows the AGPL-3.0 License found at the root of the repository.