Running DeepSeek-R1-Distill-Qwen-7B with vLLM

This walkthrough uses vLLM, which runs on Linux only.

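Create the Python environment

sudo mkdir -p /home/deepseek  # create the working directory
sudo chown -R $USER:$USER /home/deepseek  # give your user ownership so later steps do not fail on permissions; the Python virtual environment needs write access
cd /home/deepseek  # enter the directory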

sudo apt install -y python3-venv  # install the venv module (skip if it is already installed)

python3 -m venv venv  # create the virtual environment (no sudo, otherwise it is owned by root and pip cannot write to it)

source venv/bin/activate  # activate the virtual environment

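In mainland China, install vLLM from the Aliyun PyPI mirror. Do not use sudo here: a command run with sudo escapes the virtual environment.
pip install vllm -i https://mirrors.aliyun.com/pypi/simple/  # install vLLM (Linux only; not available on Windows)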
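
Before moving on, it can help to confirm that the interpreter really is the one inside the virtual environment and that vllm is visible to it. The following is a minimal sketch using only the standard library; the function names in_virtualenv and package_installed are illustrative and not part of vLLM.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the system installation.
    return sys.prefix != sys.base_prefix


def package_installed(name: str) -> bool:
    """Return True when a top-level package can be found without importing it."""
    return importlib.util.find_spec(name) is not None


print("virtual environment active:", in_virtualenv())
print("vllm installed:", package_installed("vllm"))
```

If the first line prints False, the shell has most likely lost the activation (for example after opening a new terminal), and source venv/bin/activate needs to be run again.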

Download the model

Install git
sudo apt install git
git --version

Install git-lfs if the system does not have it
sudo apt install git-lfs  # install
git lfs install  # enable

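From a mainland China network, download DeepSeek-R1-Distill-Qwen-7B from the ModelScope community. Run the clone without sudo so the files stay owned by your user.
git clone https://www.modelscope.cn/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B.git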

If the large files were not downloaded, enter the repository directory and pull them
cd DeepSeek-R1-Distill-Qwen-7B
git lfs pull
cd ..
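
One way to tell whether the large files are still missing: a file that Git LFS has not fetched yet is a small text stub whose first line is "version https://git-lfs.github.com/spec/v1". The sketch below scans a directory for such stubs. The function name find_lfs_pointers is made up for this example.

```python
from pathlib import Path

# First bytes of every Git LFS pointer file.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(repo_dir: str) -> list[str]:
    """List files under repo_dir that are still Git LFS pointer stubs."""
    root = Path(repo_dir)
    pointers = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip directories and Git's own metadata.
        if not path.is_file() or ".git" in rel.parts:
            continue
        with path.open("rb") as handle:
            head = handle.read(len(LFS_HEADER))
        if head == LFS_HEADER:
            pointers.append(str(rel))
    return pointers
```

Calling find_lfs_pointers("DeepSeek-R1-Distill-Qwen-7B") from /home/deepseek should return an empty list once every weight file is present; any names it returns are files that git lfs pull still has to fetch.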

Use Streamlit to quickly build a chat interface:

pip install streamlit -i https://mirrors.aliyun.com/pypi/simple/  # make sure the virtual environment is active

Create a script named streamlit_run.py:

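import requests
import streamlit as st

st.title("DeepSeek test")

user_input = st.text_input("Enter your question:")

if user_input:
    url = "http://127.0.0.1:8000/v1/chat/completions"  # vLLM's OpenAI-compatible chat endpoint
    data = {
        # Must match the model argument passed to `vllm serve`
        "model": "./DeepSeek-R1-Distill-Qwen-7B",
        "messages": [{"role": "user", "content": user_input}],
    }

    try:
        # json= serializes the body and sets the Content-Type header;
        # the long timeout leaves room for the model's reasoning output
        response = requests.post(url, json=data, timeout=300)
    except requests.RequestException as exc:
        st.write(f"Could not reach the vLLM server: {exc}")
    else:
        # Handle the response
        if response.status_code == 200:
            # Fall back to an empty choice if the list is missing or empty
            choices = response.json().get("choices") or [{}]
            output = choices[0].get("message", {}).get("content", "No result returned")
            st.write("Model answer:", output)
        else:
            st.write(f"Request failed, status code: {response.status_code}")
            st.write("Response body:", response.text)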

Run vLLM

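vLLM supports multi-GPU serving out of the box through tensor parallelism. The model has 28 attention heads, and the tensor-parallel size must divide that number evenly, so the number of GPUs you use must be a divisor of 28, for example 1, 2, or 4.
My server has 6 GPUs, and 6 does not divide 28, so I set --tensor-parallel-size to 4.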
vllm serve ./DeepSeek-R1-Distill-Qwen-7B --tensor-parallel-size 4 --max-model-len 32768 --enforce-eager
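
The arithmetic behind that choice can be sketched in a few lines. This only encodes the divisibility rule described above, with 28 heads and 6 GPUs taken from the text; vLLM may apply further constraints that are not modeled here. The helper name valid_tp_sizes is hypothetical.

```python
def valid_tp_sizes(num_heads: int, num_gpus: int) -> list[int]:
    """Return every tensor-parallel size that divides num_heads and fits on num_gpus."""
    return [n for n in range(1, num_gpus + 1) if num_heads % n == 0]


sizes = valid_tp_sizes(num_heads=28, num_gpus=6)
print(sizes)       # [1, 2, 4]
print(max(sizes))  # 4
```

With 6 GPUs the largest candidate is 4, which is the value passed to --tensor-parallel-size in the command above; the remaining two cards stay idle.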

The server is now listening on port 8000.
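
A quick way to confirm the server is up before starting the web interface is to ask its OpenAI-compatible /v1/models endpoint which models it serves. The following is a minimal sketch using only the standard library; the function names are illustrative, and the address assumes the default host and port used in this walkthrough.

```python
import json
import urllib.request


def extract_model_ids(payload: dict) -> list[str]:
    """Pull the "id" field out of an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_served_models(base_url: str = "http://127.0.0.1:8000",
                       timeout: float = 5.0) -> list[str]:
    """Return the model ids reported by the server's /v1/models endpoint."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        payload = json.load(resp)
    return extract_model_ids(payload)
```

With the server started as above, print(list_served_models()) would be expected to show "./DeepSeek-R1-Distill-Qwen-7B", assuming no separate served model name was configured. That is the same string streamlit_run.py sends in its "model" field, so a mismatch here would explain a failed request from the web page.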

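The web interface must be launched with the streamlit command; running the .py file directly with python does not work.
streamlit run streamlit_run.py  # make sure the virtual environment is active and you are in the directory that contains the script
After launching, mine reported port 8501. If port 8501 on your server is reachable from outside, the chat interface can now be opened through the server's address and used right away.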
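
Whether port 8501 is actually reachable from outside can be checked from another machine with a plain TCP connection attempt. This is a small sketch; port_open is a made-up helper name, and the host you pass in is your own server's address.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True when a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If port_open returns False for your server's address and 8501 while the page works on the server itself, a firewall or cloud security group rule is the likely place to look.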