How to Enable a Local API for the Vicuna Large Language Model

FOCUS

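After deploying the Vicuna model on a local server, we can expose a local API and use it to build other software or run experiments. One major benefit of a local API is that you no longer have to pay for the ChatGPT API while risking a ban on your OpenAI account.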

The FastChat server is compatible with both the openai-python library and cURL commands; see docs/openai_api.md for details.

For how to install FastChat and deploy the Vicuna model, see "What Is Vicuna? How Do You Deploy Vicuna Locally?"

The test platform is as follows:

OS: Ubuntu 22.04 LTS (GNU/Linux 6.2.0-32-generic x86_64)
CPU: Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz
GPU: Nvidia RTX 4090, 24 GB
RAM: 128 GB

Enabling the local API is fairly simple overall.

Step 1: Start the controller

python3 -m fastchat.serve.controller

Step 2: Launch the model worker

python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5

Step 3: Start the API server

python3 -m fastchat.serve.openai_api_server --host localhost --port 8000

Think that's all? At this point you have already walked into a pitfall!

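The first two steps are fine as written, but the "--host" argument in Step 3 needs attention. If you only want to call the API from the same machine, "localhost" works. If you want the API server on your computer to serve other users on the local network, however, be sure to change "--host" to the machine's own IP address (binding to 0.0.0.0, which listens on all interfaces, also works), as follows: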

python3 -m fastchat.serve.openai_api_server --host xxx.xxx.xx.xx --port 8000

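Only now is the Vicuna API server fully functional. If other machines still cannot reach it, check the firewall: try disabling it temporarily, or allow the chosen port through.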
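To pick a value for "--host" and to confirm the port is reachable, a small standard-library check can help. This is only a sketch: the helper names `guess_lan_ip` and `is_port_open` are made up for illustration, and the IP it reports is a best-effort guess that may be wrong on machines with several network interfaces.

```python
import socket


def guess_lan_ip() -> str:
    """Best-effort guess of this machine's LAN IPv4 address (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            # Connecting a UDP socket sends no packets; it only asks the OS
            # which local interface would be used to reach that address.
            s.connect(("192.0.2.1", 9))  # documentation-only address, never contacted
            return s.getsockname()[0]
        except OSError:
            # No usable route (for example, no network): fall back to loopback.
            return "127.0.0.1"


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    ip = guess_lan_ip()
    print(f"Suggested --host value: {ip}")
    print(f"Port 8000 reachable at {ip}: {is_port_open(ip, 8000)}")
```

Running `is_port_open` from a second machine on the network, with the server's IP and port, distinguishes a binding problem from a firewall problem: if the check succeeds on the server itself but fails from the other machine, the firewall is the likely cause.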

Below is a usage example for the Vicuna API. Because the API is compatible with the openai-python library, it can serve as a drop-in replacement for the OpenAI API.

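First, install openai-python. The test code in this article uses the library's pre-1.0 interface (openai.Completion and openai.ChatCompletion), which was removed in openai 1.0, so pin the version accordingly:

pip install "openai<1.0"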

Next, test with the following code:

import openai
# Note: openai.Completion / openai.ChatCompletion are the pre-1.0
# openai-python interface, so this script needs openai<1.0.
# To get proper authentication, make sure to use a valid key that's listed in
# the --api-keys flag. If no flag value is provided, the `api_key` will be ignored.
openai.api_key = "EMPTY"
openai.api_base = "http://localhost:8000/v1"
# The model name must match the model served by the worker in Step 2.
model = "vicuna-7b-v1.5"
prompt = "Once upon a time"
# create a completion
completion = openai.Completion.create(model=model, prompt=prompt, max_tokens=64)
# print the completion
print(prompt + completion.choices[0].text)
# create a chat completion
completion = openai.ChatCompletion.create(
  model=model,
  messages=[{"role": "user", "content": "Hello! What is your name?"}]
)
# print the completion
print(completion.choices[0].message.content)

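If everything works, the script prints the text completion that continues the prompt, followed by the model's reply to the chat message.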
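If a request is rejected because the model name is not recognized, it helps to ask the server which names it actually serves. The sketch below assumes the API server exposes the OpenAI-style `/v1/models` listing at the same base URL; the function names `extract_model_ids` and `list_served_models` are illustrative, not part of FastChat.

```python
import json
import urllib.request


def extract_model_ids(models_response: dict) -> list:
    """Pull the model names out of an OpenAI-style model list response."""
    return [entry["id"] for entry in models_response.get("data", [])]


def list_served_models(base_url: str = "http://localhost:8000/v1") -> list:
    """Ask the API server which models its workers have registered.

    Assumes an OpenAI-style GET {base_url}/models endpoint.
    """
    with urllib.request.urlopen(f"{base_url}/models", timeout=10) as resp:
        return extract_model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        print(list_served_models())
    except OSError as exc:  # URLError is a subclass of OSError
        print(f"Could not reach the API server: {exc}")
```

Whatever name appears in that list is the value to pass as `model` in the test script and in the cURL requests.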

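FastChat also provides a way to test the API with cURL. Below are three examples; if successful, each one prints a passage written by the Vicuna model to the terminal.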

Chat Completions:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vicuna-7b-v1.5",
    "messages": [{"role": "user", "content": "Hello! What is your name?"}]
  }'
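For readers who would rather not install openai-python at all, the same chat request can be sent with Python's standard library. This is a rough equivalent of the cURL call above, not FastChat's own client: the helper names `build_chat_request` and `ask` are hypothetical, and the default model name assumes the worker from Step 2 (lmsys/vicuna-7b-v1.5).

```python
import json
import urllib.request


def build_chat_request(content: str,
                       model: str = "vicuna-7b-v1.5",  # assumed: worker from Step 2
                       base_url: str = "http://localhost:8000/v1") -> urllib.request.Request:
    """Build the same POST request that the cURL chat example sends."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(content: str) -> str:
    """Send one user message and return the text of the model's reply."""
    with urllib.request.urlopen(build_chat_request(content), timeout=60) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

With the three services running, `print(ask("Hello! What is your name?"))` should print the model's answer.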

Text Completions:

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vicuna-7b-v1.5",
    "prompt": "Once upon a time",
    "max_tokens": 41,
    "temperature": 0.5