
Recently, large models such as ChatGPT and GPT-4 have burst onto the scene, but for most people, training a model with tens or hundreds of billions of parameters from scratch is prohibitively expensive, so open-source alternatives are an attractive option. Previously, I tried reproducing Stanford Alpaca 7B from scratch; however, Alpaca's seed tasks and collected data are all in English, so the resulting model is not optimized for Chinese. To improve dialogue quality in Chinese, the open-source Chinese dialogue model BELLE (Be Everyone's Large Language model Engine) builds on Stanford Alpaca, optimizes for Chinese, and makes some changes to the data-generation code.

Moreover, the project's instruction tuning uses only data produced by ChatGPT (no other data). Models are trained on instruction datasets of different sizes (200k, 600k, 1M, and 2M samples), yielding a corresponding series of model versions.

The project also fine-tuned models based on LLaMA-7B on the corresponding datasets.

Below, we attempt to reproduce BELLE on top of LLaMA-7B.

The base environment is configured as follows:

  • OS: CentOS 7
  • CPUs: a single node with 1 TB of RAM and Intel CPUs; 64 physical CPUs with 16 cores each
  • GPUs: 8 × A800 80GB
  • Python: 3.10 (OpenSSL must first be upgraded to 1.1.1t (download OpenSSL), after which Python is compiled and installed; download Python)
  • NVIDIA driver: 515.65.01 (choose the driver that matches your GPU model; download here)
  • CUDA Toolkit: 11.7 (download here)
  • NCCL: nccl_2.14.3-1+cuda11.7 (download here)
  • cuDNN: 8.8.1.3_cuda11 (download here)
  • The installation of the NVIDIA driver, CUDA, Python, and the other tools above is not covered step by step here.

    Create and activate the virtual environment llama-venv-py310-cu117:

    cd /home/guodong.li/virtual-venv
    virtualenv -p /usr/bin/python3.10 llama-venv-py310-cu117
    source /home/guodong.li/virtual-venv/llama-venv-py310-cu117/bin/activate    
    

    Install PyTorch, Hugging Face Transformers, Apex, and the other required libraries; see the earlier article on reproducing Stanford Alpaca 7B from scratch. A quick environment check is sketched below.
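
    After installation, a quick sanity check (a minimal sketch, not from the original article) confirms that the CUDA 11.7 build of PyTorch and all 8 GPUs are visible:

    import torch
    import transformers

    # Expect a CUDA 11.7 build of PyTorch and 8 visible A800 GPUs on this node.
    print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
    print("transformers:", transformers.__version__)
    print("GPUs visible:", torch.cuda.device_count())
    if torch.cuda.is_available():
        print("device 0:", torch.cuda.get_device_name(0))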

    Model format conversion

    Convert the original LLaMA weights into the model format used by the Transformers library; see the earlier article on reproducing Stanford Alpaca 7B from scratch for details.

  • If you do not want to convert the LLaMA model yourself, an already-converted model can be downloaded directly from Hugging Face.
  • If fine-tuning is based on Bloomz-7B1-mt, the model can be downloaded directly from Hugging Face.

    Dataset preparation

    We directly use the Chinese dataset that BELLE generated following the Stanford Alpaca recipe. To speed up training, 50,000 Chinese instruction examples are randomly sampled from it as training data (the expected per-line format is checked in the sketch after the command below).

    cd /data/nfs/guodong.li/data
    shuf -n50000 belle_open_source_1M.train.json > belle_open_source_random_10w.train.json
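
    Each line of the file is expected to be a standalone JSON object with "input" and "target" fields, which is exactly the format the modified SupervisedDataset below consumes. A minimal format check (a sketch; paths as above):

    import json

    # Inspect the first record of the sampled BELLE data; the modified
    # SupervisedDataset below reads the "input" and "target" keys of each line.
    path = "/data/nfs/guodong.li/data/belle_open_source_random_10w.train.json"
    with open(path, "r", encoding="utf-8") as f:
        first = json.loads(f.readline())
    print(sorted(first.keys()))        # should include "input" and "target"
    print(first.get("input", "")[:80])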
    

    For fine-tuning, we directly reuse the Alpaca training code:

    git clone https://github.com/tatsu-lab/stanford_alpaca.git 
    cd stanford_alpaca
    

    Modify the SupervisedDataset class and the train function in train.py as follows. The main changes are to the data loading/preprocessing (to read BELLE's line-oriented JSON format) and to support resuming training from an existing checkpoint.

    # Additional imports needed by the changes below (add near the top of train.py):
    import json
    import os
    from typing import List

    from transformers.trainer_utils import get_last_checkpoint


    class SupervisedDataset(Dataset):
        """Dataset for supervised fine-tuning."""

        def __init__(self, data_path: str, tokenizer: transformers.PreTrainedTokenizer):
            super(SupervisedDataset, self).__init__()
            logging.warning("Loading data...")

            # Changed: the BELLE data is one JSON object per line with "input"/"target"
            # fields, so the original PROMPT_DICT/utils.jload-based loading is replaced.
            prompt = "Human: {}\n\nAssistant:"
            with open(data_path, "r", encoding="utf-8") as f:
                lines = f.readlines()
                corpus: List[str] = [line.strip() for line in lines]
            sources = []
            targets = []
            for line in corpus:
                temp = json.loads(line)
                input_str = temp.get("input", "")
                target = temp.get("target", "")
                source = prompt.format(input_str)
                target = f"{target}{tokenizer.eos_token}"
                sources.append(source)
                targets.append(target)

            logging.warning("Tokenizing inputs... This may take some time...")
            data_dict = preprocess(sources, targets, tokenizer)

            # Changed: keep at most the first 50,000 examples to speed up training.
            self.input_ids = data_dict["input_ids"][:50000]
            self.labels = data_dict["labels"][:50000]

        def __len__(self):
            return len(self.input_ids)

        def __getitem__(self, i) -> Dict[str, torch.Tensor]:
            return dict(input_ids=self.input_ids[i], labels=self.labels[i])


    def train():
        parser = transformers.HfArgumentParser((ModelArguments, DataArguments, TrainingArguments))
        model_args, data_args, training_args = parser.parse_args_into_dataclasses()

        # Changed: load LLaMA with the dedicated Llama classes instead of the
        # AutoModelForCausalLM/AutoTokenizer classes used by the original script.
        model = transformers.LlamaForCausalLM.from_pretrained(
            model_args.model_name_or_path,
            cache_dir=training_args.cache_dir,
        )
        tokenizer = transformers.LlamaTokenizer.from_pretrained(
            model_args.model_name_or_path,
            cache_dir=training_args.cache_dir,
            model_max_length=training_args.model_max_length,
            padding_side="right",
            use_fast=False,
        )

        if tokenizer.pad_token is None:
            smart_tokenizer_and_embedding_resize(
                special_tokens_dict=dict(pad_token=DEFAULT_PAD_TOKEN),
                tokenizer=tokenizer,
                model=model,
            )
        if "llama" in model_args.model_name_or_path:
            tokenizer.add_special_tokens(
                {
                    "eos_token": DEFAULT_EOS_TOKEN,
                    "bos_token": DEFAULT_BOS_TOKEN,
                    "unk_token": DEFAULT_UNK_TOKEN,
                }
            )

        data_module = make_supervised_data_module(tokenizer=tokenizer, data_args=data_args)
        trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)

        # Changed: resume from the latest checkpoint in output_dir if one exists,
        # instead of always calling trainer.train() from scratch.
        last_checkpoint = None
        if os.path.isdir(training_args.output_dir) and not training_args.overwrite_output_dir:
            last_checkpoint = get_last_checkpoint(training_args.output_dir)
            print("last_checkpoint:", last_checkpoint)

        checkpoint = None
        if training_args.resume_from_checkpoint is not None:
            checkpoint = training_args.resume_from_checkpoint
        elif last_checkpoint is not None:
            checkpoint = last_checkpoint
        print("checkpoint:", checkpoint)

        trainer.train(resume_from_checkpoint=checkpoint)
        trainer.save_state()
        safe_save_model_for_hf_trainer(trainer=trainer, output_dir=training_args.output_dir)
    

    This article instruction-tunes the LLaMA-7B model; the command is as follows (a sanity calculation of the expected step count follows the command):

    torchrun --nproc_per_node=8 --master_port=25001 train.py \
        --model_name_or_path  /data/nfs/guodong.li/pretrain/hf-llama-model/llama-7b \
        --data_path /data/nfs/guodong.li/data/belle_open_source_random_10w.train.json \
        --output_dir /data/nfs/guodong.li/output/llama_sft_7b_fsdp \
        --bf16 True \
        --num_train_epochs 3 \
        --per_device_train_batch_size 4 \
        --per_device_eval_batch_size 4 \
        --gradient_accumulation_steps 8 \
        --evaluation_strategy "no" \
        --save_strategy "steps" \
        --save_steps 100 \
        --save_total_limit 2 \
        --learning_rate 2e-5 \
        --weight_decay 0. \
        --warmup_ratio 0.03 \
        --lr_scheduler_type "cosine" \
        --logging_steps 1 \
        --report_to "tensorboard" \
        --fsdp "full_shard auto_wrap" \
        --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
        --tf32 True
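
    With these settings, the effective global batch size is 8 GPUs × 4 (per-device batch) × 8 (gradient accumulation) = 256 samples per optimizer step, so 50,000 samples over 3 epochs works out to the 585 total steps that appear in the log below. A small sanity calculation (a sketch under these assumptions):

    # Effective batch size and expected number of optimizer steps for this run.
    num_gpus = 8
    per_device_batch = 4
    grad_accum = 8
    samples = 50_000
    epochs = 3

    global_batch = num_gpus * per_device_batch * grad_accum   # 256
    steps_per_epoch = samples // global_batch                 # 195
    total_steps = steps_per_epoch * epochs                    # 585
    print(global_batch, steps_per_epoch, total_steps)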
    

    To instruction-tune based on the Bloomz-7b1-mt model instead, the command is as follows:

    torchrun --nproc_per_node=4 --master_port=29005 train.py \
        --model_name_or_path /data/nfs/guodong.li/pretrain/belle/belle-7b \
        --data_path /data/nfs/guodong.li/data/Belle.train.json.000 \
        --bf16 True \
        --output_dir /data/nfs/guodong.li/output/belle_sft \
        --num_train_epochs 1 \
        --per_device_train_batch_size 4 \
        --per_device_eval_batch_size 4 \
        --gradient_accumulation_steps 8 \
        --evaluation_strategy "no" \
        --save_strategy "steps" \
        --save_steps 2000 \
        --save_total_limit 1 \
        --learning_rate 2e-5 \
        --weight_decay 0. \
        --warmup_ratio 0.03 \
        --lr_scheduler_type "cosine" \
        --logging_steps 1 \
        --fsdp "full_shard auto_wrap" \
        --fsdp_transformer_layer_cls_to_wrap 'BloomBlock' \
        --tf32 True    
    

    Training log:

    WARNING:torch.distributed.run:
    *****************************************
    Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
    *****************************************
    /home/guodong.li/virtual-venv/llama-venv-py310-cu117/lib/python3.10/site-packages/transformers/training_args.py:1356: FutureWarning: using `--fsdp_transformer_layer_cls_to_wrap` is deprecated. Use fsdp_config instead
      warnings.warn(
    /home/guodong.li/virtual-venv/llama-venv-py310-cu117/lib/python3.10/site-packages/transformers/training_args.py:1356: FutureWarning: using `--fsdp_transformer_layer_cls_to_wrap` is deprecated. Use fsdp_config instead
      warnings.warn(
    Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:09<00:00,  3.42it/s]
    Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:12<00:00,  2.65it/s]
    Using pad_token, but it is not set yet.
    Loading checkpoint shards:  97%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████    | 32/33 [00:12<00:00,  2.79it/s]WARNING:root:Tokenizing inputs... This may take some time...
    Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:13<00:00,  2.51it/s]
    Using pad_token, but it is not set yet.
    Loading checkpoint shards: 100%|
    Using pad_token, but it is not set yet.
    WARNING:root:Tokenizing inputs... This may take some time...
    WARNING:root:Loading data...
    WARNING:root:Tokenizing inputs... This may take some time...
    last_checkpoint: None
    checkpoint: None
    /home/guodong.li/virtual-venv/llama-venv-py310-cu117/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
      warnings.warn(
    last_checkpoint: None
    checkpoint: None
    /home/guodong.li/virtual-venv/llama-venv-py310-cu117/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
      warnings.warn(
      0%|                                                                                                                                                                       | 0/585 [00:00<?, ?it/s]/home/guodong.li/virtual-venv/llama-venv-py310-cu117/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
      warnings.warn(
    last_checkpoint: None
    checkpoint: None
    /home/guodong.li/virtual-venv/llama-venv-py310-cu117/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.
      warnings.warn(
    /home/guodong.li/virtual-venv/llama-venv-py310-cu117/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.
      warnings.warn(
    {'loss': 1.3931, 'learning_rate': 1.111111111111111e-06, 'epoch': 0.01}
    {'loss': 1.3973, 'learning_rate': 2.222222222222222e-06, 'epoch': 0.01}
    {'loss': 0.6026, 'learning_rate': 1.013851376499722e-05, 'epoch': 1.53}
    {'loss': 0.6569, 'learning_rate': 1.0083109959960974e-05, 'epoch': 1.54}
     51%|██████████████████████████████████████████████████████████████████████████████▍                                                                          | 300/585 [1:25:51<1:19:23, 16.71s/it]/home/guodong.li/virtual-venv/llama-venv-py310-cu117/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
      warnings.warn(
    /home/guodong.li/virtual-venv/llama-venv-py310-cu117/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.
      warnings.warn(
    {'loss': 0.6802, 'learning_rate': 1.0027703603483379e-05, 'epoch': 1.54}
    {'loss': 0.6206, 'learning_rate': 9.972296396516628e-06, 'epoch': 1.55}
    {'loss': 0.4705, 'learning_rate': 2.455872621784927e-09, 'epoch': 2.97}
    {'loss': 0.4635, 'learning_rate': 1.3814530889433298e-09, 'epoch': 2.98}
    {'loss': 0.4243, 'learning_rate': 6.139870044485907e-10, 'epoch': 2.98}
    {'loss': 0.4825, 'learning_rate': 1.5349792919283625e-10, 'epoch': 2.99}
    {'loss': 0.463, 'learning_rate': 0.0, 'epoch': 2.99}
    {'train_runtime': 10096.0032, 'train_samples_per_second': 14.857, 'train_steps_per_second': 0.058, 'train_loss': 0.6897273881313128, 'epoch': 2.99}
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 585/585 [2:48:15<00:00, 17.26s/it]
    

    GPU memory usage during training:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA A800 80G...  Off  | 00000000:34:00.0 Off |                    0 |
    | N/A   54C    P0    84W / 300W |  79725MiB / 81920MiB |      5%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   1  NVIDIA A800 80G...  Off  | 00000000:35:00.0 Off |                    0 |
    | N/A   58C    P0    90W / 300W |  71487MiB / 81920MiB |    100%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   2  NVIDIA A800 80G...  Off  | 00000000:36:00.0 Off |                    0 |
    | N/A   58C    P0    87W / 300W |  70967MiB / 81920MiB |    100%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   3  NVIDIA A800 80G...  Off  | 00000000:37:00.0 Off |                    0 |
    | N/A   61C    P0    93W / 300W |  74321MiB / 81920MiB |    100%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   4  NVIDIA A800 80G...  Off  | 00000000:9B:00.0 Off |                    0 |
    | N/A   60C    P0    92W / 300W |  76863MiB / 81920MiB |    100%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   5  NVIDIA A800 80G...  Off  | 00000000:9C:00.0 Off |                    0 |
    | N/A   62C    P0   100W / 300W |  72959MiB / 81920MiB |    100%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   6  NVIDIA A800 80G...  Off  | 00000000:9D:00.0 Off |                    0 |
    | N/A   54C    P0    81W / 300W |  70997MiB / 81920MiB |    100%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   7  NVIDIA A800 80G...  Off  | 00000000:9E:00.0 Off |                    0 |
    | N/A   55C    P0    88W / 300W |  76675MiB / 81920MiB |    100%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A      5843      C   ...nv-py310-cu117/bin/python    79723MiB |
    |    1   N/A  N/A      5844      C   ...nv-py310-cu117/bin/python    71485MiB |
    |    2   N/A  N/A      5845      C   ...nv-py310-cu117/bin/python    70965MiB |
    |    3   N/A  N/A      5846      C   ...nv-py310-cu117/bin/python    74319MiB |
    |    4   N/A  N/A      5847      C   ...nv-py310-cu117/bin/python    76861MiB |
    |    5   N/A  N/A      5848      C   ...nv-py310-cu117/bin/python    72957MiB |
    |    6   N/A  N/A      5849      C   ...nv-py310-cu117/bin/python    70995MiB |
    |    7   N/A  N/A      5850      C   ...nv-py310-cu117/bin/python    76673MiB |
    +-----------------------------------------------------------------------------+
    

    Model files:

    > tree /data/nfs/guodong.li/output/llama_sft_7b_fsdp
    /data/nfs/guodong.li/output/llama_sft_7b_fsdp
    ├── added_tokens.json
    ├── checkpoint-400
    │   ├── added_tokens.json
    │   ├── config.json
    │   ├── generation_config.json
    │   ├── optimizer.pt
    │   ├── pytorch_model-00001-of-00003.bin
    │   ├── pytorch_model-00002-of-00003.bin
    │   ├── pytorch_model-00003-of-00003.bin
    │   ├── pytorch_model.bin.index.json
    │   ├── rng_state_0.pth
    │   ├── rng_state_1.pth
    │   ├── rng_state_2.pth
    │   ├── rng_state_3.pth
    │   ├── rng_state_4.pth
    │   ├── rng_state_5.pth
    │   ├── rng_state_6.pth
    │   ├── rng_state_7.pth
    │   ├── scheduler.pt
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   ├── tokenizer.model
    │   ├── trainer_state.json
    │   └── training_args.bin
    ├── checkpoint-500
    │   ├── added_tokens.json
    │   ├── config.json
    │   ├── generation_config.json
    │   ├── optimizer.pt
    │   ├── pytorch_model-00001-of-00003.bin
    │   ├── pytorch_model-00002-of-00003.bin
    │   ├── pytorch_model-00003-of-00003.bin
    │   ├── pytorch_model.bin.index.json
    │   ├── rng_state_0.pth
    │   ├── rng_state_1.pth
    │   ├── rng_state_2.pth
    │   ├── rng_state_3.pth
    │   ├── rng_state_4.pth
    │   ├── rng_state_5.pth
    │   ├── rng_state_6.pth
    │   ├── rng_state_7.pth
    │   ├── scheduler.pt
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   ├── tokenizer.model
    │   ├── trainer_state.json
    │   └── training_args.bin
    ├── config.json
    ├── generation_config.json
    ├── pytorch_model-00001-of-00003.bin
    ├── pytorch_model-00002-of-00003.bin
    ├── pytorch_model-00003-of-00003.bin
    ├── pytorch_model.bin.index.json
    ├── special_tokens_map.json
    ├── tokenizer_config.json
    ├── tokenizer.model
    ├── trainer_state.json
    └── training_args.bin
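
    Because train() was modified to pick up the newest checkpoint via get_last_checkpoint, rerunning the same torchrun command against this output directory would resume from checkpoint-500. A quick way to confirm which checkpoint would be used (a small sketch, assuming the output path above):

    from transformers.trainer_utils import get_last_checkpoint

    # Returns the newest checkpoint-* subdirectory under output_dir, or None.
    ckpt = get_last_checkpoint("/data/nfs/guodong.li/output/llama_sft_7b_fsdp")
    print("would resume from:", ckpt)  # expected: .../checkpoint-500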
    

    At this point, the open-source Chinese dialogue model BELLE has been fully reproduced from scratch.

    Now let's test the results. Create a file named llama-inference.py and add the following code:

    from transformers import AutoTokenizer, LlamaForCausalLM
    import torch

    device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

    model_path = "/data/nfs/guodong.li/output/llama_sft_7b_fsdp"  # You can modify the path for storing the local model
    model = LlamaForCausalLM.from_pretrained(model_path, device_map='auto', low_cpu_mem_usage=True)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    print("Human:")
    line = input()
    while line:
        # Wrap the user input in the same prompt template used for fine-tuning.
        inputs = 'Human: ' + line.strip() + '\n\nAssistant:'
        input_ids = tokenizer(inputs, return_tensors="pt").input_ids
        input_ids = input_ids.to(device)
        outputs = model.generate(input_ids, max_new_tokens=500, do_sample=True, top_k=30, top_p=0.85,
                                 temperature=0.5, repetition_penalty=1., eos_token_id=2, bos_token_id=1, pad_token_id=0)
        rets = tokenizer.batch_decode(outputs, skip_special_tokens=True, clean_up_tokenization_spaces=False)
        # Strip the prompt from the decoded text and print only the model's reply.
        print("Assistant:\n" + rets[0].strip().replace(inputs, ""))
        print("\n------------------------------------------------\nHuman:")
        line = input()
    
    > python llama-inference.py
    Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:10<00:00,  3.46s/it]
    Human:
    小明的爸爸有三个孩子,老大叫王一,老二叫王二,老三叫什么?
    Assistant:
    小明的爸爸有三个孩子,老大叫王一,老二叫王二,老三叫王三。
    ------------------------------------------------
    Human:
    今天天气怎么样,把这句话翻译成英语
    Assistant:
    What's the weather like today?
    ------------------------------------------------
    Human:
    推荐几本金庸的武侠小说
    Assistant:
    《三体》、《流浪地球》、《科技血统》、《异类》、《黑暗森林》。
    ------------------------------------------------
    Human:
    

    As the output above shows, the quality is mediocre: the model fails the riddle (the third child should be 小明) and recommends science-fiction titles instead of Jin Yong's wuxia novels. This is largely because relatively little Chinese data was used.

    Now let's look at GPU memory usage during inference:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA A800 80G...  Off  | 00000000:34:00.0 Off |                    0 |
    | N/A   43C    P0    76W / 300W |  26659MiB / 81920MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A     49037      C   python                          26657MiB |
    +-----------------------------------------------------------------------------+
    

    As shown above, inference needs roughly 27 GB of GPU memory, which is consistent with the roughly 7B parameters being held in full fp32 precision (about 7B × 4 bytes ≈ 28 GB). Next, we try quantizing the model with GPTQ.
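
    As an aside, simply loading the fine-tuned weights in half precision already roughly halves the inference footprint. A minimal variant of the loading call in llama-inference.py (a sketch, not part of the original script; the generation code stays the same):

    import torch
    from transformers import LlamaForCausalLM

    # Load the fine-tuned weights in fp16 instead of the default fp32; this
    # roughly halves inference memory (about 14 GB instead of ~27 GB).
    model = LlamaForCausalLM.from_pretrained(
        "/data/nfs/guodong.li/output/llama_sft_7b_fsdp",
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
        device_map="auto",
    )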

    Model quantization (GPTQ)

    GPTQ is currently the state-of-the-art (SOTA) one-shot weight quantization method.

    GPTQ did not appear out of nowhere; its core idea comes from an earlier quantization method, OBQ. OBQ works well but is too slow: it can quantize a ResNet-50 in about an hour, while on large models such as GPT it could take years. GPTQ introduces several improvements to speed this up.

    Like OBQ, GPTQ treats quantization layer by layer: for each layer it searches for quantized weights such that the output of the quantized layer differs as little as possible from the output of the original layer, as written out in the formula below.
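
    Expressed as a formula, the layer-wise objective from the GPTQ paper is (W are the original layer weights, X the layer inputs, and \hat{W} the quantized weights):

    \hat{W} = \arg\min_{\hat{W}} \; \left\| W X - \hat{W} X \right\|_2^2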

    In general, 8-bit quantization with groupsize = 128 is recommended.

    git clone https://github.com/LianjiaTech/BELLE.git
    # commitid:867f87a
    cd BELLE/gptq/
    pip install safetensors==0.3.0
    pip install datasets==2.10.1
    python setup_cuda.py install
    

    To quantize the BELLE model fine-tuned from LLaMA, the command is as follows.

    CUDA_VISIBLE_DEVICES=0 python llama.py /data/nfs/guodong.li/output/llama_sft_7b_fsdp wikitext2 --wbits 8 --groupsize 128 --save /data/nfs/guodong.li/pretrain/output/llama-7b-gptq/llama7b-8bit-128g.pt
    

    Run log:

    > CUDA_VISIBLE_DEVICES=0 python llama.py /data/nfs/guodong.li/output/llama_sft_7b_fsdp wikitext2 --wbits 8 --groupsize 128 --save /data/nfs/guodong.li/pretrain/output/llama-7b-gptq/llama7b-8bit-128g.pt
    Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:08<00:00,  2.96s/it]
    Found cached dataset wikitext (/home/guodong.li/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/9ffe69f0660523715c1dfd77d99ed6f0b841c9f7df7fe7d6b55449183540956e)
    Found cached dataset wikitext (/home/guodong.li/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/9ffe69f0660523715c1dfd77d99ed6f0b841c9f7df7fe7d6b55449183540956e)
    Token indices sequence length is longer than the specified maximum sequence length for this model (2874559 > 512). Running this sequence through the model will result in indexing errors
    Starting ...
    Ready.
    0 self_attn.q_proj
    Quantizing ...
    time 0.99
    error 0.9580210447311401
    0 self_attn.k_proj
    Quantizing ...
    time 0.82
    error 0.9021462202072144
    0 self_attn.v_proj
    Quantizing ...
    time 0.83
    error 0.11241711676120758
    0 self_attn.o_proj
    Quantizing ...
    time 0.82
    error 0.005190507508814335
    0 mlp.gate_proj
    Quantizing ...
    time 0.83
    error 0.7027783989906311
    30 mlp.up_proj
    Quantizing ...
    time 0.83
    error 180.04339599609375
    31 self_attn.q_proj
    Quantizing ...
    time 0.91
    error 59.735042572021484
    31 self_attn.k_proj
    Quantizing ...
    time 0.82
    error 61.88576889038086
    31 self_attn.v_proj
    Quantizing ...
    time 0.82
    error 50.22753143310547
    31 self_attn.o_proj
    Quantizing ...
    time 0.82
    error 8.473489761352539
    31 mlp.gate_proj
    Quantizing ...
    time 0.96
    error 152.23028564453125
    31 mlp.down_proj
    Quantizing ...
    time 2.45
    error 249.09967041015625
    31 mlp.up_proj
    Quantizing ...
    time 0.83
    error 148.8642120361328
    956.6164684295654
    Found cached dataset wikitext (/home/guodong.li/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/9ffe69f0660523715c1dfd77d99ed6f0b841c9f7df7fe7d6b55449183540956e)
    Found cached dataset wikitext (/home/guodong.li/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/9ffe69f0660523715c1dfd77d99ed6f0b841c9f7df7fe7d6b55449183540956e)
    Token indices sequence length is longer than the specified maximum sequence length for this model (2874559 > 512). Running this sequence through the model will result in indexing errors
    wikitext2
    Evaluating ...
    6.456355571746826
    Packing ...
    model.layers.0.self_attn.q_proj
    model.layers.0.self_attn.k_proj
    model.layers.0.self_attn.v_proj
    model.layers.0.self_attn.o_proj
    model.layers.31.self_attn.q_proj
    model.layers.31.self_attn.k_proj
    model.layers.31.self_attn.v_proj
    model.layers.31.self_attn.o_proj
    model.layers.31.mlp.gate_proj
    model.layers.31.mlp.down_proj
    model.layers.31.mlp.up_proj
    Done.
    

    Output:

    > ls -al --block-size=M /data/nfs/guodong.li/pretrain/output/llama-7b-gptq
    total 7424M
    drwxrwxr-x 1 nobody nobody    0M Apr  2 13:07 .
    drwxrwxr-x 1 nobody nobody    0M Apr  2 12:37 ..
    -rw-rw-r-- 1 nobody nobody 7424M Apr  2 13:07 llama7b-8bit-128g.pt
    

    To quantize a BELLE model fine-tuned from Bloom instead, the reference command is as follows.

    CUDA_VISIBLE_DEVICES=0 python bloom.py BelleGroup/BELLE-7B-2M wikitext2 --wbits 8 --groupsize 128 --save /data/nfs/guodong.li/pretrain/belle/belle-7b-gptq/bloom7b-2m-8bit-128g.pt
    

    To run inference with the quantized BELLE (LLaMA) model, the command is as follows.

    CUDA_VISIBLE_DEVICES=0 python llama_inference.py /data/nfs/guodong.li/output/llama_sft_7b_fsdp --wbits 8 --groupsize 128 --load /data/nfs/guodong.li/pretrain/output/llama-7b-gptq/llama7b-8bit-128g.pt
    

    Test result:

    CUDA_VISIBLE_DEVICES=0 python llama_inference.py /data/nfs/guodong.li/output/llama_sft_7b_fsdp --wbits 8 --groupsize 128 --load /data/nfs/guodong.li/pretrain/output/llama-7b-gptq/llama7b-8bit-128g.pt
    Loading model ...
    Done.
    Human:
    怎么让自己精力充沛,列5点建议
    Assistant:
      Human: 怎么让自己精力充沛,列5点建议
    Assistant:1. 制定详细的工作计划,并严格按优先级安排任务。
    2. 创造一个有组织和专注的工作环境,例如关闭社交媒体和其他干扰。
    3. 利用技术工具来提高生产力,例如自定义工具、软件和应用程序。
    4. 经常休息和锻炼身体,以减轻焦虑和压力,提高工作效率。
    5. 学习新的技能和知识,以保持竞争力和兴趣。</s>
    

    Checking GPU memory usage again, inference now needs only about 9 GB:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA A800 80G...  Off  | 00000000:34:00.0 Off |                    0 |
    | N/A   43C    P0    75W / 300W |   8763MiB / 81920MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A     24487      C   python                           8761MiB |
    +-----------------------------------------------------------------------------
    

    To run inference with a quantized BELLE (Bloom) model, the reference command is as follows.

    CUDA_VISIBLE_DEVICES=0 python bloom_inference.py BELLE-7B-gptq --wbits 8 --groupsize 128 --load /data/nfs/guodong.li/pretrain/belle/belle-7b-gptq/bloom7b-2m-8bit-128g.pt
    

    This completes the model quantization process.

    I previously ran simple tests on BELLE-7B-2M (based on BLOOMZ-7B1-mt), its 8-bit quantized version, and BELLE-LLAMA-7B-2M. Overall, the BELLE models trained from BLOOM perform better than those trained from LLaMA; the LLaMA-based BELLE models tend to produce stiffer Chinese-English translations and sometimes loop, repeating the same content.

    References

  • LLaMA
  • Stanford Alpaca
  • BELLE: an open-source Chinese dialogue LLM
  • GPTQ-for-LLaMa