posts/2024/importing-yi9b-to-ollama/ #8
I'm a complete beginner in the LLM field and have never worked with machine learning or deep learning before. Recently I started trying to understand LLMs and how to fine-tune them. Along the way I learned that these large models are usually released as base models, and from searching I found that SFT fine-tuning is needed to turn one into a chat model. I have already done that, but I haven't uploaded the model to HuggingFace or any other public platform, because when doing LoRA fine-tuning with LLaMA-Factory I set the learning rate too high and training never converged. How well does it work? Frankly, the results are poor, but I can now chat with it the way I would with ChatGPT, although it sometimes still rambles on to itself like a base model. I noticed that your latest update can already hold a proper conversation. Did you do any further work after that? I'm looking forward to your advice. Here are my scripts:
# SFT fine-tuning, so that the model can handle chat tasks
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--stage sft \
--do_train True \
--model_name_or_path /mnt/workspace/LLaMA-Factory/Yi-9B-200K \
--finetuning_type lora \
--quantization_bit 4 \
--template yi \
--dataset_dir data \
--dataset belle_2m \
--cutoff_len 1024 \
--learning_rate 0.0002 \
--num_train_epochs 3.0 \
--max_samples 20000 \
--per_device_train_batch_size 6 \
--gradient_accumulation_steps 1 \
--lr_scheduler_type cosine \
--max_grad_norm 1.0 \
--logging_steps 5 \
--save_steps 100 \
--warmup_steps 50 \
--neftune_noise_alpha 5 \
--optim adamw_torch \
--packing True \
--report_to none \
--output_dir saves/Yi-9B/lora/yi-9b-200k-chat-lora \
--fp16 True \
--lora_rank 8 \
--lora_alpha 16 \
--lora_dropout 0.1 \
--lora_target q_proj,v_proj \
--plot_loss True
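On the non-convergence mentioned above: my current plan (only a guess, not something I have verified on Yi-9B-200K) is to rerun the exact same SFT command with a smaller learning rate and a longer warmup, leaving everything else unchanged:
# Hypothetical retry: identical to the SFT command above, changing only these two flags
#   --learning_rate 0.00005   (5e-5 instead of 2e-4; an assumed starting point, not a verified value)
#   --warmup_steps 100        (a longer warmup to smooth the first optimizer updates)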
# Try the model from the command line to check that it works
CUDA_VISIBLE_DEVICES=0 python src/cli_demo.py \
--model_name_or_path /mnt/workspace/LLaMA-Factory/Yi-9B-200K \
--adapter_name_or_path saves/Yi-9B/lora/yi-9b-200k-chat-lora/ \
--template yi \
--quantization_bit 4 \
--finetuning_type lora
# Evaluate the model; this failed because the A10 ran out of GPU memory
CUDA_VISIBLE_DEVICES=0 python src/evaluate.py \
--model_name_or_path /mnt/workspace/LLaMA-Factory/Yi-9B-200K \
--adapter_name_or_path saves/Yi-9B/lora/yi-9b-200k-chat-lora/ \
--template yi \
--quantization_bit 4 \
--finetuning_type lora \
--task mmlu \
--split test \
--lang zh \
--n_shot 5 \
--batch_size 4
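A guess (untested) for the out-of-memory failure above: the MMLU evaluation might fit into the A10's memory with a smaller batch size, at the cost of a much slower run, e.g.:
# Hypothetical retry of the evaluation with --batch_size 1 (untested; it may still not fit on an A10)
CUDA_VISIBLE_DEVICES=0 python src/evaluate.py \
--model_name_or_path /mnt/workspace/LLaMA-Factory/Yi-9B-200K \
--adapter_name_or_path saves/Yi-9B/lora/yi-9b-200k-chat-lora/ \
--template yi \
--quantization_bit 4 \
--finetuning_type lora \
--task mmlu \
--split test \
--lang zh \
--n_shot 5 \
--batch_size 1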
# Score the model with a prediction run; this succeeded, results below
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--stage sft \
--model_name_or_path /mnt/workspace/LLaMA-Factory/Yi-9B-200K \
--adapter_name_or_path saves/Yi-9B/lora/yi-9b-200k-chat-lora/ \
--finetuning_type lora \
--quantization_bit 4 \
--template yi \
--dataset_dir data \
--dataset alpaca_gpt4_zh \
--cutoff_len 1024 \
--max_samples 2000 \
--per_device_eval_batch_size 16 \
--predict_with_generate True \
--max_new_tokens 128 \
--top_p 0.7 \
--temperature 0.95 \
--output_dir saves/Yi-9B/lora/yi-9b-200k-chat-lora/ \
--do_predict True
***** predict metrics *****
predict_bleu-4 = 12.0712
predict_rouge-1 = 34.153
predict_rouge-2 = 12.641
predict_rouge-l = 23.7601
predict_runtime = 0:38:24.18
predict_samples_per_second = 0.868
predict_steps_per_second = 0.054
# Merge the LoRA adapter into the base model and export it
# DO NOT use quantized model or quantization_bit when merging lora weights
CUDA_VISIBLE_DEVICES=0 python src/export_model.py \
--model_name_or_path /mnt/workspace/LLaMA-Factory/Yi-9B-200K \
--adapter_name_or_path saves/Yi-9B/lora/yi-9b-200k-chat-lora/ \
--template yi \
--finetuning_type lora \
--export_dir saves/Yi-9B/lora/yi-9b-200k-chat-lora/models \
--export_size 4 \
--export_legacy_format False
# GPTQ 4-bit quantization of the merged model; this failed due to insufficient GPU memory
CUDA_VISIBLE_DEVICES=0 python src/export_model.py \
--model_name_or_path saves/Yi-9B/lora/yi-9b-200k-chat-lora/models \
--template yi \
--export_dir saves/Yi-9B/lora/yi-9b-200k-chat-lora-int4/models \
--export_quantization_bit 4 \
--export_quantization_dataset data/c4_demo.json \
--export_size 1 \
--export_legacy_format False
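Since the post this discussion is attached to is about importing Yi-9B into Ollama, an alternative I am considering for the failed GPTQ step is to quantize on the CPU instead: convert the merged model to GGUF with llama.cpp and then import it into Ollama. This is only a sketch under the assumption that llama.cpp and Ollama are installed; the paths, output names, and the Q4_K_M type below are my assumptions, not steps taken from the post.
# Hypothetical CPU-side quantization via llama.cpp, then import into Ollama
# (assumes llama.cpp is checked out and built at /path/to/llama.cpp and Ollama is installed)
python /path/to/llama.cpp/convert-hf-to-gguf.py \
saves/Yi-9B/lora/yi-9b-200k-chat-lora/models \
--outfile yi-9b-200k-chat-f16.gguf
/path/to/llama.cpp/quantize yi-9b-200k-chat-f16.gguf yi-9b-200k-chat-Q4_K_M.gguf Q4_K_M
# Minimal Modelfile for Ollama: a single line "FROM ./yi-9b-200k-chat-Q4_K_M.gguf"
ollama create yi-9b-200k-chat -f Modelfile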
The log of importing Yi-9B LLM model to Ollama library.
https://shinyzhu.com/posts/2024/importing-yi9b-to-ollama/