1. Dataset
My dataset contains 3,000 JSON records, all in Chinese. I used ChatGPT to generate them from documents I provided. My base model is a Chinese fine-tuned version of Llama 3.
2. Fine-tuning
I fine-tuned the model with LoRA. My fine-tuning parameters are below.
from unsloth import FastLanguageModel

# `model` and `max_seq_length` are defined earlier (model loading is not shown here).
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # LoRA rank
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
    random_state = 3407,
    max_seq_length = max_seq_length,
    use_rslora = False,
    loftq_config = None,
)
import torch
from trl import SFTTrainer
from transformers import TrainingArguments

# `dataset` and `tokenizer` are defined earlier (not shown here).
trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    dataset_text_field = "text",  # column that holds the full training text
    max_seq_length = max_seq_length,
    tokenizer = tokenizer,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 60,  # training stops after 60 optimizer steps
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        output_dir = "outputs",
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
    ),
)
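As a rough check on how much of the dataset these settings cover, the arithmetic can be sketched as follows. It assumes a single GPU and one dataset row per training example (no sequence packing), neither of which is stated above; `training_coverage` is a hypothetical helper name used only for illustration.

```python
import math


def training_coverage(num_examples, per_device_batch, grad_accum, max_steps, num_devices=1):
    """Estimate how much of the dataset a step-limited run sees.

    Assumes one row per example and no packing (an assumption, not from the post).
    """
    # Examples consumed per optimizer step.
    effective_batch = per_device_batch * grad_accum * num_devices
    # Total examples consumed before training stops at max_steps.
    examples_seen = effective_batch * max_steps
    # Fraction of a full pass over the data.
    epochs = examples_seen / num_examples
    # Optimizer steps needed for one full pass.
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, examples_seen, epochs, steps_per_epoch


if __name__ == "__main__":
    # Values taken from the post: 3,000 records, batch 2, accumulation 4.
    for steps in (60, 120):
        batch, seen, epochs, per_epoch = training_coverage(3000, 2, 4, steps)
        print(f"max_steps={steps}: effective batch {batch}, "
              f"{seen} examples seen, {epochs:.2f} epochs, "
              f"{per_epoch} steps per epoch")
```

Under these assumptions, 60 steps cover 480 of the 3,000 examples (about 0.16 of an epoch) and 120 steps cover 960 (about 0.32), while one full pass would take 375 steps.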
3. Model deployment
Below is my Modelfile, which I use to deploy the model to Ollama. Before that, I also applied 16-bit quantization to the model, so it is smaller than 5 GB.
FROM jianllmq4_k_mv3_1.gguf
# set the temperature [higher is more creative, lower is more coherent]
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER top_k 20
# set the context window size in tokens
PARAMETER num_ctx 4096
PARAMETER repeat_penalty 1.05
PARAMETER repeat_last_n 4096
TEMPLATE """{{ if and .First .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
{{ .Response }}"""
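# set the system-level prompt
# (English: "You are now a personal assistant for the mining construction field at xxxxx company; you are an engineer in the xxxx field, and you help me solve my professional questions.")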
SYSTEM """
现在你是xxxxx公司矿建领域的个人助理,你是一个xxxx领域的工程师,你要帮我解决我的专业性问题。
"""
I tried reducing the training length from 120 to 60 (max_steps in the configuration above). The model produced fewer duplicate responses, but it could not answer my questions, even when a question was identical to a Q&A pair in the dataset.
I also tried writing the Modelfile in more detail. The structure of the model's answers became clearer, but when I asked questions from the dataset, it gave only vague answers and left out the key information.
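One way to make "could not answer" measurable is to send dataset questions to the deployed model and check how many expected key terms appear in each reply. The sketch below assumes Ollama is running locally on its default port and uses its `/api/generate` endpoint; the model name `my-model`, the helper names, and the idea of scoring by key-term substring matches are illustrative assumptions, not something taken from my setup.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumes a default local install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def ask_ollama(model, prompt):
    """Send one non-streaming generation request and return the reply text."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["response"]


def key_term_coverage(answer, key_terms):
    """Return the fraction of key terms that occur verbatim in the answer."""
    if not key_terms:
        return 0.0
    hits = sum(1 for term in key_terms if term in answer)
    return hits / len(key_terms)


# Example usage (needs a running Ollama server, so it is left commented out):
# reply = ask_ollama("my-model", "a question copied from the dataset")
# print(key_term_coverage(reply, ["key term 1", "key term 2"]))
```

Substring matching is crude, but it works the same way for Chinese text and gives a number that can be compared across training runs.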
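Because the Modelfile's TEMPLATE decides the exact prompt string the model sees, it may help to print that string and compare it with the "text" field used during training. The sketch below mirrors the ChatML-style TEMPLATE above for a first turn; `render_chatml` is a hypothetical helper, Ollama's own whitespace handling may differ slightly, and whether my training text actually follows this format is something this check would have to reveal.

```python
def render_chatml(prompt, system=None, response=""):
    """Approximate the first-turn prompt produced by the TEMPLATE above."""
    parts = []
    # The system block is emitted only when a system prompt is set.
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    parts.append(f"<|im_start|>user\n{prompt}<|im_end|>\n")
    # Generation continues from here; `response` is empty at inference time.
    parts.append(f"<|im_start|>assistant\n{response}")
    return "".join(parts)


if __name__ == "__main__":
    # repr() makes newlines and special tokens visible for comparison.
    print(repr(render_chatml("a question from the dataset", system="system prompt")))
```

If the training text wraps questions and answers in different markers than the ones printed here, the model sees a prompt format at inference time that it was not fine-tuned on.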