Update README.md

2024-07-26 11:29:09 +08:00 · f97beca23a
parent ceba96f9ed
commit f97beca23a
1 changed file with 4 additions and 1 deletion
--- a/README.md
+++ b/README.md
@@ -47,7 +47,7 @@ Choose your path:
 ## Features

 - **Various models**: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, Yi, Gemma, Baichuan, ChatGLM, Phi, etc.
-- **Integrated methods**: (Continuous) pre-training, (multimodal) supervised fine-tuning, reward modeling, PPO (The details of TRL PPO can refer to [this blog](https://newfacade.github.io/notes-on-reinforcement-learning/17-ppo-trl.html).), DPO, KTO, ORPO, etc.
+- **Integrated methods**: (Continuous) pre-training, (multimodal) supervised fine-tuning, reward modeling, PPO, DPO, KTO, ORPO, etc.

 - **Scalable resources**: 16-bit full-tuning, freeze-tuning, LoRA and 2/3/4/5/6/8-bit QLoRA via AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ.
 - **Advanced algorithms**: GaLore, BAdam, DoRA, LongLoRA, LLaMA Pro, Mixture-of-Depths, LoRA+, LoftQ, PiSSA and Agent tuning.
@@ -201,6 +201,9 @@ You also can add a custom chat template to [template.py](src/llamafactory/data/t
 | ORPO Training          | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
 | SimPO Training         | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |

+> [!TIP]
+> The implementation details of PPO can be found in [this blog](https://newfacade.github.io/notes-on-reinforcement-learning/17-ppo-trl.html).
+
 ## Provided Datasets

 <details><summary>Pre-training datasets</summary>