LlamaFactory/src/llamafactory/train/ppo/trainer.py at 0b0e27c2f1479c7fcdaf987e7f5c2a85cd9ed3a8

mirror of https://github.com/hiyouga/LlamaFactory.git synced 2026-03-23 02:33:24 +08:00

hiyouga 0b0e27c2f1 fix #4609

unwrap_model_for_generation(reward_model) is necessary for DeepSpeed ZeRO-3 training


Former-commit-id: c8d5b21700577cae8d6ca03359bcf1762c8b7cb8

2024-07-03 19:45:51 +08:00
