Some questions about the decoupled fine-tuning training method

Hello, author! I would like to ask why you used the decoupled fine-tuning training method for the experiments. Couldn't you train end-to-end for 24 epochs instead? Or does end-to-end training give worse results? Also, can the fine-tuning stage be regarded as a kind of trick?