I am running a pre-training experiment to compare the outcomes of the resulting pre-trained models.
I tried to train a model with a few added tokens, but the results did not make sense. The details are here: https://docs.google.com/document/d/1fracrFansBvoBM02ttDxlNcM2SJgUi-8dqzNgWSMxMg/edit?usp=sharing
Cases 1-3 did not produce the same results as the Control Test.
What did I miss?