A detailed pre-training experiment

I am running a pre-training experiment to compare the outcomes of the resulting pre-trained models.

I tried to train a model with a few added tokens, but the results did not make sense. The details are here: https://docs.google.com/document/d/1fracrFansBvoBM02ttDxlNcM2SJgUi-8dqzNgWSMxMg/edit?usp=sharing

Cases 1-3 did not produce the same results as the control test.

What did I miss?