Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer
torch==1.6.0
cudatoolkit==10.0.130
cudnn==7.6.5
sentence-transformers==0.3.9
transformers==3.4.0
tensorboardX==2.1
pandas==1.1.5
sentencepiece==0.1.85
matplotlib==3.4.1
apex==0.1.0
To install apex, run:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir ./
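A quick way to confirm the environment matches the pinned versions above is a short import check. This is just a convenience sketch and is not part of the repository itself:

```python
# sanity check: the pinned dependencies are importable and CUDA is visible
import torch
import transformers
import sentence_transformers

print("torch:", torch.__version__, "| cuda available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("sentence-transformers:", sentence_transformers.__version__)

try:
    from apex import amp  # noqa: F401 -- apex provides the mixed-precision (amp) utilities
    print("apex amp: importable")
except ImportError:
    print("apex amp: not installed")
```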
- Download the pre-trained language model (e.g. `bert-base-uncased`) to the folder `./bert-base-uncased` from HuggingFace's library
- Download the STS datasets to the `./data` folder by running `cd data && bash get_transfer_data.bash`. The script is adapted from the SentEval toolkit
- Run the scripts in the folder `./scripts` to reproduce our experiments. For example, run the following script to train the unsupervised consert-base model: `bash scripts/unsup-consert-base.sh`. A usage sketch for trained checkpoints follows the update note below.
Update: we recently added support for RoBERTa models. Download the RoBERTa model to a folder such as `./roberta-base` and use the scripts in `./scripts/roberta` instead.
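Once a model has been trained with one of the scripts above, the saved checkpoint can be loaded like a regular sentence-transformers model. The snippet below is a minimal sketch; the `./output/unsup-consert-base` path is hypothetical and should point at wherever your run actually saved its best checkpoint:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# hypothetical output path -- point this at the directory your training run saved
model = SentenceTransformer("./output/unsup-consert-base")

sentences = ["A man is playing a guitar.", "Someone is playing an instrument."]
embeddings = np.asarray(model.encode(sentences))  # shape (2, hidden_size)

# cosine similarity between the two sentence embeddings
cos = embeddings[0] @ embeddings[1] / (
    np.linalg.norm(embeddings[0]) * np.linalg.norm(embeddings[1])
)
print(f"cosine similarity: {cos:.4f}")
```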
- Download the pre-trained language model (e.g. `chinese-roberta-wwm-ext`) to the folder `./chinese-roberta-wwm-ext` from HuggingFace's library
- Download the Chinese STS datasets to the `./data` folder by running `cd data && bash get_chinese_sts_data.bash`
- Run the scripts in the folder `./scripts/chinese` to train models for the Chinese STS tasks. For example, run the following script to train the base model for the atec_ccks task: `bash scripts/chinese/unsup-consert-base-atec_ccks.sh`. The `--chinese_dataset` option in `main.py` selects which Chinese STS dataset to use.
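The STS results in the tables below are Spearman correlations between the cosine similarity of sentence embeddings and the gold labels. The sketch below illustrates that computation with toy data; the checkpoint path, sentence pairs, and gold scores are all made up for illustration, and the real evaluation on the full test sets is performed inside the training scripts (scipy should already be available as a dependency of sentence-transformers):

```python
import numpy as np
from scipy.stats import spearmanr
from sentence_transformers import SentenceTransformer

# hypothetical checkpoint path and toy data -- the real evaluation runs inside the training scripts
model = SentenceTransformer("./output/unsup-consert-base-stsb")

pairs = [("今天天气很好", "今天是个好天气"), ("我想吃苹果", "他在打篮球")]
gold = [4.8, 0.2]  # made-up gold similarity scores, for illustration only

emb_a = np.asarray(model.encode([a for a, _ in pairs]))
emb_b = np.asarray(model.encode([b for _, b in pairs]))

# per-pair cosine similarity, then Spearman correlation against the gold scores
cos = (emb_a * emb_b).sum(axis=1) / (np.linalg.norm(emb_a, axis=1) * np.linalg.norm(emb_b, axis=1))
print("Spearman:", spearmanr(cos, gold).correlation)
```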
| ID | Model | STS12 | STS13 | STS14 | STS15 | STS16 | STSb | SICK-R | Avg. | 
|---|---|---|---|---|---|---|---|---|---|
| - | bert-base-uncased (baseline) | 35.20 | 59.53 | 49.37 | 63.39 | 62.73 | 48.18 | 58.60 | 53.86 | 
| - | bert-large-uncased (baseline) | 33.06 | 57.64 | 47.95 | 55.83 | 62.42 | 49.66 | 53.87 | 51.49 | 
| 1 | unsup-consert-base [Google Drive] [百度云q571] | 64.64 | 78.49 | 69.07 | 79.72 | 75.95 | 73.97 | 67.31 | 72.74 | 
| 2 | unsup-consert-large [Google Drive] [百度云9fm1] | 70.28 | 83.23 | 73.80 | 82.73 | 77.14 | 77.74 | 70.19 | 76.45 | 
| 3 | sup-sbert-base (re-impl.) [Google Drive] [百度云msqy] | 69.93 | 76.00 | 72.15 | 78.59 | 73.53 | 76.10 | 73.01 | 74.19 | 
| 4 | sup-sbert-large (re-impl.) [Google Drive] [百度云0oir] | 73.06 | 77.77 | 75.21 | 81.63 | 77.30 | 79.74 | 74.75 | 77.07 | 
| 5 | sup-consert-joint-base [Google Drive] [百度云jks5] | 70.92 | 79.98 | 74.88 | 81.76 | 76.46 | 78.99 | 78.15 | 77.31 | 
| 6 | sup-consert-joint-large [Google Drive] [百度云xua4] | 73.15 | 81.45 | 77.04 | 83.32 | 77.28 | 81.15 | 78.34 | 78.82 | 
| 7 | sup-consert-sup-unsup-base [Google Drive] [百度云5mc8] | 73.02 | 84.86 | 77.32 | 82.70 | 78.20 | 81.34 | 75.00 | 78.92 | 
| 8 | sup-consert-sup-unsup-large [Google Drive] [百度云tta1] | 74.99 | 85.58 | 79.17 | 84.25 | 80.19 | 83.17 | 77.43 | 80.68 | 
| 9 | sup-consert-joint-unsup-base [Google Drive] [百度云cf07] | 74.46 | 84.19 | 77.08 | 83.77 | 78.55 | 81.37 | 77.01 | 79.49 | 
| 10 | sup-consert-joint-unsup-large [Google Drive] [百度云v5x5] | 76.93 | 85.20 | 78.69 | 85.44 | 79.34 | 82.93 | 76.71 | 80.75 | 
Note:
- All the base models are trained from `bert-base-uncased` and the large models are trained from `bert-large-uncased`.
- For the unsupervised transfer, we merge all unlabeled texts from 7 STS datasets (STS12-16, STSbenchmark and SICK-Relatedness) as the training data (total 89192 sentences), and use the STSbenchmark dev split (including 1500 human-annotated sentence pairs) to select the best checkpoint.
- The sentence representations are obtained by averaging the token embeddings at the last two layers of BERT (a minimal pooling sketch is given after these notes).
- For models 2 to 10, we re-trained them on a single GeForce RTX 3090 with PyTorch 1.8.1 and CUDA 11.1 (rather than the V100, PyTorch 1.6.0 and CUDA 10.0 used in our initial experiments) and changed `max_seq_length` from 64 to 40 (only for the large models) to reduce the required GPU memory. Consequently, the results shown here may differ slightly from those reported in our paper.
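For reference, the pooling described in the notes (averaging the token embeddings of the last two layers) can be reproduced with the plain transformers API roughly as follows; this is a sketch of the idea, not the repository's exact implementation:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True, return_dict=True)
model.eval()

sentences = ["A man is playing a guitar.", "Someone is playing an instrument."]
batch = tokenizer(sentences, padding=True, truncation=True, max_length=64, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# hidden_states = (embedding output, layer 1, ..., layer 12); average the last two transformer layers
last_two = (outputs.hidden_states[-1] + outputs.hidden_states[-2]) / 2.0   # (batch, seq_len, hidden)

# mean-pool over non-padding tokens only
mask = batch["attention_mask"].unsqueeze(-1).float()                       # (batch, seq_len, 1)
sentence_embeddings = (last_two * mask).sum(dim=1) / mask.sum(dim=1)       # (batch, hidden)
print(sentence_embeddings.shape)  # torch.Size([2, 768])
```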
| ID | Model | atec_ccks | bq | lcqmc | pawsx | stsb | 
|---|---|---|---|---|---|---|
| - | chinese-roberta-wwm-ext (baseline) | 11.28 | 40.21 | 59.89 | 09.35 | 60.86 | 
| - | chinese-roberta-wwm-ext-large (baseline) | 13.75 | 36.77 | 60.36 | 09.94 | 58.72 | 
| 1 | unsup-consert-base-atec_ccks [百度云v593] | **27.39** | 47.61 | 60.70 | 08.17 | 64.74 | 
| 2 | unsup-consert-base-bq [百度云cwnu] | 11.55 | **47.20** | 61.47 | 08.47 | 64.44 | 
| 3 | unsup-consert-base-lcqmc [百度云pdax] | 04.57 | 38.15 | **67.34** | 08.80 | 67.78 | 
| 4 | unsup-consert-base-pawsx [百度云iko8] | 08.66 | 38.35 | 61.97 | **09.36** | 63.89 | 
| 5 | unsup-consert-base-stsb [百度云7r7n] | 07.29 | 40.81 | 67.39 | 06.24 | **72.82** | 
| 6 | unsup-consert-large-atec_ccks [百度云4zlz] | **29.92** | 47.50 | 59.83 | 09.72 | 66.37 | 
| 7 | unsup-consert-large-bq [百度云gnte] | 15.47 | **47.08** | 60.10 | 10.04 | 66.94 | 
| 8 | unsup-consert-large-lcqmc [百度云wetp] | 11.40 | 37.79 | **66.45** | 10.81 | 67.68 | 
| 9 | unsup-consert-large-pawsx [百度云u6t9] | 11.25 | 36.25 | 65.15 | **11.38** | 67.45 | 
| 10 | unsup-consert-large-stsb [百度云8tyz] | 08.00 | 40.85 | 67.72 | 10.13 | **74.50** | 
Note:
- All the base models are trained from `hfl/chinese-roberta-wwm-ext` and the large models are trained from `hfl/chinese-roberta-wwm-ext-large`.
- Each model is trained on a single Chinese STS dataset but evaluated on all five datasets. The bold numbers indicate that the model is evaluated on the same dataset it was trained on.
- The sentence representations are also obtained by averaging the token embeddings at the last two layers of BERT.
- For the base models, we set `batch_size` to 96 and `max_seq_length` to 64, while for the large models, we set `batch_size` to 32 and `max_seq_length` to 40 to reduce the required GPU memory.
@inproceedings{yan-etal-2021-consert,
    title = "{C}on{SERT}: A Contrastive Framework for Self-Supervised Sentence Representation Transfer",
    author = "Yan, Yuanmeng  and
      Li, Rumei  and
      Wang, Sirui  and
      Zhang, Fuzheng  and
      Wu, Wei  and
      Xu, Weiran",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.393",
    doi = "10.18653/v1/2021.acl-long.393",
    pages = "5065--5075",
    abstract = "Learning high-quality sentence representations benefits a wide range of natural language processing tasks. Though BERT-based pre-trained language models achieve high performance on many downstream tasks, the native derived sentence representations are proved to be collapsed and thus produce a poor performance on the semantic textual similarity (STS) tasks. In this paper, we present ConSERT, a Contrastive Framework for Self-Supervised SEntence Representation Transfer, that adopts contrastive learning to fine-tune BERT in an unsupervised and effective way. By making use of unlabeled texts, ConSERT solves the collapse issue of BERT-derived sentence representations and make them more applicable for downstream tasks. Experiments on STS datasets demonstrate that ConSERT achieves an 8{\%} relative improvement over the previous state-of-the-art, even comparable to the supervised SBERT-NLI. And when further incorporating NLI supervision, we achieve new state-of-the-art performance on STS tasks. Moreover, ConSERT obtains comparable results with only 1000 samples available, showing its robustness in data scarcity scenarios.",
}