An example utils.py:
api_key = "<API key>"
temperature = <model temperature>
delay_time = <the seconds between each request>
model = "<the name of the model>"

In main.py, specify the server parameters:
template: a list of prompt templates.
version: a list of question versions.
language: a list of language versions.
label: a list of level option labels.
order: a list of level orders.
questionnaire_name: the selected questionnaire.
name_exp: name of the save file.
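
To illustrate how the list-valued parameters above could expand into individual pre-testing cases, here is a minimal sketch. It assumes the cases are the Cartesian product of the five lists; the README does not state this explicitly, and `enumerate_cases` is a hypothetical helper, not a function from server.py. The values are the ones used in the example main.py in this README.

```python
from itertools import product

# Example values, matching the sample main.py in this README.
template = ['t1', 't2', 't3', 't4', 't5']
version = ['v1', 'v2', 'v3', 'v4', 'v5']
language = ['En', 'Zh', 'Ko', 'Es', 'Fr', 'De', 'It', 'Ar', 'Ru', 'Ja']
label = ['n', 'al', 'au', 'rl', 'ru']
order = ['r', 'f']


def enumerate_cases(template, version, language, label, order):
    """Hypothetical helper: build one dict per parameter combination.

    Assumption: each pre-testing case is one element of the Cartesian
    product of the five lists. The real Server class may differ.
    """
    keys = ('template', 'version', 'language', 'label', 'order')
    combos = product(template, version, language, label, order)
    return [dict(zip(keys, combo)) for combo in combos]


cases = enumerate_cases(template, version, language, label, order)
print(len(cases))  # 5 * 5 * 10 * 5 * 2 = 2500
print(cases[0])
```

Under this assumption, the example lists yield 2500 cases, which is why a save file and a resume mechanism are useful for long runs.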
Instantiate the Server class; all pre-testing cases are created and stored in save/<name_exp>.json:
test = Server(questionnaire_name, template, version, language, label, order, name_exp)

To resume after an interruption, load a saved file as a new save (a protection mechanism against test interruption):
test = load("<save_path>", "<new_save_name>")

Run all pre-testing cases:
test.run()

An example main.py:
from server import *
template = ['t1','t2','t3','t4','t5']
version = ['v1','v2','v3','v4','v5']
language = ['En', 'Zh', 'Ko', 'Es', 'Fr', 'De', 'It', 'Ar', 'Ru', 'Ja']
label = ['n', 'al', 'au', 'rl', 'ru']
order = ['r', 'f']
questionnaire_name = 'BFI'
name_exp = 'bfi-save'
bfi_test = Server(questionnaire_name, template, version, language, label, order, name_exp)
bfi_test.run()

To rephrase a questionnaire, execute the following in main.py:
rephrase("<questionnaire_name>", "<specified_language>")

For more details, please refer to our paper. Please remember to cite us if you find our work helpful!
@inproceedings{huang2024reliability,
title={On the reliability of psychological scales on large language models},
author={Huang, Jen-tse and Jiao, Wenxiang and Lam, Man Ho and Li, Eric John and Wang, Wenxuan and Lyu, Michael},
booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
pages={6152--6173},
year={2024}
}