Protein (language model) Benchmarking Collection - PBC

This repository contains well-established datasets for interpretable and reliable protein language model (pLM) benchmarking.

Datasets

All included datasets are listed below. Details and files can be found in the respective folders.

To benchmark a new or existing pLM on these datasets, use one of the following tools:

biotrainer: autoeval - Automatic evaluation of pLMs on our supervised benchmark datasets. You can find an example notebook here.
biocentral: plm_eval (beta) - Automatic evaluation of pLMs on all benchmark datasets, including a visual leaderboard and model-to-model comparison.
