
GitHub repository for the code release of the paper "Exploring the Conformational Landscape of Adenylate Kinase and Beyond: A Benchmark of Protein Folding Models"

instadeepai/FoldConfBench

FoldConfBench

This repository contains the code for FoldConfBench, accompanying our paper "Exploring the Conformational Landscape of Adenylate Kinase and Beyond: A Benchmark of Protein Folding Models" by A. Bhasin, A. Delaunay, F. Saccon, and Y. Fu. In this work, we benchmark several MSA-based methods for generating multiple protein conformations with AlphaFold2 [1]. This repository is a wrapper around OpenFold [2], an open-source implementation of AlphaFold2, into which we integrate several MSA-based sampling methods.

Instructions to run an experiment

  1. Modify the configuration appropriately
    Ensure that all relevant configuration parameters are set correctly for your experiment (see Configuration parameters)

  2. Prepare the input directory
    The project expects an input directory structure similar to that of OpenFold [2]:

    INPUT_DIR/
    ├── alignments/
    │   └── {protein_id}/
    │       └── {protein_id}.a3m
    └── fasta/
        └── {protein_id}.fasta

    The input directory contains the raw input files needed for prediction:

    • alignments/: Contains Multiple Sequence Alignment (MSA) files in A3M format
    • fasta/: Contains protein sequences in FASTA format

    The output directory stores the processed input files and the prediction results.

  3. Build the image

    make build
  4. Run the experiment

    make run INPUT_DIR=YOUR_INPUT_DIR OUTPUT_DIR=YOUR_OUTPUT_DIR
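Before running make run, it can help to check that INPUT_DIR follows the layout from step 2. The Python snippet below is a minimal sketch of such a check. The helper find_layout_problems is hypothetical and not part of FoldConfBench. It assumes one FASTA file per protein named {protein_id}.fasta, and it only verifies that each one has a matching alignments/{protein_id}/{protein_id}.a3m file.

```python
"""Hypothetical sanity check for the INPUT_DIR layout described in step 2."""
from pathlib import Path


def find_layout_problems(input_dir: str) -> list[str]:
    """Return human-readable problems; an empty list means the layout looks fine."""
    root = Path(input_dir)
    fasta_dir = root / "fasta"
    aln_dir = root / "alignments"

    # Both top-level directories must exist before per-protein checks make sense.
    problems = [f"missing directory: {d}" for d in (fasta_dir, aln_dir) if not d.is_dir()]
    if problems:
        return problems

    # Assumption: the protein_id is the FASTA file name without its extension.
    for fasta in sorted(fasta_dir.glob("*.fasta")):
        protein_id = fasta.stem
        a3m = aln_dir / protein_id / f"{protein_id}.a3m"
        if not a3m.is_file():
            problems.append(f"missing alignment for {protein_id}: {a3m}")
    return problems
```

For example, find_layout_problems("YOUR_INPUT_DIR") returns an empty list when every FASTA file has its alignment.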

Configuration parameters

  • random_seed: Random seed used for sampling. Default 42.
  • n_samples: Number of samples to generate. Used only by the masking method.
  • msa_sampling_method: MSA sampling method.
    One of normal_run, dropout, masking, subsampling, clustering, cfold_clustering, or alanine_mutation.
    See MSA sampling methods for more details.
  • msa_mask_fraction: Fraction of MSA columns to mask. Used only by the masking method.
  • config_preset: Name of the AlphaFold2 model checkpoint to use. Default model_2_ptm.
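To make the dependencies between these parameters explicit, here is a minimal sketch that mirrors them as a Python dataclass. The parameter names and defaults come from the list above, and the method names come from the MSA sampling methods section. The ExperimentConfig class is illustrative, not the repository's actual configuration loader. Its validation rules are also assumptions: masking requires n_samples and msa_mask_fraction, and the fraction must lie in [0, 1].

```python
"""Hypothetical mirror of the configuration parameters (names/defaults from this README)."""
from dataclasses import dataclass
from typing import Optional

# Method names as listed under "MSA sampling methods".
MSA_SAMPLING_METHODS = frozenset({
    "normal_run", "dropout", "masking", "subsampling",
    "clustering", "cfold_clustering", "alanine_mutation",
})


@dataclass
class ExperimentConfig:
    msa_sampling_method: str
    random_seed: int = 42
    config_preset: str = "model_2_ptm"
    n_samples: Optional[int] = None            # used only by "masking"
    msa_mask_fraction: Optional[float] = None  # used only by "masking"

    def validate(self) -> None:
        """Raise ValueError on settings that cannot work (assumed rules)."""
        if self.msa_sampling_method not in MSA_SAMPLING_METHODS:
            raise ValueError(f"unknown msa_sampling_method: {self.msa_sampling_method!r}")
        if self.msa_sampling_method == "masking":
            if self.n_samples is None or self.n_samples < 1:
                raise ValueError("masking requires n_samples >= 1")
            frac = self.msa_mask_fraction
            if frac is None or not 0.0 <= frac <= 1.0:
                raise ValueError("masking requires msa_mask_fraction in [0, 1]")
```

Under these assumed rules, a typo such as "msa_masking" is rejected early instead of being silently ignored.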

MSA sampling methods

  • normal_run: Standard AlphaFold2 inference that generates structural variability through different random seeds and inherent stochasticity in the recycling process [1].

  • dropout: Enables the dropout layers of the Evoformer and structure module at inference time (rates of 10-25%). This perturbs the internal representations so that different samples can reach different conformations [3].

  • masking: Generates conformational diversity by randomly masking a fraction of MSA columns to disrupt co-evolutionary signals [4].

  • subsampling: Produces different conformations by varying the number of sequences and clusters in the MSA input [5].

  • clustering: Uses DBSCAN clustering on MSAs to identify sequence clusters that may correspond to different conformational states [6].

  • cfold_clustering: Implements CFold, a model trained specifically to predict multiple conformations from clustered sequence data [7].

  • alanine_mutation: Samples different conformations by performing systematic alanine mutations across the protein sequence [8].
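The masking and subsampling methods both act on the MSA before it reaches the model. The sketch below illustrates both ideas on an A3M alignment in plain Python. It is not the FoldConfBench or AFsample2 implementation, and it makes three assumptions: the mask symbol is X, the query row (row 0) is never masked or dropped, and rows are subsampled uniformly at random. In AlphaFold2 itself, MSA depth is usually reduced through the max_msa_clusters and max_extra_msa settings rather than by editing the alignment file [5].

```python
"""Sketches of two MSA-level perturbations: column masking and depth subsampling."""
import random


def read_a3m(text: str) -> list[str]:
    """Parse A3M text into aligned sequences, dropping lowercase insertion states."""
    seqs: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HHblits-style comment lines
        if line.startswith(">"):
            seqs.append("")  # start a new record
        else:
            # Lowercase letters are insertions relative to the query; removing them
            # leaves every row with one character per query column ('-' = gap).
            seqs[-1] += "".join(c for c in line if not c.islower())
    return seqs


def mask_columns(seqs: list[str], fraction: float, rng: random.Random,
                 mask_char: str = "X") -> list[str]:
    """Mask the same random `fraction` of columns in every non-query row."""
    n_cols = len(seqs[0])
    masked_cols = set(rng.sample(range(n_cols), round(fraction * n_cols)))
    masked = [seqs[0]]  # assumption: the query row is kept intact
    for s in seqs[1:]:
        masked.append("".join(mask_char if i in masked_cols else c for i, c in enumerate(s)))
    return masked


def subsample_rows(seqs: list[str], max_seqs: int, rng: random.Random) -> list[str]:
    """Keep the query plus at most `max_seqs - 1` randomly chosen other rows."""
    others = seqs[1:]
    k = min(max(max_seqs - 1, 0), len(others))
    kept = sorted(rng.sample(range(len(others)), k))  # preserve original row order
    return [seqs[0]] + [others[i] for i in kept]


def sample_masked_msas(seqs: list[str], fraction: float, n_samples: int,
                       seed: int = 42) -> list[list[str]]:
    """One independently masked MSA per sample, reproducible from `seed`."""
    rng = random.Random(seed)
    return [mask_columns(seqs, fraction, rng) for _ in range(n_samples)]
```

Because a single seeded random.Random drives all samples, the same seed reproduces the same set of masked MSAs.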
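The clustering method follows the idea of [6]: group the MSA sequences with DBSCAN and run the model on each cluster separately. The self-contained sketch below implements DBSCAN in pure Python. Several details are assumptions made for illustration: the distance is a normalized Hamming distance between aligned rows, the eps and min_samples values are arbitrary, and the query is prepended to every cluster. The original work may encode sequences and choose eps differently.

```python
"""Toy DBSCAN over aligned MSA rows (distance = fraction of mismatching columns)."""


def hamming(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length aligned rows differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def dbscan(seqs: list[str], eps: float, min_samples: int) -> list[int]:
    """Return one cluster label per row; -1 marks noise."""
    n = len(seqs)
    # Neighbourhoods include the point itself, as in the usual DBSCAN definition.
    neighbors = [[j for j in range(n) if hamming(seqs[i], seqs[j]) <= eps] for i in range(n)]
    labels = [None] * n  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1  # i is an unvisited core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # only core points expand the cluster
    return labels


def cluster_msas(seqs: list[str], eps: float = 0.3, min_samples: int = 3) -> list[list[str]]:
    """Cluster the non-query rows and return one MSA (query first) per cluster."""
    query, others = seqs[0], seqs[1:]
    members: dict[int, list[str]] = {}
    for seq, label in zip(others, dbscan(others, eps, min_samples)):
        if label != -1:  # noise rows are left out of every cluster MSA
            members.setdefault(label, []).append(seq)
    return [[query] + rows for _, rows in sorted(members.items())]
```

Each returned MSA would then be given to the model as a separate input.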
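One way to picture the alanine_mutation method is as a sliding window of alanine substitutions, giving one perturbed input per window [8]. The sketch below applies each substitution to the same columns of every row of an aligned MSA. The window and stride values are hypothetical, not FoldConfBench's settings, and leaving gap characters unchanged is an assumption.

```python
"""Sliding-window alanine substitution on an aligned MSA (illustrative sketch)."""


def alanine_windows(msa: list[str], window: int = 10,
                    stride: int = 5) -> list[tuple[int, int, list[str]]]:
    """Return (start, end, mutated_msa) for each window of columns [start, end)."""
    n_cols = len(msa[0])
    variants = []
    for start in range(0, n_cols, stride):
        end = min(start + window, n_cols)  # the last window may be shorter
        mutated = []
        for row in msa:
            chars = list(row)
            for i in range(start, end):
                if chars[i] != "-":  # assumption: gaps stay gaps
                    chars[i] = "A"   # in-silico alanine substitution
            mutated.append("".join(chars))
        variants.append((start, end, mutated))
    return variants
```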

References

[1] Jumper, John et al. "Highly accurate protein structure prediction with AlphaFold." Nature 596, no. 7873 (2021): 583-589.

[2] Ahdritz, Gustaf et al. "OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization." Nature Methods 21, no. 8 (2024): 1514-1524.

[3] Wallner, Björn. "AFsample: improving multimer prediction with AlphaFold using massive sampling." Bioinformatics 39, no. 9 (2023): btad573.

[4] Kalakoti, Yogesh, and Björn Wallner. "AFsample2: Predicting multiple conformations and ensembles with AlphaFold2." bioRxiv (2024): 2024-05.

[5] Del Alamo, Diego, Davide Sala, Hassane S. Mchaourab, and Jens Meiler. "Sampling alternative conformational states of transporters and receptors with AlphaFold2." Elife 11 (2022): e75751.

[6] Wayment-Steele, Hannah K., Adedolapo Ojoawo, Renee Otten, Julia M. Apitz, Warintra Pitsawong, Marc Hömberger, Sergey Ovchinnikov, Lucy Colwell, and Dorothee Kern. "Predicting multiple conformations via sequence clustering and AlphaFold2." Nature 625, no. 7996 (2024): 832-839.

[7] Bryant, Patrick, and Frank Noé. "Structure prediction of alternative protein conformations." Nature Communications 15, no. 1 (2024): 7328.

[8] Stein, Richard A., and Hassane S. Mchaourab. "SPEACH_AF: Sampling protein ensembles and conformational heterogeneity with Alphafold2." PLOS Computational Biology 18, no. 8 (2022): e1010483.

License

FoldConfBench is licensed under the Apache License, Version 2.0. The full license text can be found in the LICENSE file.

Copyright 2025 InstaDeep Ltd.

Citation

If you use our work in your research, please cite our paper:

@article{bhasin2025exploring,
  title={Exploring the Conformational Landscape of Adenylate Kinase and Beyond: A Benchmark of Protein Folding Models},
  author={Bhasin, Aryan and Delaunay, Antoine and Saccon, Francesco and Fu, Yunguan},
  year={2025}
}
