
A dataset for training and benchmarking deep learning models for RNA structure prediction


marcellszi/rna3db


RNA3DB

A dataset of non-redundant RNA structures from the PDB. RNA3DB contains:

  • All RNA chains in the PDB, labelled with non-coding RNA families
  • Non-redundant clustering of the above chains, suitable for training and benchmarking deep learning models
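Because the chains are clustered, a common way to avoid leakage between training and testing data is to split at the level of whole clusters (components) rather than individual chains. The following is a minimal Python sketch of that idea, not RNA3DB's own splitting code. `split_components`, its greedy assignment strategy, and the example chain IDs are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def split_components(
    components: Dict[str, List[str]],
    test_fraction: float = 0.3,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Assign whole components to either the training or the testing set.

    Hypothetical helper: every chain in a component lands on the same side,
    so no cluster is shared between training and testing.
    """
    total = sum(len(chains) for chains in components.values())
    names = sorted(components)
    random.Random(seed).shuffle(names)  # deterministic shuffle for a given seed

    train: List[str] = []
    test: List[str] = []
    for name in names:
        chains = components[name]
        # Greedily fill the test set without exceeding test_fraction of all chains.
        if len(test) + len(chains) <= test_fraction * total:
            test.extend(chains)
        else:
            train.extend(chains)
    return train, test
```

With this greedy rule the test set never exceeds the requested fraction, but it can fall short of it when components are large.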

We provide periodically updated versions of RNA3DB in JSON format, along with several intermediate steps used to generate the files.
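The exact layout of each JSON file is described on the Wiki. As a hedged starting point, the sketch below assumes only that a file holds a JSON object at the top level, and reports how many entries sit under each top-level key. `summarise_json` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, Dict


def summarise_json(path: Path) -> Dict[str, int]:
    """Count the entries under each top-level key of a JSON file.

    No particular RNA3DB schema is assumed beyond a top-level object (dict).
    Nested objects and arrays report their length; scalar values count as 1.
    """
    with open(path) as handle:
        data: Any = json.load(handle)
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object at the top level of {path}")
    return {
        key: len(value) if isinstance(value, (dict, list)) else 1
        for key, value in data.items()
    }
```

For example, calling `summarise_json(Path("split.json"))` on a file extracted from `rna3db-jsons.tar.gz` (the name `split.json` is an assumption here) gives a quick overview before reading the Wiki's description of the schema.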

Additionally, as part of RNA3DB, we release the results of an Infernal homology search on all RNA chains found in the PDB. For a short demonstration of how RNA3DB can be used to parse these files, see tabular_demo.
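tabular_demo is the supported way to read these files. Purely as an independent illustration, the sketch below parses Infernal `cmscan --tblout` output, assuming the files use Infernal's default tabular format 1 (17 whitespace-separated columns followed by a free-text description). `CmscanHit` and `parse_tblout` are hypothetical names, not part of RNA3DB.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CmscanHit:
    family: str      # target name: the covariance model (e.g. an Rfam family)
    accession: str   # model accession
    query: str       # query name: the sequence searched (here, an RNA chain)
    seq_from: int
    seq_to: int
    strand: str
    score: float
    evalue: float
    included: bool   # True when the "inc" column is "!"


def parse_tblout(lines: Iterable[str]) -> List[CmscanHit]:
    """Parse cmscan --tblout lines, assuming tabular format 1."""
    hits: List[CmscanHit] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # 17 fixed fields, then a description that may contain spaces.
        fields = line.split(maxsplit=17)
        hits.append(
            CmscanHit(
                family=fields[0],
                accession=fields[1],
                query=fields[2],
                seq_from=int(fields[7]),
                seq_to=int(fields[8]),
                strand=fields[9],
                score=float(fields[14]),
                evalue=float(fields[15]),
                included=fields[16] == "!",
            )
        )
    return hits
```

If the released files were produced with a different Infernal output format, the column indices above would need to change.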

For more general help getting started, see RNA3DB's Wiki.

Download

The latest version (2025-10-01-incremental-release) of RNA3DB can be found under releases.

We provide the following files:

  • rna3db-cmscans.tar.gz [Download]
    • Results of two-step Infernal homology search on all RNA chains in the PDB
    • See tabular_demo for how to read the homology search results with RNA3DB
  • rna3db-jsons.tar.gz [Download]
    • All JSON files generated by RNA3DB
  • rna3db-mmcifs.tar.xz [Download]
    • Hierarchical folders of the training/testing sets containing single-chain PDBx/mmCIF files
    • Most convenient for getting started with training and testing using RNA3DB
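Once downloaded, the archives can be unpacked with any standard tool. The following Python sketch uses the standard-library `tarfile` module to extract an archive (gzip or xz) and count `.cif` files per top-level folder. It makes no assumption about the folder names inside the archive; `extract_and_count` is a hypothetical helper.

```python
import tarfile
from collections import Counter
from pathlib import Path


def extract_and_count(archive: Path, dest: Path) -> Counter:
    """Extract a release archive and count .cif files per top-level folder."""
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect gzip (.tar.gz) or xz (.tar.xz) compression.
    with tarfile.open(archive, "r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse unsafe members (absolute paths, links outside dest).
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    counts: Counter = Counter()
    for cif in dest.rglob("*.cif"):
        counts[cif.relative_to(dest).parts[0]] += 1
    return counts
```

For example, `extract_and_count(Path("rna3db-mmcifs.tar.xz"), Path("rna3db-mmcifs"))` would report how many single-chain files ended up in each of the training and testing folders.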

Customising RNA3DB or building the dataset from scratch

If you wish to add structures that were not parsed as part of the latest release (e.g. unpublished or unreleased structures that were not in the PDB at the time we downloaded and parsed the data), you will need to build the dataset from scratch. Note that this process can take several days on a compute cluster.

However, if you just wish to customise some options, such as setting different filtering properties, experimenting with E-value thresholds, or using a different training/testing split ratio, you can simply customise RNA3DB. This process usually takes less than one minute on consumer hardware.
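The supported customisation options are described on the Wiki. To illustrate what changing an E-value threshold does, here is a small, independent Python sketch: each chain is labelled with its lowest-E-value family among hits that pass the threshold, and chains with no passing hit stay unlabelled. The `(chain_id, family, e_value)` tuple layout and `label_chains` are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical hit layout: (chain_id, family, e_value).
Hit = Tuple[str, str, float]


def label_chains(hits: Iterable[Hit], max_evalue: float) -> Dict[str, str]:
    """Label each chain with its lowest-E-value family at or below max_evalue."""
    best: Dict[str, Tuple[float, str]] = {}
    for chain, family, evalue in hits:
        if evalue > max_evalue:
            continue  # hit is not significant at this threshold
        if chain not in best or evalue < best[chain][0]:
            best[chain] = (evalue, family)
    return {chain: family for chain, (_, family) in best.items()}


hits = [("1abc_A", "tRNA", 1e-20), ("1abc_A", "tmRNA", 1e-3), ("2xyz_A", "5S_rRNA", 0.5)]
print(label_chains(hits, 1e-2))  # {'1abc_A': 'tRNA'}
print(label_chains(hits, 1.0))   # {'1abc_A': 'tRNA', '2xyz_A': '5S_rRNA'}
```

Loosening the threshold labels more chains, at the cost of accepting weaker homology evidence.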

Customising RNA3DB

If you wish to customise an existing release of RNA3DB, please see the Wiki help page to get started.

Building the dataset from scratch

If you wish to build your own dataset from scratch, please see the Wiki help page to get started.
