A dataset of non-redundant RNA structures from the PDB. RNA3DB contains:
- All RNA chains in the PDB, labelled with non-coding RNA families
- Non-redundant clustering of the above chains, suitable for training and benchmarking deep learning models
We provide periodically updated versions of RNA3DB in JSON format, along with several intermediate steps used to generate the files.
Additionally, as part of RNA3DB, we release the results of the Infernal homology search on all RNA chains found in the PDB. For a short demonstration of how RNA3DB can be used to parse these files, see tabular_demo.
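If you want to look at the homology search results without going through RNA3DB, the sketch below reads one tabular results file and filters hits by E-value. It assumes the files use Infernal's default `--tblout` layout (`--fmt 1`: 17 whitespace-separated columns followed by a free-text description). `CmscanHit`, `parse_tblout` and `hits_below` are hypothetical names, not part of RNA3DB, and the sample rows are made-up placeholders; tabular_demo remains the supported way to read these files.

```python
import io
from dataclasses import dataclass
from typing import Iterable, Iterator, List, TextIO


@dataclass
class CmscanHit:
    family: str     # target name: the covariance model (e.g. an Rfam family)
    accession: str  # accession of the target model
    query: str      # query name: the RNA chain that was searched
    seq_from: int   # start of the hit on the query sequence
    seq_to: int     # end of the hit on the query sequence
    score: float    # bit score
    evalue: float   # E-value of the hit
    inc: str        # "!" if the hit passed Infernal's inclusion threshold


def parse_tblout(handle: TextIO) -> Iterator[CmscanHit]:
    """Yield hits from an Infernal --tblout file (assumed default --fmt 1)."""
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # 17 whitespace-separated columns, then a description that may contain spaces
        fields = line.split(maxsplit=17)
        yield CmscanHit(
            family=fields[0],
            accession=fields[1],
            query=fields[2],
            seq_from=int(fields[7]),
            seq_to=int(fields[8]),
            score=float(fields[14]),
            evalue=float(fields[15]),
            inc=fields[16],
        )


def hits_below(hits: Iterable[CmscanHit], max_evalue: float) -> List[CmscanHit]:
    """Keep only hits whose E-value is at or below the given threshold."""
    return [hit for hit in hits if hit.evalue <= max_evalue]


# Placeholder rows in the assumed column layout (not real RNA3DB output).
SAMPLE = """\
#target name accession query name accession mdl mdl from mdl to seq from seq to strand trunc pass gc bias score E-value inc description of target
5S_rRNA RF00001 1abc_A - cm 1 119 1 120 + no 1 0.52 0.0 95.3 1.1e-22 ! 5S ribosomal RNA
tRNA RF00005 2xyz_B - cm 1 71 1 76 + no 1 0.55 0.0 60.1 3.4e-14 ! tRNA
tRNA RF00005 3def_C - cm 1 71 4 40 + no 1 0.48 0.0 12.0 2.0 ? tRNA
"""

if __name__ == "__main__":
    hits = list(parse_tblout(io.StringIO(SAMPLE)))
    for hit in hits_below(hits, 1e-3):
        print(hit.query, hit.family, hit.evalue)
```

Changing the `max_evalue` argument is a quick way to see how many chains would gain or lose a family label under a stricter or looser threshold.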
For more general help getting started, see RNA3DB's Wiki.
The latest version (2025-10-01-incremental-release) of RNA3DB can be found under releases.
We provide the following files:
rna3db-cmscans.tar.gz
- Results of the two-step Infernal homology search on all RNA chains in the PDB
- See tabular_demo for how to read the homology search results with RNA3DB
rna3db-jsons.tar.gz
- All JSON files generated by RNA3DB
rna3db-mmcifs.tar.xz
- Hierarchical folders of the training/testing sets containing single-chain PDBx/mmCIF files
- Most convenient for getting started with training and testing using RNA3DB
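After extracting rna3db-mmcifs.tar.xz, a first step is usually to collect the chain files for each split. The sketch below groups every `.cif` file by its top-level folder. It assumes a layout of the form `<split>/<component>/<cluster>/<chain>.cif` (for example `train_set/...` and `test_set/...`); check the extracted archive and adjust if the folder names differ. `index_mmcifs` and `make_toy_tree` are hypothetical helpers, and the toy file names are placeholders.

```python
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def index_mmcifs(root: Path) -> Dict[str, List[Path]]:
    """Group single-chain mmCIF files by the split folder directly under root."""
    index: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*.cif")):
        rel = path.relative_to(root)
        if len(rel.parts) < 2:
            continue  # a file sitting directly in root belongs to no split
        index[rel.parts[0]].append(path)
    return dict(index)


def make_toy_tree(root: Path) -> None:
    """Create empty placeholder files that mimic the assumed folder layout."""
    for rel in (
        "train_set/component_1/cluster_a/1abc_A.cif",
        "train_set/component_1/cluster_a/2abc_B.cif",
        "test_set/component_2/cluster_b/3xyz_A.cif",
    ):
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_toy_tree(Path(tmp))
        for split, files in index_mmcifs(Path(tmp)).items():
            # the file stem is the chain identifier in this toy layout
            print(split, [f.stem for f in files])
```

Point `index_mmcifs` at the extracted folder instead of the toy tree to get the real file lists; the files themselves can then be read with any PDBx/mmCIF parser.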
If you wish to add structures that were not parsed as part of the latest release (e.g. unpublished or unreleased structures that were not in the PDB at the time we downloaded and parsed the data), you will need to build the dataset from scratch. Note that this process can take several days on a compute cluster.
However, if you only wish to customise some options, such as setting different filtering properties, experimenting with E-value thresholds, or using a different training/testing split ratio, you can simply customise an existing release of RNA3DB. This process usually takes less than one minute on consumer hardware.
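To give an intuition for what a different training/testing split ratio means when clusters must not leak between sets, the sketch below assigns whole components to one side only, visiting them from smallest to largest and adding each to the test set while the test set stays within the requested fraction of chains. This greedy rule is an illustrative assumption, not necessarily the rule RNA3DB uses; `split_components` and the component sizes are hypothetical. To actually change the ratio, follow the customisation guide on the Wiki.

```python
from typing import Dict, List, Tuple


def split_components(
    sizes: Dict[str, int], test_ratio: float
) -> Tuple[List[str], List[str]]:
    """Assign whole components to train or test without splitting any of them.

    sizes maps a component name to its number of chains. Components are
    visited from smallest to largest (ties broken by name) and go to the test
    set while it stays at or below test_ratio of all chains.
    """
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    budget = test_ratio * sum(sizes.values())
    train: List[str] = []
    test: List[str] = []
    test_size = 0
    for name, size in sorted(sizes.items(), key=lambda kv: (kv[1], kv[0])):
        if test_size + size <= budget:
            test.append(name)
            test_size += size
        else:
            train.append(name)
    return train, test


if __name__ == "__main__":
    toy_sizes = {"component_1": 60, "component_2": 20, "component_3": 15, "component_4": 5}
    train, test = split_components(toy_sizes, test_ratio=0.3)
    print("train:", train)  # the two larger components
    print("test:", test)    # the two smaller components, 20 of 100 chains
```

Keeping each component on one side of the split is what prevents related chains from appearing in both the training and testing sets, at the cost of the achieved ratio only approximating the requested one.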
If you wish to customise an existing release of RNA3DB, please see the Wiki help page to get started.
If you wish to build your own dataset from scratch, please see the Wiki help page to get started.