SCF Guess Datasets

This package provides several molecular datasets for use in machine learning (ML). Each dataset consists of molecules randomly selected from an original collection of xyz files, augmented with tensor quantities obtained from DFT calculations.

Datasets

For a description of each dataset, see quantum-machine.org. Note: the basis set file used for 6-31G(2df,p) was modified.

Qm9: 134k small organic molecules composed of C, H, O, N, and F. Molecules are neutral and closed-shell. The train/val/test sets are roughly stratified with respect to molecular size. Calculated at the B3LYP/6-31G(2df,p) level.
Qm9Isomeres: 6k constitutional isomers of C7H10O2 taken from Qm9. Calculated at the B3LYP/6-31G(2df,p) level.
Qm9IsomeresMd: Molecular dynamics trajectories at 500 K of 113 molecules randomly selected from Qm9Isomeres. Calculated at the B3LYP/6-31G(2df,p) level.
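
To illustrate what "roughly stratified with respect to molecular size" can mean, here is a minimal sketch of one possible scheme. It is not taken from this package: the function `stratified_split`, its parameters, and the bin size are all hypothetical, and the atom counts are made up. Samples are sorted by size, cut into consecutive bins, and every bin is split with the same fractions, so each subset covers the whole size range.

```python
import numpy as np

def stratified_split(sizes, val=0.1, test=0.1, bin_size=10, seed=0):
    """Split sample indices so every subset spans the full range of sizes.

    Hypothetical helper, not part of scf_guess_datasets.
    """
    rng = np.random.default_rng(seed)
    order = np.argsort(sizes, kind="stable")  # indices sorted by molecular size
    train_part, val_part, test_part = [], [], []
    for start in range(0, len(order), bin_size):
        # Shuffle within one bin of similarly sized molecules
        block = rng.permutation(order[start:start + bin_size])
        n_val = round(val * len(block))
        n_test = round(test * len(block))
        val_part.extend(block[:n_val])
        test_part.extend(block[n_val:n_val + n_test])
        train_part.extend(block[n_val + n_test:])
    return (np.array(train_part, dtype=int),
            np.array(val_part, dtype=int),
            np.array(test_part, dtype=int))

# Made-up atom counts for 200 molecules
sizes = np.random.default_rng(1).integers(3, 30, size=200)
train_idx, val_idx, test_idx = stratified_split(sizes)
print(len(train_idx), len(val_idx), len(test_idx))  # prints: 160 20 20
```

Because every bin contributes to all three subsets, the size distributions of the subsets stay close to each other even for small datasets.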

Installation

Clone this repository into a directory named scf_guess_datasets
Run cd scf_guess_datasets && pip install -e .

How to use

from scf_guess_datasets import Qm9Isomeres

dataset = Qm9Isomeres(
    "/home/bob/datasets", # data stored in /home/bob/datasets/qm9_isomeres
    size=10, # number of molecules (optional, just for testing)
    val=0.1, # fraction of validation samples (optional, just for testing)
    test=0.1 # fraction of test samples (optional, just for testing)
)

dataset.build() # just once, omit if /home/bob/datasets/qm9_isomeres exists

for key in dataset.train_keys:  # same for val_keys or test_keys
    sample = dataset.solution(key)  # DFT result for that molecule

    print(sample.overlap)  # NDArray from PySCF
    print(sample.hcore)  # NDArray from PySCF
    print(sample.density)  # NDArray from PySCF
    print(sample.fock)  # NDArray from PySCF

    print(sample.status)  # Status(converged=True, iterations=11)

    for scheme, sample in dataset.guesses(key).items():
        # sample has same structure as returned by dataset.solution
        # matrices correspond to the initial guess
        # status describes calculation starting from guess

        print(scheme, sample.status)

# Let's score some custom-made density matrix for a given molecule

from scf_guess_datasets import solve
import numpy as np

solver = dataset.solver(3)  # obtain a new solver for molecule 3
guess = np.ones_like(solver.get_ovlp())
overlap, hcore, density, fock, status = solve(solver, guess)
print(density)  # the converged density
print(status)  # Status(converged=True, iterations=19)
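
Before scoring a custom-made density matrix, it can help to check that it is physically plausible. The sketch below assumes the usual restricted closed-shell convention (each occupied orbital holds two electrons), in which Tr(DS) equals the number of electrons and DSD = 2D. The helpers `electron_count` and `idempotency_error` are hypothetical and not part of this package, and the matrices are synthetic stand-ins for `sample.overlap` and `sample.density`.

```python
import numpy as np

def electron_count(density, overlap):
    """Number of electrons described by a density matrix: Tr(D S)."""
    return float(np.trace(density @ overlap))

def idempotency_error(density, overlap):
    """Largest absolute deviation from D S D = 2 D (closed-shell convention)."""
    return float(np.abs(density @ overlap @ density - 2.0 * density).max())

# Synthetic data standing in for sample.overlap and sample.density
rng = np.random.default_rng(0)
n_basis, n_occ = 6, 2
a = rng.normal(size=(n_basis, n_basis))
overlap = a @ a.T + n_basis * np.eye(n_basis)  # symmetric positive definite

# Orbitals orthonormal with respect to the overlap: C = S^(-1/2) Q
evals, evecs = np.linalg.eigh(overlap)
s_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
q, _ = np.linalg.qr(rng.normal(size=(n_basis, n_basis)))
coeffs = s_inv_sqrt @ q

# Closed-shell density built from the first n_occ orbitals
density = 2.0 * coeffs[:, :n_occ] @ coeffs[:, :n_occ].T

print(electron_count(density, overlap))     # close to 2 * n_occ
print(idempotency_error(density, overlap))  # close to zero
```

A guess such as `np.ones_like(...)` will generally fail both checks. That does not prevent it from being passed to a solver, but it is worth knowing when interpreting the resulting iteration count.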

Structure

Each dataset provided by this package implements the scf_guess_datasets.Dataset interface. Each implementation lives in its own subpackage, which contains an xyz directory and, optionally, a basis.gbs basis set file. To add a new dataset, create a new subpackage and adapt it to your needs, e.g. by specifying a custom basis or functional.
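
When preparing a new subpackage, a quick layout check along the following lines may be useful. This is only a sketch: `describe_dataset_package` is a hypothetical helper that is not part of this package, and it assumes the geometry files inside the xyz directory carry the `.xyz` extension.

```python
from pathlib import Path
import tempfile

def describe_dataset_package(package_dir):
    """Summarise the layout of a dataset subpackage (hypothetical helper)."""
    package_dir = Path(package_dir)
    xyz_dir = package_dir / "xyz"
    if not xyz_dir.is_dir():
        raise FileNotFoundError(f"missing xyz directory in {package_dir}")
    return {
        # Assumption: one molecule per *.xyz file
        "molecules": len(list(xyz_dir.glob("*.xyz"))),
        # basis.gbs is optional
        "custom_basis": (package_dir / "basis.gbs").is_file(),
    }

# Demonstration on a throwaway directory with a single water molecule
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "my_dataset"
    (pkg / "xyz").mkdir(parents=True)
    (pkg / "xyz" / "water.xyz").write_text(
        "3\nwater\nO 0.0 0.0 0.0\nH 0.0 0.0 1.0\nH 0.0 1.0 0.0\n"
    )
    summary = describe_dataset_package(pkg)

print(summary)  # {'molecules': 1, 'custom_basis': False}
```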
