SAMGPT

This is my final year project,works on inverse design of self-assembled-monolayers (Also other types of molecule if you want)

prop_finetuning
This file contains code for both structure-conditional and property-conditional molecular generation.
Compared with the original MolGPT, I introduced an adapter module and classifier-free guidance, which enable the model to be fine-tuned using a mixture of data (structures with or without property values (N/A)). This method allows the model to incorporate information from both the pre-trained structural features and the fine-tuned property values.

scaf_finetuning
This file contains codes for just strcture constrained generation.

salience_map This file contains codes for visualization of the generation. Highlighting next token with gradient colors to represent its probability to be generated.

Dataset I used.

Source	Dataset	Samples	Block Size (SMILES Len)	Maximum Scaffold Length
Zinc	MOSES	1.9 million	54 (train), 51 (validation)	48
ChEMBL	Guacamol	1.6 million	100 (train), 99 (validation)	100
10.1021/acs.jcim.6b00340	frontier orbitals	111,725 mols	148	115
PSC literatures	SAMs	200	202	123

Training steps

Before getting the pre-trained model weight, remember to check the vocabulary of you dataset. Vocabulary size mismatch might leads to failure!
I suggest first training the base model with MOSES and Guacamol with scaffold constrained. Each pre-trained has its different applications due to the distribution of SMILES length
Do scaffold finetuning (SAMs or frontier orbitals) or property finetuning (frontier orbitals).

Adapted model architecture

Script for finetuning.

nohup python ./prop_finetuning/cond_finetune.py 
  --run_name homo_lumo_Fine_tuning 
  --train_data_name full_prop_train_mix 
  --val_data_name full_prop_valid 
  --cond_props homo lumo 
  --pretrained_model_weight ./weights/scaffold_guacamol_all.pt 
  --output_ckpt homo_lumo_fine_tunning_mix.pt 
  --batch_size 200 
  --max_epochs 10 
  >> ./homo_lumo_Fine_tuning_mix.log 2>&1 &

Script for conditional generation

nohup python ./prop_finetuning/scaf_prop_generate.py 
  --model_weight ./homo_lumo_fine_tunning_mix.pt 
  --scaffold 
  --csv_name gen_homo_lumo_mix.csv 
  --cond_props homo lumo 
  --gen_size 1000 
  --vocab_size 143 
  --block_size 202 
  --batch_size 200 
  >> ./prop_homo_lumo_generated_fine_tune.log 2>&1 &

Name		Name	Last commit message	Last commit date
Latest commit History 59 Commits
SAM candidates		SAM candidates
figures		figures
prop_finetuning		prop_finetuning
salience_map		salience_map
scaf_finetuning		scaf_finetuning
.DS_Store		.DS_Store
LICENSE		LICENSE
README.md		README.md
updated_vocabulary_stoi.json		updated_vocabulary_stoi.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SAMGPT

Dataset I used.

Training steps

Adapted model architecture

Script for finetuning.

Script for conditional generation

Model weight for reproducing

Property conditional generation results

Single property (LogP)

Multi-properties (LogP, TPSA)

Generated structures (-5.3 eV HOMO, -1.02 eV LUMO)

About

Uh oh!

Releases 1

Packages

Languages

License

Harrison-Li/SAMGPT

Folders and files

Latest commit

History

Repository files navigation

SAMGPT

Dataset I used.

Training steps

Adapted model architecture

Script for finetuning.

Script for conditional generation

Model weight for reproducing

Property conditional generation results

Single property (LogP)

Multi-properties (LogP, TPSA)

Generated structures (-5.3 eV HOMO, -1.02 eV LUMO)

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages