This is my final-year project on the inverse design of self-assembled monolayers (SAMs). It can also be applied to other types of molecules.
prop_finetuning
This file contains code for both structure-conditional and property-conditional molecular generation.
Compared with the original MolGPT, I introduced an adapter module and classifier-free guidance, which allow the model to be fine-tuned on mixed data: structures with property values and structures without them (N/A). This lets the model combine the structural features learned during pre-training with the property values seen during fine-tuning.
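As a rough illustration of how mixed data and classifier-free guidance fit together, here is a minimal sketch in plain Python. It is not the code in this repository: `NULL_COND`, `drop_condition`, `guided_logits`, `p_uncond` and `guidance_scale` are hypothetical names, the guidance rule is the commonly used formulation `uncond + w * (cond - uncond)`, and the adapter module is not shown.

```python
import math
import random

# Hypothetical marker for "no property value available (N/A)".
NULL_COND = None


def drop_condition(props, p_uncond, rng):
    """Training side: pick the condition fed to the model for one sample.

    Samples without property values are always trained unconditionally.
    Labelled samples are also made unconditional with probability p_uncond,
    so the same model learns both conditional and unconditional predictions.
    """
    if props is None or rng.random() < p_uncond:
        return NULL_COND
    return props


def guided_logits(cond_logits, uncond_logits, guidance_scale):
    """Sampling side: blend the two predictions for the next token.

    guidance_scale = 0 gives the unconditional logits, 1 gives the
    conditional logits, and values above 1 push further toward the condition.
    """
    return [u + guidance_scale * (c - u)
            for c, u in zip(cond_logits, uncond_logits)]


def softmax(logits):
    """Convert logits to next-token probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    print(drop_condition({"homo": -5.1, "lumo": -2.3}, p_uncond=0.1, rng=rng))
    print(drop_condition(None, p_uncond=0.1, rng=rng))
    logits = guided_logits([2.0, 1.0, 0.0], [1.0, 1.0, 1.0], guidance_scale=2.0)
    print(logits, softmax(logits))
```

In this sketch the model is run twice per generation step, once with the property condition and once with the null condition, and the two logit vectors are blended before sampling.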
scaf_finetuning
This file contains code for structure-constrained generation only.
salience_map
This file contains code for visualizing generation. Each candidate next token is highlighted with a gradient color that represents its probability of being generated.
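The idea of coloring tokens by probability can be sketched as follows. This is only an illustration under assumed names: `prob_to_hex` and `highlight_candidates` are hypothetical helpers, and the white-to-crimson color ramp is an arbitrary choice, not necessarily the one used in salience_map.

```python
def prob_to_hex(p, low=(255, 255, 255), high=(220, 20, 60)):
    """Linearly interpolate between two RGB colors for a probability p.

    p is clamped to [0, 1]; p = 0 maps to `low` and p = 1 maps to `high`.
    """
    p = min(1.0, max(0.0, p))
    rgb = [round(lo + p * (hi - lo)) for lo, hi in zip(low, high)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def highlight_candidates(token_probs):
    """Return (token, probability, color) tuples, most probable first."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [(tok, p, prob_to_hex(p)) for tok, p in ranked]


if __name__ == "__main__":
    # Made-up next-token probabilities, for illustration only.
    for tok, p, color in highlight_candidates({"C": 0.7, "N": 0.2, "O": 0.1}):
        print(tok, p, color)
```

The resulting hex colors could then be passed to any plotting or HTML rendering step.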
| Source | Dataset | Samples | Block Size (max SMILES length) | Maximum Scaffold Length |
|---|---|---|---|---|
| ZINC | MOSES | 1.9 million | 54 (train), 51 (validation) | 48 |
| ChEMBL | GuacaMol | 1.6 million | 100 (train), 99 (validation) | 100 |
| 10.1021/acs.jcim.6b00340 | frontier orbitals | 111,725 | 148 | 115 |
| PSC literature | SAMs | 200 | 202 | 123 |
- Before loading pre-trained model weights, check the vocabulary of your dataset. A vocabulary size mismatch may cause failures!
- I suggest first training the base model on MOSES and GuacaMol with scaffold constraints. Each pre-trained model suits different applications because the datasets differ in SMILES length distribution.
- Then do scaffold fine-tuning (SAMs or frontier orbitals) or property fine-tuning (frontier orbitals).
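The vocabulary check mentioned in the first step could look roughly like the sketch below. It assumes a regex-based SMILES tokenizer of the kind commonly used with MolGPT-style models; the exact pattern, any special tokens (padding, start or end markers), and the helper names `tokenize`, `dataset_stats` and `check_compatible` are assumptions, so compare against the tokenizer actually used for pre-training.

```python
import re

# Assumed tokenizer pattern: bracket atoms, two-letter halogens, then single symbols.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+"
    r"|\\|/|:|~|@|\?|>|<|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles):
    """Split a SMILES string into tokens; fail loudly on unrecognized characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def dataset_stats(smiles_list):
    """Return the sorted vocabulary and the longest token sequence in a dataset."""
    vocab, max_len = set(), 0
    for smiles in smiles_list:
        tokens = tokenize(smiles)
        vocab.update(tokens)
        max_len = max(max_len, len(tokens))
    return sorted(vocab), max_len


def check_compatible(smiles_list, pretrained_vocab, block_size):
    """List tokens missing from the pre-trained vocabulary and test the length limit."""
    vocab, max_len = dataset_stats(smiles_list)
    unknown = [tok for tok in vocab if tok not in pretrained_vocab]
    return unknown, max_len <= block_size


if __name__ == "__main__":
    # Toy data for illustration; replace with the fine-tuning SMILES and real vocabulary.
    unknown, fits = check_compatible(["CCO", "c1ccccc1Br"], {"C", "O", "c", "1"}, block_size=54)
    print("tokens missing from pre-trained vocabulary:", unknown)
    print("fits within block size:", fits)
```

If `unknown` is non-empty or the length check fails, the fine-tuning data does not match the pre-trained weights and the vocabulary or block size needs to be reconciled first.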
 
nohup python ./prop_finetuning/cond_finetune.py \
  --run_name homo_lumo_Fine_tuning \
  --train_data_name full_prop_train_mix \
  --val_data_name full_prop_valid \
  --cond_props homo lumo \
  --pretrained_model_weight ./weights/scaffold_guacamol_all.pt \
  --output_ckpt homo_lumo_fine_tunning_mix.pt \
  --batch_size 200 \
  --max_epochs 10 \
  >> ./homo_lumo_Fine_tuning_mix.log 2>&1 &

nohup python ./prop_finetuning/scaf_prop_generate.py \
  --model_weight ./homo_lumo_fine_tunning_mix.pt \
  --scaffold \
  --csv_name gen_homo_lumo_mix.csv \
  --cond_props homo lumo \
  --gen_size 1000 \
  --vocab_size 143 \
  --block_size 202 \
  --batch_size 200 \
  >> ./prop_homo_lumo_generated_fine_tune.log 2>&1 &



