Gen ume #199
src/lobster/model/gen_ume/_gen_ume_sequence_structure_encoder.py
Outdated
src/lobster/model/gen_ume/_gen_ume_sequence_structure_encoder_lightning_module.py
Outdated
src/lobster/model/gen_ume/_gen_ume_sequence_structure_encoder_lightning_module.py
conditional generation to account for this
# Directory: input_structures: "/path/to/pdb/directory/"
# Glob pattern: input_structures: "/path/to/structures/*.pdb"
# List of files: input_structures: ["/path/to/file1.pdb", "/path/to/file2.pdb"]
input_structures: "/homefs/home/lisanzas/scratch/Develop/lobster/test_data/*.pdb"
could this be a path relative to the lobster repo, e.g. ./test_data?
it's a nit though
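A minimal sketch of how the three documented forms of `input_structures` (directory, glob pattern, list of files) could be resolved against a base directory, so that a relative value such as `./test_data` works from the repo root. The helper name `resolve_input_structures` and the `base_dir` parameter are hypothetical and not taken from the PR.

```python
import glob
from pathlib import Path
from typing import List, Sequence, Union


def resolve_input_structures(
    spec: Union[str, Sequence[str]],
    base_dir: Union[str, Path] = ".",
) -> List[Path]:
    """Expand a directory, glob pattern, or list of files into PDB paths.

    Hypothetical helper: relative entries are resolved against base_dir,
    absolute entries are left unchanged.
    """
    base = Path(base_dir)

    # List of files: resolve each entry individually.
    if not isinstance(spec, str):
        return [base / entry for entry in spec]

    path = base / spec

    # Directory: collect every .pdb file directly inside it.
    if path.is_dir():
        return sorted(path.glob("*.pdb"))

    # Otherwise treat the string as a glob pattern.
    return sorted(Path(match) for match in glob.glob(str(path)))
```

With this shape, the config value could stay relative (for example `"./test_data/*.pdb"`) and the caller would decide which directory it is relative to.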
# Get validation dataset directly (avoid trainer dependency in val_dataloader)
self.val_dataset = self.eval_datamodule._val_dataset

# Create our own dataloader to avoid trainer state issues
what are the state issues?
this came up when trying to use hydra to specify the datasets: I wanted something local, but hydra wanted to reference the global one used for training
writepdb(filename, x_recon_xyz[i], seq[i])
logger.info(f"Saved {filename}")

# Only perform ESMFold folding for the first validation batch for speed
why is that?
or rather - do we only check the first batch in general?
we fold with esm for the first 10, but then the rest of the metrics are just for sequence identity calculations
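A rough sketch of the pattern described in this thread: the cheap metric (sequence identity) runs on every batch, while the expensive folding step is limited to the first batches. Everything here is illustrative. `fold_fn` stands in for the ESMFold call, `max_fold_batches` is a hypothetical parameter, and plain strings stand in for the real batch contents.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


def sequence_identity(generated: str, reference: str) -> float:
    """Fraction of positions with matching residues (equal lengths assumed)."""
    if len(generated) != len(reference):
        raise ValueError("sequences must have the same length")
    if not generated:
        return 0.0
    matches = sum(g == r for g, r in zip(generated, reference))
    return matches / len(generated)


def evaluate_batches(
    batches: Iterable[Sequence[Tuple[str, str]]],
    fold_fn: Callable[[str], object],
    max_fold_batches: int = 1,
) -> Dict[str, object]:
    """Compute identity on all batches; fold only the first max_fold_batches."""
    identities: List[float] = []
    folded: List[object] = []

    for batch_idx, batch in enumerate(batches):
        # Cheap metric: computed for every (generated, reference) pair.
        for generated, reference in batch:
            identities.append(sequence_identity(generated, reference))

        # Expensive step: skipped once the folding budget is used up.
        if batch_idx < max_fold_batches:
            folded.extend(fold_fn(generated) for generated, _ in batch)

    mean_identity = sum(identities) / len(identities) if identities else 0.0
    return {"mean_identity": mean_identity, "folded": folded}
```

Making the limit an explicit argument would also answer the question above directly in the code, instead of relying on a comment.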
if "vit_decoder" == decoder_name:
    x_recon_xyz = decoded_x[decoder_name]
    if generate_sample["sequence_logits"].shape[-1] == 33:
        seq = convert_lobster_aa_tokenization_to_standard_aa(generate_sample["sequence_logits"], device=device)
this
    if generate_sample["sequence_logits"].shape[-1] == 33:
        seq = convert_lobster_aa_tokenization_to_standard_aa(generate_sample["sequence_logits"], device=device)
    else:
        seq = generate_sample["sequence_logits"].argmax(dim=-1)
        seq[seq > 21] = 20
seems a bit obscure to me
this accounts for the tokenizer lobster uses, which has 33 tokens, but I didn't know how else to check for that
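One possible way to make the check less obscure is to name the magic numbers and keep the branching in a single helper. The sketch below uses nested Python lists in place of tensors so it runs without torch. The constant names and `decode_tokens` are hypothetical, and the meaning of indices 20 and 21 is not stated in this thread, so the comments only describe what the snippet does.

```python
from typing import Callable, List, Sequence

LOBSTER_VOCAB_SIZE = 33  # per the thread: the lobster tokenizer has 33 tokens
MAX_STANDARD_TOKEN = 21  # largest index the original snippet keeps unchanged
FALLBACK_TOKEN = 20      # index the original snippet substitutes for larger ones


def argmax(row: Sequence[float]) -> int:
    """Index of the largest value in a row (first one wins on ties)."""
    return max(range(len(row)), key=row.__getitem__)


def decode_tokens(
    logits: Sequence[Sequence[float]],
    lobster_converter: Callable[[Sequence[Sequence[float]]], List],
) -> List:
    """Turn per-position logits of shape [length][vocab] into token indices.

    lobster_converter is a stand-in for
    convert_lobster_aa_tokenization_to_standard_aa.
    """
    if not logits:
        return []

    vocab_size = len(logits[0])
    if vocab_size == LOBSTER_VOCAB_SIZE:
        # Logits came from the lobster tokenizer: delegate the mapping.
        return lobster_converter(logits)

    # Otherwise take the argmax and clamp out-of-range indices.
    tokens = [argmax(row) for row in logits]
    return [FALLBACK_TOKEN if t > MAX_STANDARD_TOKEN else t for t in tokens]
```

An alternative would be to pass the tokenizer type explicitly instead of inferring it from the last dimension, which avoids depending on the vocabulary size at all.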
Description
Brief description of changes made
Type of Change