Conversation

@Sidney-Lisanza (Collaborator) commented Sep 26, 2025

Description

Brief description of changes made

Type of Change

  • [ ] Bug fix
  • [x] New feature
  • [ ] Documentation update
  • [ ] Performance improvement
  • [x] Code refactoring

# Directory: input_structures: "/path/to/pdb/directory/"
# Glob pattern: input_structures: "/path/to/structures/*.pdb"
# List of files: input_structures: ["/path/to/file1.pdb", "/path/to/file2.pdb"]
input_structures: "/homefs/home/lisanzas/scratch/Develop/lobster/test_data/*.pdb"
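The three accepted forms of `input_structures` (directory, glob pattern, or explicit list) could be resolved with a helper along these lines. This is an illustrative sketch, not code from the PR; `resolve_input_structures` is a hypothetical name:

```python
from glob import glob
from pathlib import Path

def resolve_input_structures(spec):
    """Resolve the three accepted input_structures forms into a list of PDB paths.

    Hypothetical helper: a list is taken as-is, a directory yields every
    *.pdb inside it, and any other string is treated as a glob pattern.
    """
    if isinstance(spec, list):
        return [str(p) for p in spec]
    p = Path(spec)
    if p.is_dir():
        return sorted(str(f) for f in p.glob("*.pdb"))
    return sorted(glob(spec))

print(resolve_input_structures(["/a/x.pdb", "/a/y.pdb"]))  # -> ['/a/x.pdb', '/a/y.pdb']
```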
Collaborator:
Could this be a path relative to lobster, e.g. ./test_data?

Collaborator:
It's a nit, though.

# Get validation dataset directly (avoid trainer dependency in val_dataloader)
self.val_dataset = self.eval_datamodule._val_dataset

# Create our own dataloader to avoid trainer state issues
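The "create our own dataloader" pattern described by these comments might look like the following sketch, assuming a standard torch Dataset. The TensorDataset and batch size are illustrative placeholders, not the PR's actual values:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for eval_datamodule._val_dataset (illustrative data only).
val_dataset = TensorDataset(torch.arange(10, dtype=torch.float32).unsqueeze(1))

# Constructing the DataLoader directly, rather than going through the
# trainer's val_dataloader() hook, means no trainer state is consulted.
val_loader = DataLoader(val_dataset, batch_size=4, shuffle=False)

print(sum(1 for _ in val_loader))  # -> 3 batches (4 + 4 + 2)
```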
Collaborator:
what are the state issues?

Collaborator (Author):

This came up when using Hydra to specify the datasets: I wanted something local, but Hydra wanted to reference the global dataset used for training.

writepdb(filename, x_recon_xyz[i], seq[i])
logger.info(f"Saved {filename}")

# Only perform ESMFold folding for the first validation batch for speed
Collaborator:

why is that?

Collaborator:

or rather - do we only check the first batch in general?

Collaborator (Author) commented Oct 21, 2025:

We fold with ESMFold for the first 10 samples; the rest of the metrics are just sequence-identity calculations.
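The gating described above (expensive folding only on the first batch, cheap metrics everywhere) might look like this sketch. Function and helper names are hypothetical, not from the PR:

```python
def compute_seq_identity(batch):
    # Placeholder for the cheap per-batch sequence-identity metric.
    return 1.0

def fold_with_esmfold(samples):
    # Placeholder for the expensive ESMFold call.
    return f"folded {len(samples)} samples"

def validation_step(batch, batch_idx, max_fold_samples=10):
    """Cheap metrics on every batch; fold only the first batch, for speed."""
    metrics = {"seq_identity": compute_seq_identity(batch)}
    if batch_idx == 0:
        metrics["fold"] = fold_with_esmfold(batch[:max_fold_samples])
    return metrics

print(validation_step(list(range(32)), batch_idx=0))  # folds first 10 only
print(validation_step(list(range(32)), batch_idx=1))  # sequence identity only
```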

if decoder_name == "vit_decoder":
x_recon_xyz = decoded_x[decoder_name]
if generate_sample["sequence_logits"].shape[-1] == 33:
seq = convert_lobster_aa_tokenization_to_standard_aa(generate_sample["sequence_logits"], device=device)
Collaborator:

this

if generate_sample["sequence_logits"].shape[-1] == 33:
    seq = convert_lobster_aa_tokenization_to_standard_aa(generate_sample["sequence_logits"], device=device)
else:
    seq = generate_sample["sequence_logits"].argmax(dim=-1)
    seq[seq > 21] = 20

seems a bit obscure to me

Collaborator (Author):

This accounts for the tokenizer lobster uses, which has 33 tokens; I didn't know how else to check for that.
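One way to make the check less obscure is to name the magic numbers. A sketch of that refactor, with NumPy standing in for torch and the lobster-specific conversion stubbed out; the constants and the helper name are illustrative, not from the lobster codebase:

```python
import numpy as np

LOBSTER_VOCAB_SIZE = 33      # lobster tokenizer vocabulary size (per the PR discussion)
MAX_STANDARD_AA_INDEX = 21   # indices above this get mapped to a catch-all token
UNKNOWN_AA_INDEX = 20

def logits_to_standard_seq(sequence_logits):
    """Collapse per-residue logits to standard amino-acid indices."""
    if sequence_logits.shape[-1] == LOBSTER_VOCAB_SIZE:
        # lobster's 33-token vocabulary: would dispatch to
        # convert_lobster_aa_tokenization_to_standard_aa here.
        raise NotImplementedError
    seq = sequence_logits.argmax(axis=-1)
    seq[seq > MAX_STANDARD_AA_INDEX] = UNKNOWN_AA_INDEX
    return seq

# Tiny demo: 3 residues, 25-way logits; residue 2 peaks at an out-of-range index.
logits = np.zeros((3, 25))
logits[0, 5] = 1.0
logits[1, 21] = 1.0
logits[2, 24] = 1.0
print(logits_to_standard_seq(logits))  # -> [ 5 21 20]
```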


3 participants