-
Notifications
You must be signed in to change notification settings - Fork 743
fix inclusion of PRO in secondary structure #5065
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- fix #4913 - ported fix from PyDSSP 0.9.1 by @ShintaroMinami to analysis.dssp.DSSP (see also ShintaroMinami/PyDSSP#2) - new kwarg ignore_proline_donor=True for DSSP (the new default changes the behavior and implements the fix, False recovers old behavior); the kwarg also exists in PyDSSP - updated docs - minimal regression tests - updated CHANGELOG
b38e9db to
005100f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a quick draft. I'd be more than happy if someone continued and completed it.
(EDIT: ... and I am also grateful for any comments that would help me to continue working on it.)
| self._donor_mask: Optional[np.ndarray] = ( | ||
| ag.residues.resnames != "PRO" if ignore_proline_donor else None | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may not be correct. The code runs ... but I am not sure if I should be masking corresponding atoms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if donor_mask is not None | ||
| else np.ones(n_atoms, dtype=float) | ||
| ) | ||
| donor_mask = np.tile(donor_mask[:, np.newaxis], (1, n_atoms)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the donor_mask (one element for each residue) really correct for this tiling????
| def test_file_guess_hydrogens(pdb_filename, client_DSSP): | ||
| # run 2.9.0 tests (which include PRO) | ||
| # ignore_proline_donor=False | ||
| # TODO: update reference data for ignore_proline_donor=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should really have correct reference data. About half of the files do not show a difference between ignore_proline_donor=True and ignore_proline_donor=False.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after some while, I figured what's the problem: the updated pydssp just gives different results now! Namely, take 3e8mA.pdb:
0.9.0: -----EEEE-----------EE-----EE-----HHHHHHHHHHH---EEEEE-----HHHHHHHHH----EEE-----HHHHHHHHHHHH---HHHEEEE---HHHHHHH----EEE------HHHH--------------HHHHHHHHH----HHHHHHH-- 3e8mA.pdb
0.9.1: -----EEEE-----------EE-----EE-----HHHHHHHHHHH----EEEE-----HHHHHHHHH----EEE-----HHHHHHHHHHHH---HHHEEEE---HHHHHHH----EEE------HHHH--------------HHHHHHHHH----HHHHHHH--
X
``
In this particular case it used to (incorrectly) assign `E` (sheet) char to P48, and now it correctly assigns it to `-` (loop). The author didn't catch it since their test checks only correlation between pydssp and original assignment, which is still high enough since 0.9.0 changes only PRO assignment (and not even always).
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #5065 +/- ##
===========================================
+ Coverage 92.68% 92.73% +0.05%
===========================================
Files 180 180
Lines 22452 22457 +5
Branches 3186 3186
===========================================
+ Hits 20809 20825 +16
- Misses 1169 1176 +7
+ Partials 474 456 -18 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes the treatment of proline residues in secondary structure assignment by implementing the DSSP algorithm more correctly. The fix ports changes from PyDSSP 0.9.1 that properly handles proline residues by not considering their HN atoms as hydrogen bond donors.
- Adds
ignore_proline_donorparameter to DSSP class with defaultTruefor correct behavior - Updates the hydrogen bond calculation to mask proline donors when requested
- Maintains backward compatibility through the new parameter
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| package/MDAnalysis/analysis/dssp/dssp.py | Adds ignore_proline_donor parameter and creates donor mask for proline residues |
| package/MDAnalysis/analysis/dssp/pydssp_numpy.py | Updates hydrogen bond functions to accept and use donor mask for filtering |
| testsuite/MDAnalysisTests/analysis/test_dssp.py | Adds regression tests and updates existing tests for the new proline handling |
| package/CHANGELOG | Documents the fix and new parameter |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
@marinegor if you can spare a moment to look at this PR then that would be great; I think you're really the person with best insight into the DSSP code. Feel free to tell me that all these changes are rubbish and we need to redo everything. |
|
@orbeckst I don't think it's all rubbish -- when this change was introduced to pydssp, I also made a similar attempt of introducing it to MDAnalysis. My main concern is how to combine it with a workaround I introduced when PRO fix wasn't yet available. I have a feeling that in principle we should just get rid of that (I mean roughly this part) and rely purely on pydssp. I'll have a closer look. |
- fixed variable names from `n_atoms` to `n_residues` - updated some docs, and most importantly - added updated `dssp` files.
marinegor
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick summary: pydssp in 0.9.1 fixed the proline bug and changed (in correct direction) assignment of prolines. For example, for P48 in 3e8mA.pdb assignment used to be E, and now it's correctly -. Also the dssp output file in the original repo annotates P48 as -.
I suggest in this PR we:
- change test files following up the pydssp update
- remove the
ignore_proline_donoroption in function signature (i.e. make it alwaysTrue, since the previous implementation was plain wrong)
@orbeckst while reviewing the PR, I made few commits on top of your branch -- namely, fixed variable names from n_atoms to n_residues, updated some docs, and most importantly, added updated dssp files. They're in here, feel free to pull them for yourself: //github.com/marinegor/mdanalysis/pull/new/update-dssp-proline-fix
| self._donor_mask: Optional[np.ndarray] = ( | ||
| ag.residues.resnames != "PRO" if ignore_proline_donor else None | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def test_file_guess_hydrogens(pdb_filename, client_DSSP): | ||
| # run 2.9.0 tests (which include PRO) | ||
| # ignore_proline_donor=False | ||
| # TODO: update reference data for ignore_proline_donor=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after some while, I figured what's the problem: the updated pydssp just gives different results now! Namely, take 3e8mA.pdb:
0.9.0: -----EEEE-----------EE-----EE-----HHHHHHHHHHH---EEEEE-----HHHHHHHHH----EEE-----HHHHHHHHHHHH---HHHEEEE---HHHHHHH----EEE------HHHH--------------HHHHHHHHH----HHHHHHH-- 3e8mA.pdb
0.9.1: -----EEEE-----------EE-----EE-----HHHHHHHHHHH----EEEE-----HHHHHHHHH----EEE-----HHHHHHHHHHHH---HHHEEEE---HHHHHHH----EEE------HHHH--------------HHHHHHHHH----HHHHHHH--
X
``
In this particular case it used to (incorrectly) assign `E` (sheet) char to P48, and now it correctly assigns it to `-` (loop). The author didn't catch it since their test checks only correlation between pydssp and original assignment, which is still high enough since 0.9.0 changes only PRO assignment (and not even always).|
@marinegor thank you for the detailed review. Could you please directly update this PR with your changes? I’d really appreciate that! I’m not sure if I get to working on the PR before the UGM this weekend. Please feel free to make any changes necessary! thanks again !! |
|
@orbeckst sure, I've pushed them on top of this branch. |
|
Thanks!!
… On Nov 5, 2025, at 04:31, Egor Marin ***@***.***> wrote:
marinegor
left a comment
(MDAnalysis/mdanalysis#5065)
<#5065 (comment)>
@orbeckst <https://github.com/orbeckst> sure, I've pushed them on top of this branch.
—
Reply to this email directly, view it on GitHub <#5065 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAB2DHGLGDX4FYVKHCRCBQL33HNX7AVCNFSM6AAAAAB7D57EXGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTIOJQGY4DIOJZGU>.
You are receiving this because you were mentioned.
--
Oliver Beckstein (he/his/him)
***@***.***
|
orbeckst
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pushing the PR across the finish line @marinegor !
I'll just push a minor doc fix and then we can squash-merge. Please feel free to do the honors. When I squash-merge I typically clean up the log and format it something like
title
- fix #xxx
- summary of changes (typically from PR summary) and/or CHANGELOG
- other important changes (from comments)
- tests updated
- docs updated
- CHANGELOG updated
---
lines identifying code contributors as provided by GH
|
@orbeckst thanks, did my best! Initiation PR, I guess🥳 |
|
Awesome, great work @marinegor ! |
Fixes #4913
Changes made in this Pull Request:
PR Checklist
package/CHANGELOGfile updated?package/AUTHORS? (If it is not, add it!)Developers Certificate of Origin
I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.
📚 Documentation preview 📚: https://mdanalysis--5065.org.readthedocs.build/en/5065/