Skip to content

Conversation

@orbeckst
Copy link
Member

@orbeckst orbeckst commented Jun 12, 2025

Fixes #4913

Changes made in this Pull Request:

  • ported fix from PyDSSP 0.9.1 by @ShintaroMinami to analysis.dssp.DSSP (see also Wrong assignment or prolines? ShintaroMinami/PyDSSP#2)
  • new kwarg ignore_proline_donor=True for DSSP (the new default changes the behavior and implements the fix, False recovers old behavior); the kwarg also exists in PyDSSP
  • updated docs
  • minimal regression tests
  • updated CHANGELOG

PR Checklist

  • Issue raised/referenced?
  • Tests updated/added?
  • Documentation updated/added?
  • package/CHANGELOG file updated?
  • Is your name in package/AUTHORS? (If it is not, add it!)

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


📚 Documentation preview 📚: https://mdanalysis--5065.org.readthedocs.build/en/5065/

- fix #4913
- ported fix from PyDSSP 0.9.1 by @ShintaroMinami to analysis.dssp.DSSP
  (see also ShintaroMinami/PyDSSP#2)
- new kwarg ignore_proline_donor=True for DSSP (the new default changes the behavior
  and implements the fix, False recovers old behavior); the kwarg also exists in
  PyDSSP
- updated docs
- minimal regression tests
- updated CHANGELOG
@orbeckst orbeckst force-pushed the update-dssp-proline-fix branch from b38e9db to 005100f Compare June 12, 2025 00:23
Copy link
Member Author

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a quick draft. I'd be more than happy if someone continued and completed it.

(EDIT: ... and I am also grateful for any comments that would help me to continue working on it.)

Comment on lines 332 to 334
self._donor_mask: Optional[np.ndarray] = (
ag.residues.resnames != "PRO" if ignore_proline_donor else None
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not be correct. The code runs ... but I am not sure if I should be masking corresponding atoms.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the question is about shape of the donor mask (i.e. make it for residues or for atoms), then it seems to be correct -- pydssp makes it of shape l (line) and l is the number of residues (link). However, the notation I made in pydssp_numpy.py is wrong, hence I renamed it in the commit below.

if donor_mask is not None
else np.ones(n_atoms, dtype=float)
)
donor_mask = np.tile(donor_mask[:, np.newaxis], (1, n_atoms))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the donor_mask (one element for each residue) really correct for this tiling????

def test_file_guess_hydrogens(pdb_filename, client_DSSP):
# run 2.9.0 tests (which include PRO)
# ignore_proline_donor=False
# TODO: update reference data for ignore_proline_donor=True
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should really have correct reference data. About half of the files do not show a difference between ignore_proline_donor=True and ignore_proline_donor=False.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after some while, I figured what's the problem: the updated pydssp just gives different results now! Namely, take 3e8mA.pdb:

0.9.0: -----EEEE-----------EE-----EE-----HHHHHHHHHHH---EEEEE-----HHHHHHHHH----EEE-----HHHHHHHHHHHH---HHHEEEE---HHHHHHH----EEE------HHHH--------------HHHHHHHHH----HHHHHHH-- 3e8mA.pdb
0.9.1: -----EEEE-----------EE-----EE-----HHHHHHHHHHH----EEEE-----HHHHHHHHH----EEE-----HHHHHHHHHHHH---HHHEEEE---HHHHHHH----EEE------HHHH--------------HHHHHHHHH----HHHHHHH--
                                                       X
``

In this particular case it used to (incorrectly) assign `E` (sheet) char to P48, and now it correctly assigns it to `-` (loop). The author didn't catch it since their test checks only correlation between pydssp and original assignment, which is still high enough since 0.9.0 changes only PRO assignment (and not even always).

@codecov
Copy link

codecov bot commented Jun 12, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.73%. Comparing base (5c7c480) to head (1679258).
⚠️ Report is 1 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #5065      +/-   ##
===========================================
+ Coverage    92.68%   92.73%   +0.05%     
===========================================
  Files          180      180              
  Lines        22452    22457       +5     
  Branches      3186     3186              
===========================================
+ Hits         20809    20825      +16     
- Misses        1169     1176       +7     
+ Partials       474      456      -18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@orbeckst orbeckst requested review from RMeli and Copilot September 4, 2025 00:49
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes the treatment of proline residues in secondary structure assignment by implementing the DSSP algorithm more correctly. The fix ports changes from PyDSSP 0.9.1 that properly handles proline residues by not considering their HN atoms as hydrogen bond donors.

  • Adds ignore_proline_donor parameter to DSSP class with default True for correct behavior
  • Updates the hydrogen bond calculation to mask proline donors when requested
  • Maintains backward compatibility through the new parameter

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
package/MDAnalysis/analysis/dssp/dssp.py Adds ignore_proline_donor parameter and creates donor mask for proline residues
package/MDAnalysis/analysis/dssp/pydssp_numpy.py Updates hydrogen bond functions to accept and use donor mask for filtering
testsuite/MDAnalysisTests/analysis/test_dssp.py Adds regression tests and updates existing tests for the new proline handling
package/CHANGELOG Documents the fix and new parameter

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@orbeckst orbeckst requested a review from marinegor October 30, 2025 23:02
@orbeckst
Copy link
Member Author

@marinegor if you can spare a moment to look at this PR then that would be great; I think you're really the person with best insight into the DSSP code. Feel free to tell me that all these changes are rubbish and we need to redo everything.

@marinegor
Copy link
Contributor

@orbeckst I don't think it's all rubbish -- when this change was introduced to pydssp, I also made a similar attempt of introducing it to MDAnalysis.

My main concern is how to combine it with a workaround I introduced when PRO fix wasn't yet available. I have a feeling that in principle we should just get rid of that (I mean roughly this part) and rely purely on pydssp. I'll have a closer look.

  - fixed variable names from `n_atoms` to `n_residues`
  - updated some docs, and most importantly
  - added updated `dssp` files.
Copy link
Contributor

@marinegor marinegor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick summary: pydssp in 0.9.1 fixed the proline bug and changed (in correct direction) assignment of prolines. For example, for P48 in 3e8mA.pdb assignment used to be E, and now it's correctly -. Also the dssp output file in the original repo annotates P48 as -.

I suggest in this PR we:

  • change test files following up the pydssp update
  • remove the ignore_proline_donor option in function signature (i.e. make it always True, since the previous implementation was plain wrong)

@orbeckst while reviewing the PR, I made few commits on top of your branch -- namely, fixed variable names from n_atoms to n_residues, updated some docs, and most importantly, added updated dssp files. They're in here, feel free to pull them for yourself: //github.com/marinegor/mdanalysis/pull/new/update-dssp-proline-fix

Comment on lines 332 to 334
self._donor_mask: Optional[np.ndarray] = (
ag.residues.resnames != "PRO" if ignore_proline_donor else None
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the question is about shape of the donor mask (i.e. make it for residues or for atoms), then it seems to be correct -- pydssp makes it of shape l (line) and l is the number of residues (link). However, the notation I made in pydssp_numpy.py is wrong, hence I renamed it in the commit below.

def test_file_guess_hydrogens(pdb_filename, client_DSSP):
# run 2.9.0 tests (which include PRO)
# ignore_proline_donor=False
# TODO: update reference data for ignore_proline_donor=True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after some while, I figured what's the problem: the updated pydssp just gives different results now! Namely, take 3e8mA.pdb:

0.9.0: -----EEEE-----------EE-----EE-----HHHHHHHHHHH---EEEEE-----HHHHHHHHH----EEE-----HHHHHHHHHHHH---HHHEEEE---HHHHHHH----EEE------HHHH--------------HHHHHHHHH----HHHHHHH-- 3e8mA.pdb
0.9.1: -----EEEE-----------EE-----EE-----HHHHHHHHHHH----EEEE-----HHHHHHHHH----EEE-----HHHHHHHHHHHH---HHHEEEE---HHHHHHH----EEE------HHHH--------------HHHHHHHHH----HHHHHHH--
                                                       X
``

In this particular case it used to (incorrectly) assign `E` (sheet) char to P48, and now it correctly assigns it to `-` (loop). The author didn't catch it since their test checks only correlation between pydssp and original assignment, which is still high enough since 0.9.0 changes only PRO assignment (and not even always).

@orbeckst
Copy link
Member Author

orbeckst commented Nov 5, 2025

@marinegor thank you for the detailed review. Could you please directly update this PR with your changes? I’d really appreciate that!

I’m not sure if I get to working on the PR before the UGM this weekend. Please feel free to make any changes necessary!

thanks again !!

@marinegor
Copy link
Contributor

@orbeckst sure, I've pushed them on top of this branch.

@orbeckst
Copy link
Member Author

orbeckst commented Nov 5, 2025 via email

Copy link
Member Author

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pushing the PR across the finish line @marinegor !

I'll just push a minor doc fix and then we can squash-merge. Please feel free to do the honors. When I squash-merge I typically clean up the log and format it something like

title

- fix #xxx
- summary of changes (typically from PR summary) and/or CHANGELOG
- other important changes (from comments)
- tests updated
- docs updated
- CHANGELOG updated

---
lines identifying code contributors as provided by GH

@marinegor marinegor merged commit 4a525ee into develop Nov 14, 2025
23 of 24 checks passed
@marinegor
Copy link
Contributor

@orbeckst thanks, did my best! Initiation PR, I guess🥳

@marinegor marinegor deleted the update-dssp-proline-fix branch November 14, 2025 10:41
@orbeckst
Copy link
Member Author

Awesome, great work @marinegor !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update secondary structure assignment in DSSP after an upstream fix

3 participants