Add NMR data model, integration with EnzymeML, and peak assignment features #13

torogi94 · 2025-01-20T15:13:43Z

- Add md-models-based research data model for NMR
  - Collect NMR parameters from instrument metadata
  - Record processing steps performed in NMRpy
  - Save both the raw data and the latest version of the processed data
  - Assign species to the peaks, either from a list or from an EnzymeML document
- Add EnzymeML integration using the pyenzyme library
  - Load an EnzymeML document into a FidArray
  - Make EnzymeML species available within Fid objects
  - Save calculated concentrations to the EnzymeML document and return it
- Add widgets to assign peaks either on the entire FidArray or on single Fid objects
  - Use EnzymeML species if a document is available
  - Pass a list of EnzymeML species if no document has been loaded yet
  - Use a list of simple strings instead of working with the EnzymeML standard
  - Use only a slice of Fids when assigning on the entire FidArray by passing an index list
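A minimal sketch of the assignment options listed above, assuming peaks are identified by their ppm positions and that species are given either as plain strings or as objects exposing an `id` attribute. `Species`, `species_id`, `assign_peaks` and `assign_on_array` are hypothetical names for illustration only, not the NMRpy or pyenzyme API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Union


@dataclass
class Species:
    """Hypothetical stand-in for an EnzymeML species; only `id` is used here."""

    id: str
    name: str = ""


# A species may be given as a plain string or as a species object.
SpeciesLike = Union[str, Species]


def species_id(species: SpeciesLike) -> str:
    """Use plain strings as-is; take the `id` of species objects."""
    return species if isinstance(species, str) else species.id


def assign_peaks(peaks: Sequence[float], species: Sequence[SpeciesLike]) -> Dict[float, str]:
    """Map each peak position (ppm) to a species identifier, pairwise in order."""
    if len(peaks) != len(species):
        raise ValueError("Exactly one species is needed per peak.")
    return {ppm: species_id(s) for ppm, s in zip(peaks, species)}


def assign_on_array(
    fid_peaks: List[Sequence[float]],
    species: Sequence[SpeciesLike],
    index_list: Optional[Sequence[int]] = None,
) -> Dict[int, Dict[float, str]]:
    """Assign species on every FID, or only on the FIDs listed in `index_list`."""
    indices = range(len(fid_peaks)) if index_list is None else index_list
    return {i: assign_peaks(fid_peaks[i], species) for i in indices}


# Usage: assign on FIDs 0 and 2 only, mixing a string and a species object.
result = assign_on_array(
    [[4.7, 1.2], [4.7, 1.2], [4.7, 1.2]],
    ["water", Species("s1")],
    index_list=[0, 2],
)
print(result)  # {0: {4.7: 'water', 1.2: 's1'}, 2: {4.7: 'water', 1.2: 's1'}}
```

Accepting both strings and species objects through one `SpeciesLike` type keeps the widget logic identical whether or not an EnzymeML document has been loaded.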

jmrohwer · 2025-01-20T19:56:48Z

@torogi94 Thanks for this!
I need some time to look through this. But from a brief inspection, the following points come to mind:

- Dependencies: utils.py imports sympy as well as pyenzyme, yet neither is declared in requirements.txt. We should discuss dependencies and which of these are needed for core functionality. One option is to have optional dependencies for certain functionalities. Pyenzyme itself has a multitude of dependencies. I am not keen for the core NMRPy library to have a huge list of dependencies, with the associated version-requirement dependency hell.
- Related: how stable are pyenzyme, md-models and the other dependencies from your group? E.g. Pyenzyme vs. Pyenzyme2? My concern is long-term maintenance.
- The merge conflict needs to be resolved.
- Looking at data_objects.py as well as plotting.py, there are a very large number of changes relating to linting and formatting (e.g. single vs. double quotes, line breaks etc.). Were these linting changes made in a separate commit? As it is, it's very tedious to separate the real code changes from "cosmetic" linting.

torogi94 · 2025-01-23T15:48:59Z

Thank you for looking over it! Here are a few comments:

> Dependencies: utils.py imports sympy as well as pyenzyme, yet neither is declared in requirements.txt. We should discuss dependencies and which of these are needed for core functionality. One option is to have optional dependencies for certain functionalities. Pyenzyme itself has a multitude of dependencies. I am not keen for the core NMRPy library to have a huge list of dependencies, with the associated version-requirement dependency hell.

Good catch! I really did forget to declare the new dependencies in the requirements.txt file. I think you are right that we should not necessarily introduce them to the core of NMRpy. How about we declare them as an optional extra, nmrpy[enzymeml]? For the NMRpy data model, however, it would be easier if we could make pydantic a core dependency. If I am not mistaken, it is the only one required for the data model to work. If not, I can restructure my code so that the data model part becomes entirely optional, too.
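The optional-extra idea can be sketched as an import guard: EnzymeML-specific packages are imported only if present, and the features that need them raise an informative error pointing at the `nmrpy[enzymeml]` extra. `optional_import` and `require` are hypothetical helper names, not existing NMRpy functions.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Import a module if it is installed; return None otherwise."""
    try:
        return importlib.import_module(module_name)
    except ImportError:  # also catches ModuleNotFoundError
        return None


def require(module: Optional[ModuleType], module_name: str, extra: str = "enzymeml") -> ModuleType:
    """Return `module`, or raise an ImportError naming the extra to install."""
    if module is None:
        raise ImportError(
            f"The optional dependency '{module_name}' is not installed. "
            f'Install it with: pip install "nmrpy[{extra}]"'
        )
    return module
```

At module level one would write, for example, `pyenzyme = optional_import("pyenzyme")`, and call `require(pyenzyme, "pyenzyme")` at the top of each EnzymeML-specific method, so that `import nmrpy` keeps working when the extra is not installed.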

> Related: how stable are pyenzyme, md-models and the other dependencies from your group? E.g. Pyenzyme vs. Pyenzyme2? My concern is long-term maintenance.

We are preparing a new stable pyenzyme (v2) release which we can then pin in the requirements. In my experience, it already runs better and will be more maintainable than the previous v1 version. With regard to md-models, the NMRpy data model does not depend on the md-models version at all. In fact, md-models is not even needed as a dependency, as all functionality is handled via pydantic, which is itself a very well-maintained library!

> The merge conflict needs to be resolved.

Absolutely. I’ll merge main into my branch and fix the conflict manually so you can have a clean merge without conflicts!

> Looking at data_objects.py as well as plotting.py, there are a very large number of changes relating to linting and formatting (e.g. single vs. double quotes, line breaks etc.). Were these linting changes made in a separate commit? As it is, it's very tedious to separate the real code changes from "cosmetic" linting.

Sorry for the noisy diff; it seems I accidentally let my linter loose on the two files and did not even notice... I'll revert the linting changes from this PR and submit them in a separate PR (as discussed).

jmrohwer · 2025-01-24T07:34:00Z

> Good catch! I really did forget to declare the new dependencies in the requirements.txt file. I think you are right that we should not necessarily introduce them to the core of NMRpy. How about we declare them as an optional extra, nmrpy[enzymeml]? For the NMRpy data model, however, it would be easier if we could make pydantic a core dependency. If I am not mistaken, it is the only one required for the data model to work. If not, I can restructure my code so that the data model part becomes entirely optional, too.

I agree with making pydantic a core dependency and the others as optional in nmrpy[enzymeml].

On another note, I also have a conda build recipe as well as a CI job that builds and distributes a conda package. This has been very easy to date because NMRPy is a pure Python package and there are/were conda packages available for all of the dependencies (channel conda-forge). Is PyEnzyme released as a conda package? I would like to continue distributing conda packages, just as we do for PySCeS as well.

> We are preparing a new stable pyenzyme (v2) release which we can then pin in the requirements. In my experience, it already runs better and will be more maintainable than the previous v1 version. With regard to md-models, the NMRpy data model does not depend on the md-models version at all. In fact, md-models is not even needed as a dependency, as all functionality is handled via pydantic, which is itself a very well-maintained library!

Okay great.

> Sorry for the noisy diff; it seems I accidentally let my linter loose on the two files and did not even notice... I'll revert the linting changes from this PR and submit them in a separate PR (as discussed).

Give me a ping once you are ready with all of this then I'll proceed with the detailed code review 😄

torogi94 · 2025-01-29T09:46:32Z

I think I have addressed all points mentioned so far, so you may continue with the detailed code review now!

> On another note, I also have a conda build recipe as well as a CI job that builds and distributes a conda package. This has been very easy to date because NMRPy is a pure Python package and there are/were conda packages available for all of the dependencies (channel conda-forge). Is PyEnzyme released as a conda package? I would like to continue distributing conda packages, just as we do for PySCeS as well.

I am in contact with @JR-1991 about the conda-forge release. While it was not planned initially, he says we can make it happen! I will get back to you about this when we are preparing the PyPI release of v2. For now, the GitHub version is still used in the new optional requirements for nmrpy[enzymeml]. Of course, I will change that to the stable PyPI version as soon as it is available!

torogi94 · 2025-05-05T15:21:49Z

I have addressed a few more issues we discussed in pull requests #14 and #15 and merged them into this pull request:

- With Add measurement handling (#14), handling of multiple EnzymeML measurements is now possible.
- With Rework data handling (#15), data handling within NMRpy has been optimized to keep memory requirements and computation times low.

torogi94 · 2025-08-27T14:01:12Z

@jmrohwer: In #18, I have added handling of t0 values in the MeasurementData of EnzymeML documents to the library. Initial values can now be set using FidArray.add_t0_to_enzymeml(), either interactively with the new T0Adder widget or script-style by setting gui=False. If no initial values have been measured, the user can use the existing t1 values as t0; otherwise, measured t0 values can be provided instead. Furthermore, an offset can be applied to the time array to account for delays in measurements.
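The two t0 options and the time offset can be illustrated with a small sketch. It assumes one time array shared by all species, one concentration series per species ID, and that "using t1 as t0" means taking each species' first recorded value as its initial value. `apply_t0_and_offset` is a hypothetical function and does not reflect the actual signature of `FidArray.add_t0_to_enzymeml()`.

```python
from typing import Dict, List, Optional, Tuple


def apply_t0_and_offset(
    time: List[float],
    data: Dict[str, List[float]],
    offset: float = 0.0,
    measured_t0: Optional[Dict[str, float]] = None,
) -> Tuple[List[float], Dict[str, float]]:
    """Shift the time array by `offset` and pick an initial value per species.

    A measured t0 value is used when one is supplied for a species;
    otherwise the first recorded value (t1) serves as the initial value.
    """
    measured_t0 = measured_t0 or {}
    # Account for the delay between mixing and the first acquisition.
    shifted_time = [t + offset for t in time]
    initials: Dict[str, float] = {}
    for species_id, values in data.items():
        if species_id in measured_t0:
            initials[species_id] = measured_t0[species_id]
        elif values:
            initials[species_id] = values[0]
        else:
            raise ValueError(f"No data and no t0 value for '{species_id}'.")
    return shifted_time, initials


# Usage: species "s1" has a measured t0, "s0" falls back to its t1 value.
time, initials = apply_t0_and_offset(
    [0.0, 10.0], {"s0": [5.0, 4.0], "s1": [0.0, 1.0]}, offset=2.5, measured_t0={"s1": 0.1}
)
print(time, initials)  # [2.5, 12.5] {'s0': 5.0, 's1': 0.1}
```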

- Move the setup of Peak objects from the Peak(Range)Assigner to the deconvolution methods to prevent uninitialised Peak objects in data model.
- Update create_enzymeml() method to reflect changes in pyenzyme library.
- Add species property to FidArray, similar to the deconvoluted_integrals property.

* Add create_new_enzymeml_measurement() method
  - FidArray now has a create_new_enzymeml_measurement() method that acts as the interface for creating new Measurement objects, either via GUI or via script.
  - FidArray enzymeml_document property now checks for existence of at least one Measurement object in adder method.
  - plotting.py now has a skeleton MeasurementCreator class that will later be the GUI for creating new Measurement objects.
  - utils.py now has a utility function that creates an EnzymeML Measurement object from the parameters passed to it.
* Add multiple Measurement support
  - Add MeasurementCreator widget as GUI for create_new_enzymeml_measurement() method
  - Update apply_to_enzymeml() method
  - Update create_enzymeml() method

* Update EnzymeML species handling
  - Add species type flags to get_species_from_enzymeml() util function to allow filtering for specific types of species.
  - Remove unnecessary display of proteins in PeakAssigner and PeakRangeAssigner using the new species type flags.
* Add flags for keeping data models upon saving
* Change data array handling of NMRpy data model
  - Add save_data_model() method to serialise the NMRpy data model.
  - Change handling of data arrays: They are now saved as numpy.ndarrays in each Fid object and only copied as lists into the data model upon serialisation.
* Update data_objects.py
  - Resolve Pydantic serialisation issues (complex → string conversion).
  - Optimise processing loops for faster execution and lower memory usage.
* Change keep_data_model flag to False
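The data-handling change above (arrays live on each Fid and are copied into a serialisable form only when the model is saved, with complex values stored as strings) can be sketched with the standard library alone. A plain list of `complex` stands in for the `numpy.ndarray`; with NumPy one would call `.tolist()` first. `FidSketch`, `to_model_dict` and `from_model_dict` are hypothetical names, not NMRpy or Pydantic API.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FidSketch:
    """Hypothetical stand-in for a Fid holding complex spectral data."""

    data: List[complex] = field(default_factory=list)

    def to_model_dict(self) -> Dict[str, List[str]]:
        # The copy into a serialisable structure only happens here, so the
        # working data is not duplicated while processing.
        return {"data": [str(value) for value in self.data]}

    @classmethod
    def from_model_dict(cls, payload: Dict[str, List[str]]) -> "FidSketch":
        # complex() parses the strings produced by str(), e.g. "(1+2j)".
        return cls(data=[complex(value) for value in payload["data"]])


fid = FidSketch([1 + 2j, -0.5 - 1j])
serialised = json.dumps(fid.to_model_dict())
print(serialised)  # {"data": ["(1+2j)", "(-0.5-1j)"]}
restored = FidSketch.from_model_dict(json.loads(serialised))
```

Storing complex numbers as their `str()` form keeps the JSON valid while still allowing an exact round trip, since Python's `complex()` accepts the parenthesised strings that `str()` produces.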

torogi94 added the enhancement label Jan 20, 2025

torogi94 requested a review from jmrohwer January 20, 2025 15:13

torogi94 assigned jmrohwer Jan 20, 2025

torogi94 force-pushed the data-model-refactor branch from b9db28b to 8182bfc on November 11, 2025 09:54

torogi94 and others added 20 commits November 13, 2025 12:49

Initial commit

c00b3d0

Minor changes

53ebb27

Implement IdentityAssigner for one FID object

85d3a4e

Update generate_api.yaml

97df96f

Update generate_api.yaml

29606d3

Update generate_api.yaml

1583813

Update generate_api.yaml

a87f2f6

Update generate_api.yaml

f164b24

Implement IdentityAssigner for entire FIDArray

188c365

Update data model

0bfbfe4

Minor bug fix

cb0def5

Add interface with EnzymeML

2feaa3d

Update IdentityAssigner

9f7fd0c

Update generate_api.yaml

aaea4f9

Update generate_api.yaml

c1bec90

Update generate_api.yaml

ce11a73

API update

030dcc5

Remove Citation info from data model

41c363e

API update

ecbaea2

Remove deprecated data model

6b3e30a

torogi94 and others added 29 commits November 13, 2025 12:50

Temporarily remove conc. calculation

58e94ba

- Remove features from FidArray.calculate_concentrations() and raise NotImplementedError when called
- Remove features from ConcentrationCalculator and raise NotImplementedError when initialised

Revert linting changes

037e60f

Update regular and add optional requirements

1909b9a

Add graceful handling of optional imports

921b9ec

Fix recurring typo in pyenzyme import error

7b5c86f

Remove deprecated data model linking

fd90839

Add None check for new FidArray properties

dfbb0aa

- Initialising the enzymeml_document and concentrations properties of the FidArray class with an initial None value led to an error. A check for None has been added to fix this issue.
- Add correct optional dependency name to setup.py

Update requirements.txt

7e1b192

Due to an issue with md-models, pydantic>=2.10.0 is currently causing errors. In the meantime, a version constraint has been added to requirements.txt.

Update data_objects.py

1b0349d

Fix type checking bug in enzymeml_document property of FidArray class.

fix: Fix Fid.baseline_correct() and FidArray.baseline_correct_fids()

f6659b4

update ruff.toml

d674e41

fix data_model.setter ; formatting fixes

e0a724c

Fix FidArray.save_to_file() when deleting data model

651684c

rework calibrate() widget not to make use of asyncio.Future()

38072de

revert @data_model.setter changes as data_model is now deleted during save, not set to None

c851b60

fix pyenzyme imports, update dependencies

e987ad3

Upgrade to pyenzyme v2

b28d5e6

Add t0 handling (#18)

3dc3d12

- Added T0Tab and T0Logic classes to utils.py
- Added T0Adder widget class to plotting.py
- Added add_t0_to_enzymeml() method that can be used interactively with a Jupyter widget using gui=True or script-like with gui=False

Fix copy() error in plotting.py

6cf0830

Add explicit super init to FidArray

4f48277

Update range handling in Fid (#20)

d796f8d

Changed behaviour of Fid._setup_peak_objects(). Multiple peaks per range are handled properly now by using Fid._grouped_peaklist instead of Fid.peaks. Also refactored the method to make it more robust and readable overall.
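The idea of grouping peaks per range can be sketched as follows: each peak is placed into the range that contains it, so a range holding several peaks yields one group with several entries instead of being treated as a single peak. `group_peaks_by_range` is a hypothetical function that only illustrates the concept behind `Fid._grouped_peaklist`; it is not the NMRpy implementation.

```python
from typing import List, Sequence, Tuple

# A ppm range; NMR ranges are often written high-to-low, so either order is accepted.
Range = Tuple[float, float]


def group_peaks_by_range(peaks: Sequence[float], ranges: Sequence[Range]) -> List[List[float]]:
    """Return one list of peak positions per range, in the order of `ranges`."""
    groups: List[List[float]] = []
    for start, end in ranges:
        low, high = min(start, end), max(start, end)
        groups.append([p for p in peaks if low <= p <= high])
    return groups


print(group_peaks_by_range([4.8, 4.6, 1.2], [(5.0, 4.5), (1.5, 1.0)]))  # [[4.8, 4.6], [1.2]]
```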

Add attribute checks for data model existence

80e4e12

Fix faulty argument to T0Adder widget

c3dca2d

Add data model test suites

7a3e67d

Add data model property unit tests + fixes

2309dc2

Fix parameter mapping to data model

9f55722

- Added NMR parameters to fid_object setter
- Fixed p0 and p1 phasing parameter assignment

torogi94 force-pushed the data-model-refactor branch from 8182bfc to 9f55722 on November 13, 2025 11:51

3 participants
