preprocess.py

Open-source set of Python scripts for preprocessing untargeted metabolomics data in the mzML file format.

Last modified: 2025-Mar-25
@author: Mateusz Fido (github.com/MateuszFido)
ETH Zürich

This script reads and parses mzML files using the Pyteomics library for Python. It calculates average mass spectra by linearly interpolating their intensities onto a resampled m/z axis with a given resolution. From these averages, it creates composite mass spectra from all measurements in a given polarity mode.
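The averaging step can be illustrated with a minimal sketch. It is not the repository's implementation: the function name `average_spectrum`, its parameters, and the choice to treat intensity outside a scan's m/z range as zero are assumptions made for illustration. It uses `numpy.interp`, which performs linear interpolation.

```python
import numpy as np


def average_spectrum(scans, mz_min, mz_max, resolution):
    """Average scans on a shared m/z grid using linear interpolation.

    scans: iterable of (mz_array, intensity_array) pairs, each sorted by m/z.
    resolution: spacing of the resampled m/z axis (hypothetical parameter).
    """
    # Evenly spaced m/z axis shared by every scan
    n_points = int(round((mz_max - mz_min) / resolution)) + 1
    mz_axis = np.linspace(mz_min, mz_max, n_points)

    total = np.zeros_like(mz_axis)
    n_scans = 0
    for mz, intensity in scans:
        # Assumption: intensity outside the measured range counts as zero
        total += np.interp(mz_axis, mz, intensity, left=0.0, right=0.0)
        n_scans += 1

    if n_scans == 0:
        raise ValueError("no scans to average")
    return mz_axis, total / n_scans


# Two toy scans with a single triangular peak at m/z 100.2
scans = [
    (np.array([100.0, 100.2, 100.4]), np.array([0.0, 10.0, 0.0])),
    (np.array([100.0, 100.2, 100.4]), np.array([0.0, 20.0, 0.0])),
]
mz_axis, avg = average_spectrum(scans, 100.0, 100.4, 0.1)
```

Resampling every scan onto the same axis is what makes the spectra directly addable, both for averaging within a file and for building a composite across files.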

It then performs peak picking on the composite spectra with the SciPy function find_peaks(), using peak height as the cut-off criterion, and stores the peak data so that it can be passed to other functions.
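As a rough sketch of height-based peak picking, the example below runs `scipy.signal.find_peaks` on a synthetic spectrum. The spectrum and the `min_height` threshold are made up for illustration; the actual cut-off used by the pipeline is not stated here.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic composite spectrum: one strong and one weak Gaussian peak
mz_axis = np.linspace(100.0, 101.0, 1001)
intensity = (
    1000.0 * np.exp(-0.5 * ((mz_axis - 100.3) / 0.01) ** 2)
    + 50.0 * np.exp(-0.5 * ((mz_axis - 100.7) / 0.01) ** 2)
)

min_height = 100.0  # hypothetical cut-off; peaks below it are discarded

# With `height` set, the returned properties include "peak_heights"
peak_idx, props = find_peaks(intensity, height=min_height)
peak_mz = mz_axis[peak_idx]
peak_heights = props["peak_heights"]
```

Only the peak near m/z 100.3 passes the threshold here; the weaker one at m/z 100.7 is dropped.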

Subsequently, it calculates time traces for all features by integrating the interpolated intensity within the m/z boundaries of each peak.
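One way to picture a time trace is sketched below: for every scan, the intensity is interpolated onto the common m/z axis and integrated between a peak's left and right m/z boundaries. The function `time_trace`, its arguments, and the use of trapezoidal integration are assumptions for illustration; the script may integrate differently.

```python
import numpy as np
from scipy.integrate import trapezoid


def time_trace(scans, mz_axis, mz_left, mz_right):
    """Return one integrated intensity per scan for a single peak window."""
    # Grid points that fall inside the peak's m/z boundaries
    window = (mz_axis >= mz_left) & (mz_axis <= mz_right)
    trace = []
    for mz, intensity in scans:
        # Linear interpolation of this scan onto the shared axis
        resampled = np.interp(mz_axis, mz, intensity, left=0.0, right=0.0)
        # Trapezoidal integration over the window (assumed method)
        trace.append(trapezoid(resampled[window], mz_axis[window]))
    return np.array(trace)


# Two consecutive toy scans; the peak grows from the first to the second
mz_axis = np.linspace(100.0, 101.0, 11)
raw_mz = np.array([100.0, 100.5, 101.0])
scans = [
    (raw_mz, np.array([0.0, 10.0, 0.0])),
    (raw_mz, np.array([0.0, 30.0, 0.0])),
]
trace = time_trace(scans, mz_axis, 100.0, 101.0)
```

Each element of `trace` corresponds to one scan, so plotting it against scan time gives the feature's intensity over the course of the measurement.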

Finally, it creates a Pandas DataFrame of the intensity matrix of n samples x m features, filters out features not present in at least X% of all samples (user-modifiable), and writes the DataFrame to a .csv file.
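The filtering step might look like the sketch below. It assumes that a feature counts as "present" in a sample when its intensity is greater than zero, and the names `filter_features` and `min_presence` are hypothetical; the script's own criterion and setting name may differ.

```python
import pandas as pd


def filter_features(matrix, min_presence=0.5):
    """Keep features (columns) that are non-zero in at least min_presence of samples."""
    # Fraction of samples (rows) in which each feature has a non-zero intensity
    presence = (matrix > 0).mean(axis=0)
    return matrix.loc[:, presence >= min_presence]


# Toy intensity matrix: 4 samples x 2 features
matrix = pd.DataFrame(
    {
        "mz_100.3000": [120.0, 95.0, 110.0, 0.0],
        "mz_100.7000": [0.0, 0.0, 40.0, 0.0],
    },
    index=["sample_1", "sample_2", "sample_3", "sample_4"],
)

filtered = filter_features(matrix, min_presence=0.5)

# to_csv() returns the CSV text when no path is given;
# pass a file path instead to write the matrix to disk
csv_text = filtered.to_csv()
```

In this toy matrix the first feature is present in 75% of samples and is kept, while the second is present in only 25% and is removed.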

Installation

Install a Python interpreter (3.11 or above)
Install the required dependencies (e.g., $ pip install -r requirements.txt):
- pandas
- numpy
- scipy
- tqdm
- pyteomics

The latest versions of these packages are recommended.

Usage

Run the pipeline by executing the main script on a path containing the .mzML files to be processed:

$ python3 preprocess.py path/to/mzml-files
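For orientation, the sketch below shows one way a script could collect the input files from the directory passed on the command line. The helper `find_mzml_files` is hypothetical and is not taken from the repository; it matches the extension case-insensitively and does not descend into subdirectories, both of which are assumptions.

```python
from pathlib import Path


def find_mzml_files(directory):
    """Return the mzML files directly inside `directory`, sorted by name."""
    return sorted(
        p
        for p in Path(directory).iterdir()
        # Accept both .mzML and .mzml spellings of the extension
        if p.is_file() and p.suffix.lower() == ".mzml"
    )


# Example: files = find_mzml_files("path/to/mzml-files")
```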

References:

MATLAB code by Jiayi Lan and Miguel de Figueiredo
Python code by Cedric Wüthrich
https://pyteomics.readthedocs.io/en/latest/index.html

