Full-scan MS data from both LC-MS and imaging MS capture multiple ion forms, including their in/post-source fragments. Here we leverage such fragments to structurally annotate full-scan data from LC-MS or imaging MS by matching against MS/MS spectral libraries.
ms1_id is a Python package that annotates full-scan MS data using tandem MS libraries, specifically:
- annotate pseudo MS/MS spectra: mgf files
- annotate LC-MS data: mzML or mzXML files
- annotate imaging MS data: imzML files
- build indexed MS/MS libraries from mgf or msp files (see Flash entropy for more details)
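As a rough illustration of how the input formats above map to the `ms1_id` subcommands described in this README, here is a minimal sketch. The mapping table and the `pick_workflow` helper are hypothetical and not part of the package. Note that mgf files can also serve as library sources for indexing; this sketch treats them as pseudo MS/MS spectra.

```python
from pathlib import Path

# Hypothetical lookup table: file extension -> ms1_id subcommand named in this README.
WORKFLOW_BY_EXTENSION = {
    ".mgf": "annotate",   # pseudo MS/MS spectra
    ".mzml": "lcms",      # LC-MS data
    ".mzxml": "lcms",     # LC-MS data
    ".imzml": "msi",      # imaging MS data
    ".msp": "index",      # MS/MS library to be indexed
}


def pick_workflow(path: str) -> str:
    """Return the subcommand that handles the given file, based on its extension."""
    ext = Path(path).suffix.lower()
    if ext not in WORKFLOW_BY_EXTENSION:
        raise ValueError(f"Unsupported input format: {path!r}")
    return WORKFLOW_BY_EXTENSION[ext]
```

The lookup is case-insensitive, so `sample.mzML` and `sample.mzml` resolve to the same workflow.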
pip install ms1_id
Python 3.9+ is required. It has been tested on macOS (14.6, M2 Max) and Linux (Ubuntu 20.04).
Note: Indexed libraries are required for the workflow. The indexed GNPS libraries can be downloaded as follows:
# For LC-MS data
wget https://github.com/Philipbear/ms1_id/releases/latest/download/gnps.zip
unzip gnps.zip -d db
# For MS imaging data (fragments with m/z < 100 are removed, as they are not usually included in MS imaging data)
wget https://github.com/Philipbear/ms1_id/releases/latest/download/gnps_minmz100.zip
unzip gnps_minmz100.zip -d db
If you have pseudo MS/MS spectra in mgf format, you can directly annotate them:
ms1_id annotate --input_file pseudo_msms.mgf --libs db/gnps.pkl db/gnps_k10.pkl --min_score 0.7 --min_matched_peak 3
Here, two indexed libraries are searched, and the result tsv files are saved in the same directory as the input file.
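If you have several mgf files, the annotate command above can be assembled programmatically. The sketch below only builds the argument lists, using the flags shown in this README. The helper names are hypothetical, and the default values are copied from the example command, so they are not necessarily the package defaults.

```python
from pathlib import Path
from typing import List, Sequence


def build_annotate_cmd(
    mgf: Path,
    libs: Sequence[str],
    min_score: float = 0.7,       # value from the README example, not a confirmed default
    min_matched_peak: int = 3,    # value from the README example, not a confirmed default
) -> List[str]:
    """Build the argument list for one `ms1_id annotate` call."""
    return [
        "ms1_id", "annotate",
        "--input_file", str(mgf),
        "--libs", *libs,
        "--min_score", str(min_score),
        "--min_matched_peak", str(min_matched_peak),
    ]


def batch_annotate_cmds(folder: str, libs: Sequence[str]) -> List[List[str]]:
    """Build one command per mgf file found directly inside `folder`."""
    return [build_annotate_cmd(p, libs) for p in sorted(Path(folder).glob("*.mgf"))]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once `ms1_id` is installed and on your PATH.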
For more options, run:
ms1_id annotate --help
To annotate LC-MS data, here is an example command:
ms1_id lcms --project_dir lc_ms --sample_dir data --ms1_id_libs db/gnps.pkl db/gnps_k10.pkl --ms2_id_lib db/gnps.pkl
Here, lc_ms is the project directory. Raw mzML or mzXML files are stored in the lc_ms/data folder. Both MS1 and MS/MS annotations will be performed. For MS1 annotation, both gnps.pkl and gnps_k10.pkl libraries are used. For MS/MS annotation, the gnps.pkl library is used. Results can be accessed from aligned_feature_table.tsv.
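To take a first look at aligned_feature_table.tsv, a generic tab-separated reader is enough. The sketch below makes no assumption about the column names in the table; `parse_tsv` is a hypothetical helper that relies only on the Python standard library.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def parse_tsv(lines: Iterable[str]) -> Tuple[List[str], List[Dict[str, str]]]:
    """Parse tab-separated lines into (column names, list of row dicts)."""
    reader = csv.DictReader(lines, delimiter="\t")
    rows = [dict(row) for row in reader]
    # fieldnames is None when the input is empty
    header = list(reader.fieldnames or [])
    return header, rows
```

Pass an open file handle, for example one created with `open(path, newline="", encoding="utf-8")`, then inspect the returned header to see which columns your version of the workflow writes.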
For more options, run:
ms1_id lcms --help
Expected runtime is ~5-7 min for a single LC-MS file. If it takes longer than 10 min, please increase the --mass_detect_int_tol parameter (default: 2e5 for Orbitraps, 5e2 for QTOFs).
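One way to apply the advice above is to start from the stated default for your instrument and raise it by some factor. The sketch below builds the `lcms` argument list with an explicit `--mass_detect_int_tol`. The helper names and the factor of 2 are illustrative assumptions, and the numeric formatting passed to the flag is a guess at what the CLI accepts.

```python
from typing import List, Optional, Sequence

# Defaults as stated in this README.
DEFAULT_INT_TOL = {"orbitrap": 2e5, "qtof": 5e2}


def raised_tolerance(instrument: str, factor: float = 2.0) -> float:
    """Scale the documented default tolerance; the factor is an arbitrary choice."""
    return DEFAULT_INT_TOL[instrument] * factor


def build_lcms_cmd(
    project_dir: str,
    sample_dir: str,
    ms1_id_libs: Sequence[str],
    ms2_id_lib: str,
    mass_detect_int_tol: Optional[float] = None,
) -> List[str]:
    """Build the argument list for one `ms1_id lcms` call."""
    cmd = [
        "ms1_id", "lcms",
        "--project_dir", project_dir,
        "--sample_dir", sample_dir,
        "--ms1_id_libs", *ms1_id_libs,
        "--ms2_id_lib", ms2_id_lib,
    ]
    if mass_detect_int_tol is not None:
        # Only pass the flag when overriding the built-in default
        cmd += ["--mass_detect_int_tol", f"{mass_detect_int_tol:g}"]
    return cmd
```

A higher intensity tolerance means fewer low-intensity signals are considered, which is presumably why it shortens the runtime.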
To annotate MS imaging data, here is an example command:
ms1_id msi --input_dir msi --libs db/gnps_minmz100.pkl db/gnps_minmz100_k10.pkl --n_cores 12
Here, msi is the input directory consisting of the imzML and ibd files. All the imzML files in the directory will be annotated individually.
Two libraries are used simultaneously, and 12 cores will be used for parallel processing. Annotation results can be accessed from ms1_id_annotations_derep.tsv.
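Because each imzML file needs its accompanying ibd file, it can help to check the input directory before starting a long run. The sketch below assumes that the two files of a dataset share the same base name, which is the usual convention but is not stated in this README. `imzml_without_ibd` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, List


def imzml_without_ibd(filenames: Iterable[str]) -> List[str]:
    """Return imzML file names that have no ibd file with the same base name."""
    paths = [Path(name) for name in filenames]
    ibd_stems = {p.stem for p in paths if p.suffix.lower() == ".ibd"}
    return sorted(
        p.name
        for p in paths
        if p.suffix.lower() == ".imzml" and p.stem not in ibd_stems
    )
```

For a real directory, pass `os.listdir("msi")` (with `import os`) and resolve any reported names before running the workflow.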
For more options, run:
ms1_id msi --help
Expected runtime is ~3-20 min for a single MS imaging dataset if at least 12 cores are available.
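If your machine has fewer than 12 cores, you may want to cap `--n_cores` at what is available. The sketch below does that and builds the `msi` argument list from the flags shown in this README. The helper names are hypothetical, and runtime on fewer cores will likely exceed the estimate above.

```python
import os
from typing import List, Optional, Sequence


def choose_n_cores(requested: int = 12, available: Optional[int] = None) -> int:
    """Cap the requested core count at the number of cores on this machine."""
    if available is None:
        # os.cpu_count() can return None on some platforms
        available = os.cpu_count() or 1
    return max(1, min(requested, available))


def build_msi_cmd(input_dir: str, libs: Sequence[str], n_cores: int) -> List[str]:
    """Build the argument list for one `ms1_id msi` call."""
    return [
        "ms1_id", "msi",
        "--input_dir", input_dir,
        "--libs", *libs,
        "--n_cores", str(n_cores),
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`.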
To build your own indexed library, run:
ms1_id index --ms2db library.msp --peak_scale_k 10 --peak_intensity_power 0.5
For more options, run:
ms1_id index --help
We provide a demo script to prepare the environment, download libraries, download LC-MS data and run the annotation workflow.
bash run.sh
Shipei Xing, Vincent Charron-Lamoureux, Måns Ekelöf, Yasin El Abiead, Huaxu Yu, Oliver Fiehn, Theodore Alexandrov, Pieter C. Dorrestein. Structural annotation of full-scan MS data: A unified solution for LC-MS and MS imaging analyses. bioRxiv 2024.
| Data type | Dataset | Link | Instrument |
|---|---|---|---|
| LC-MS | Pooled chemical standards | MSV000095789 | Q Exactive |
| LC-MS | NIST human feces | MSV000095787 | Q Exactive |
| LC-MS | IBD dataset | PR000639 | Q Exactive |
| LC-MS | Mouse feces (lipidomics) | MSV000095868 | Q-TOF |
| LC-MS | Komagataella phaffii (yeast) | MSV000090053 | Q Exactive |
| LC-MS | Bacterial isolates | MSV000085024 | Q Exactive |
| LC-MS | Odontotaenius disjunctus microbe isolates | MSV000090030 | Q Exactive |
| LC-MS | Environmental fungal strains | MSV000090000 | Q Exactive |
| LC-MS | Sea water DOM | MSV000094338 | Q Exactive |
| LC-MS | Foam DOM | MSV000083888 | Q Exactive |
| LC-MS | Ocean DOM | MSV000083632 | Q Exactive |
| LC-MS | Plant extracts | MSV000090975 | Q Exactive |
| LC-MS | 32 plant species | MSV000090968 | Q Exactive |
| Imaging MS | Mouse liver with spotted standards | METASPACE | MALDI-Orbitrap |
| Imaging MS | Mouse brain | MTBLS313 | MALDI-FTICR |
| Imaging MS | Mouse body | METASPACE | MALDI-FTICR |
| Imaging MS | Human hepatocytes | METASPACE project | MALDI-Orbitrap |
| Imaging MS | HeLa_NIH3T3 | METASPACE project | MALDI-Orbitrap |
| Imaging MS | Populus trichocarpa root | METASPACE | MALDI-timsTOF |
| Imaging MS | Human liver tissue | METASPACE | MALDI-TOF |
| Imaging MS | Human kidney | METASPACE | MALDI-timsTOF |
| Imaging MS | Mouse kidney | METASPACE | MALDI-FTICR |
| Imaging MS | Mouse brain (TOF) | METASPACE | MALDI-TOF |
This project is licensed under the Apache 2.0 License (Copyright 2024 Shipei Xing).

