-
Notifications
You must be signed in to change notification settings - Fork 5
Precomputation for DIAScoring and DIAHelper #99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
|
|
||
| #include <OpenMS/CHEMISTRY/AASequence.h> | ||
| #include <OpenMS/OPENSWATHALGO/DATAACCESS/DataStructures.h> | ||
| #include <OpenMS/FILTERING/DATAREDUCTION/IsotopeDistributionCache.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to include the header here.
Just forward declare should be enough (to save on compile time).
class IsotopeDistributionCache; within the OpenMS namespace.
| /// simulate spectrum from AASequence | ||
| OPENMS_DLLAPI void simulateSpectrumFromAASequence(const AASequence& aa, | ||
| OPENMS_DLLAPI void simulateSpectrumFromAASequence(IsotopeDistributionCache& iso, | ||
| const AASequence& aa, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is probably a bit more intuitive to swap argument 1 and 2... because AASequence is the primary input
| #include <OpenMS/OPENSWATHALGO/DATAACCESS/TransitionExperiment.h> | ||
|
|
||
| #include <OpenMS/DATASTRUCTURES/DefaultParamHandler.h> | ||
| #include <OpenMS/ANALYSIS/OPENSWATH/DIAScoring.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for this header.
forward declare IsotopeDistributionCache as before
| //@} | ||
|
|
||
|
|
||
| void precalculateDistributionCache(Size num_begin, Size index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you document what the function does and what the parameters mean?
|
|
||
| void precalculateDistributionCache(Size num_begin, Size index); | ||
|
|
||
| void renormalize( TheoreticalIsotopePattern& isotopes, IsotopeDistribution& isotope_dist); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs here as well please
| const TheoreticalIsotopePattern & getIsotopeDistribution(double mass) const; | ||
| const TheoreticalIsotopePattern& getIsotopeDistribution(double mass) ; | ||
|
|
||
| const IsotopeDistribution& getIntensity(double mass); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not really an intensity which is returned here, right?
Can you find a better name for the method?
Also document it
| /// Vector of pre-calculated isotope distributions for several mass windows | ||
| std::vector<TheoreticalIsotopePattern> isotope_distributions_; | ||
|
|
||
| std::vector<IsotopeDistribution> distribution_cache_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you document the four members?
|
|
||
| double mass_window_width_; | ||
|
|
||
| double intensity_percentage_ ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the percentage? (document)
| CoarseIsotopePatternGenerator solver(nr_isotopes); | ||
| TheoreticalIsotopePattern isotopes; | ||
| auto d = solver.estimateFromPeptideWeight(product_mz * charge); | ||
| auto d = iso.getIntensity(product_mz * charge); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be const auto& d = .... to avoid the copy?
|
|
||
| for (std::size_t i = 0; i < spec.size(); ++i) | ||
| { | ||
| std::vector<std::pair<double, double> > isotopes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just as a suggestion for speed (even though this was not part of the PR):
move std::vector<std::pair<double, double> > isotopes; in front of the loop and just clear it inside to safe a lot of allocations.
|
|
||
| void IsotopeDistributionCache::renormalize( | ||
| OpenMS::IsotopeDistributionCache::TheoreticalIsotopePattern& isotopes, | ||
| OpenMS::IsotopeDistribution& isotope_dist) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isotope_dist can be const, right because it is only read from?
| auto d = solver.estimateFromPeptideWeight(0.5 * mass_window_width + index * mass_window_width); | ||
| auto d = solver.estimateFromPeptideWeight(0.5*mass_window_width_ + index * mass_window_width_); | ||
|
|
||
| distribution_cache_[index] = d; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are storing a copy of d in the cache, and then go on to trim d and compute isotope_distributions_[index] based on it. Is this intended? If so, please document this somewhere
| //calculate index in the vector | ||
| Size index = static_cast<Size>(std::floor(mass / mass_window_width_)); | ||
|
|
||
| Size index = Size (std::floor(mass/mass_window_width_)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for floor(), since casting to Size will take care of it (for positive masses.. which assume we have)
|
|
||
| //Return distribution | ||
| return isotope_distributions_[index]; | ||
| return isotope_distributions_[(int)index]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for (int)
| //Return the intensity for a mass | ||
| const IsotopeDistribution& IsotopeDistributionCache::getIntensity(double mass) | ||
| { | ||
| Size index = Size (std::floor(mass/mass_window_width_)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove floor
| } | ||
|
|
||
| //Return intensity | ||
| return distribution_cache_[(int)index]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove (int)
| // Here we deal with SWATH files (can be multiple files) | ||
|
|
||
| DIAScoring dia; | ||
| auto isotope_cache = dia.getCache(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you not simply instanciate an IsotopeDistributionCache isotope_cache; here directly?
(remove the DIAScoring.h in this case)
Implementation of Isotope Distribution Cache in OpenSWATHWorkflow
We noticed that in this workflow where in DIAScoring and DIAHelper we compute isotope Distributions, a huge part of the runtime is lost in the calculation

estimateFromPeptideWeight(). (see the two "towers" in the flame graph below)In order to avoid doing this calculation so often, we implemented a Cache which we save as a member in each DIAScoring and DIAHelper. For this implementation, we used
IsotopeDistributionCache.cppandIsotopeDistributionCache.h. The needed feature was already implemented there, such that we just had to adapt the usage of thisisotope_distribution_member in each file and the corresponding calling functions.This change in the workflow led to a good improvement in runtime.
Benchmarks
Without caching:
OpenSwathWorkflow took 15:38 m (CPU)With IsotopDistributionCache:
OpenSwathWorkflow took 13:59 m (CPU)