As of coffea v0.7.16 one now receives a warning on import of `coffea.hist`:

This discussion is meant to collect tips for migrating from `coffea.hist` to `hist`, and I've started by noting changes that were needed in the original coffea histogram tutorial. Feel free to add replies with additional tips and I'll incorporate them here!

Construction
A quick guide to the name changes in the constructor:
| coffea | hist | Notes |
|---|---|---|
| `coffea.hist.` | `hist.` | |
| `Hist(...)` | `Hist(...)` | `Hist(..., name="Counts")`. A name or label can be specified, and later accessed using the corresponding attribute (e.g. `h.name`). 1D plots (e.g. `h.plot1d()`) do not seem to use the name as a y-axis label. |
| `Cat("name", "label")` | `axis.StrCategory([], growth=True, name="name", label="label")` | … `growth` argument to the default `False`. The name and label appear at the end as (optional) keyword arguments. |
| `Bin("name", "label", 10, 0.0, 1.0)` | `axis.Regular(10, 0.0, 1.0, name="name", label="label")` | |
| `Bin("name", "label", [0.0, 1.0, 3.0])` | `axis.Variable([0.0, 1.0, 3.0], name="name", label="label")` | |
| `h1.compatible(h2)` | `h1.axes == h2.axes` | |

In coffea, `weight` was a protected keyword and was not usable as an axis name. In hist, both `weight` and `sample` are protected (the latter is used e.g. for `Mean()` accumulator types).

In coffea, `Cat` axes acted as sparse axes, whereas in hist all axes are dense. This means that if you have several categorical axes in your histogram where the number of populated combinations of values is much less than the full outer product of axis values, the storage requirements for hist may be significantly larger than for coffea. So it is advisable not to use e.g. dataset as an axis in hist, but rather to store a dictionary mapping dataset name to a replica of a hist object. Alternatively there are Stack objects.

Tip: Some space savings can be found by turning off flow bins in axes where they are not needed. For example, a neural network output might be bounded between [0, 1] and so an appropriate binning might be `hist.axis.Regular(20, 0, 1, flow=False)`. The savings multiply as the number of dimensions grows.

Hist introduces a new "quick-construct" utility based on the method chaining idiom. For a histogram defined analogously to the one in the coffea histograms tutorial notebook:
we could also construct it with:
Of course, the character-count savings are more pronounced for quick checks where axis names and labels are omitted.
Filling
Hist supports filling with identical parameters to coffea. The example in coffea histograms would be unchanged:
In addition, hist can also fill histograms by positional arguments, i.e. if you specify the arguments in the same order as in the constructor, you can fill without naming the axes:
… `weight=` argument. For hist, one has to opt in explicitly during construction by using the `weight` storage type (as opposed to the default `double`).

Transformation
Below are some possible transformations that can be done on the above-defined histogram `histo`. Many of the methods have direct analogs. Some are a bit more opaque as they utilize the Unified Histogram Indexing (UHI) syntax.

| coffea | hist | Notes |
|---|---|---|
| `histo.sum("z")` | `histo[{"z": slice(0, 20, sum)}]` | `len` is a nice shorthand for specifying the last+1 bin index (`20` in this case; `hist.overflow-1` also works). One can also use positional arguments in the slicing, e.g. `histo[:, :, :, 0:len:sum]` or `histo[..., 0:20:sum]`. As an alternative to using `slice()` directly, one can define `s = hist.tag.Slicer()` and then use the usual slice syntax on the `s` object: `histo[{"z": s[0:len:sum]}]`. |
| `histo.sum("z", overflow='over')` | `histo[{"z": slice(0, hist.overflow, sum)}]` | |
| `histo.sum("z", overflow='all')` | `histo[{"z": sum}]` | `float("NaN")` entries will end up in the overflow bin. |
| `histo.sum("z", overflow='allnan')` | `histo[{"z": sum}]` | |
| `histo[:, 0:, 4:, 0:]` | `histo[:, hist.loc(0.0):, 4.0j:, 0.0j:]` | Slicing by value uses the `hist.loc()` tag. A shorthand for `hist.loc` is to specify the bin edge as a pure-imaginary complex number using the Python syntax `4.0j`, where `j` is the imaginary unit. |
| `histo.integrate("y", slice(0, 10))` | `histo[{"y": slice(0.0j, 10.0j, sum)}]` | |
| `histo.project("y", "z", overflow="allnan")` | `histo.project("y", "z")` | … `allnan` option. The ordering of arguments allows permuting the axis positions in hist, and axes can also be specified by integer index instead of name. |
| `histo.rebin("z", hist.Bin("znew", "rebinned z value", [-10, -6, 6, 10]))` | | |
| `histo.rebin("z", 2)` | `histo[..., ::hist.rebin(2)]` | |
| `histo.scale(3.)` | `histo *= 3.0` | `histo*3.0` |
| `histo.identifiers('samp')` | `histo.axes["samp"]` | `list(histo.axes["samp"]) == ['sample 1', 'sample 2']` |
| `histo.identifiers('x')` | `histo.axes["x"].edges` | `for lo, hi in zip(edges[:-1], edges[1:])`. The centers are also available. |
| `histo.group(...)` | | |
| `histo.values(overflow: str)` | `histo.values(flow: bool)` | In `hist` you can only choose either none or all of the overflow bins. Instead of a dictionary mapping identifiers on the sparse axes to the dense axes, in `hist` you get a rectangular numpy array, where all the axes are dense. You can use the slicing syntax to reduce the number of axes beforehand. |
Note that in `hist` there is also a `histo.view()` that provides read-write support so you can use it to update entries.

Scaling a categorical axis in-place (e.g. for dataset luminosity normalization) changes from
to
where
`histo.view(flow=True)` provides a read-write view into the histogram's storage with numpy semantics, similar to coffea `histo.values()`. Note that if the `samp` axis was not the first one, the numpy-style slice `histo.view()[i]` would have to change.

Saving
As was the case for coffea, hist histograms can be pickled. In addition, hist histograms are much more compatible with uproot4 writing. For example one can straightaway write any TH* if it has between 1 and 3 axes (TH1, TH2, TH3). So our above
`histo` could be projected by sample and written as a TH3:

which prints

[('histo;1', <ReadOnlyDirectory '/histo' at 0x00011ff9c2e0>), ('histo/sample 1;1', <TH3D (version 4) at 0x00011ff233d0>), ('histo/sample 2;1', <TH3D (version 4) at 0x00011ff9df10>)]

Plotting
The mplhep package helps interface matplotlib with hist in a similar way as it does for coffea.hist. One can use either `mplhep` directly or via convenience functions like `plot1d`, etc. Some of them:

| coffea | hist | Notes |
|---|---|---|
| `hist.plot1d(histo.sum("x", "y"), overlay='sample');` | | `z` distribution per sample without stacking or fill |
| `hist.plot1d(histo.sum("x", "y"), overlay='sample', stack=True);` | `histo[{"x": sum, "y": sum}].plot1d(overlay="samp", histtype="fill", stack=True);` | |
| `hist.plot2d(histo.sum('x', 'sample'), xaxis='y');` | `histo[{"x": sum, "samp": sum}].plot2d();` | The axis orientation is chosen using `project()` with the appropriate index permutation, e.g. `histo[{"x": sum, "samp": sum}].project(1, 0).plot2d();` would put the z axis on the horizontal. |
| `hist.plotgrid(...)` | | |
| `hist.plotratio(...)` | | |

Styling
Many of the styling examples carry over directly, since they modify matplotlib attributes. It is worth noting that mplhep now has stylesheets for all four major LHC experiments.
Use within processors
To accommodate the fact that `hist.Hist` does not (and will not) subclass `AccumulatorABC`, a change to the accumulator semantics was introduced in coffea v0.7.2 to allow a more flexible definition. A full processor modernization guide will be put in a separate discussion, but it boils down to replacing any `defaultdict_accumulator({"myhist": coffea.hist.Hist(...)})` objects with simple dictionaries of hist objects `{"myhist": hist.Hist(...)}`. As for the other types of accumulators, anything that natively supports `__add__` (i.e. floats, ints, etc.) is natively supported now. For example, the following is now sufficient to track sum of weights in the return of a `ProcessorABC.process` method:

As a consequence, `ProcessorABC.accumulator` and any use of `AccumulatorABC.identity()` is deprecated. To see more details of the new accumulator semantics, check out https://coffeateam.github.io/coffea/notebooks/accumulators.html#Accumulator-semantics
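The merging rule described above can be illustrated in plain Python. This is only a sketch of the semantics, not coffea's implementation: `merge_outputs` and `process_chunk` are hypothetical names, and the dataset names and weights are made up.

```python
from functools import reduce


def merge_outputs(a, b):
    """Hypothetical helper: merge two outputs by recursing into dictionaries
    and adding anything else with the + operator."""
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, value in b.items():
            merged[key] = merge_outputs(merged[key], value) if key in merged else value
        return merged
    return a + b


def process_chunk(dataset, weights):
    """Hypothetical stand-in for a ProcessorABC.process method that returns
    a plain dictionary instead of an accumulator object."""
    return {"sumw": {dataset: float(sum(weights))}, "nevents": len(weights)}


# Made-up chunks from two datasets.
chunks = [
    process_chunk("A", [1.0, 0.5]),
    process_chunk("B", [2.0]),
    process_chunk("A", [2.0]),
]
total = reduce(merge_outputs, chunks)
print(total)
```

Because hist histograms support `+`, a `{"myhist": hist.Hist(...)}` entry would merge through the same `a + b` branch in this sketch.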