Working in more than one dimension¶
Although in general the case of having a fit to 1D data is the most frequent, sometimes it is needed to perform fits in many dimensions. This is completely supported by MinKit, for both the unbinned and binned cases, although with some limitations. These limitations are imposed by the multidimensional nature of the problem, since calculating numerical integrals might lead to enormous arrays for the grids to use. As always, we start with a few imports, and we will define a function to help us to plot the results.
[1]:
%matplotlib inline
import matplotlib.pyplot as plt
import minkit
import numpy as np
def create_figure():
fig = plt.figure(figsize=(10, 10))
left, width = 0.12, 0.55
bottom, height = 0.12, 0.55
bottom_h = left_h = left + width + 0.02
ax0 = plt.axes([left, bottom, width, height])
ax1 = plt.axes([left, bottom_h, width, 0.25], sharex=ax0)
ax2 = plt.axes([left_h, bottom, 0.25, height], sharey=ax0)
plt.setp(ax1.get_xticklabels(), visible=False)
plt.setp(ax2.get_yticklabels(), visible=False)
return fig, ax0, ax1, ax2
Unbinned case¶
Let’s create an unbinned data set in two dimensions from two different Gaussian distributions with shared mean and different standard deviation, do a fit to it, and plot the results.
[2]:
# Define the model
x = minkit.Parameter('x', bounds=(-5, +5))
y = minkit.Parameter('y', bounds=(-5, +5))
c = minkit.Parameter('c', 0, bounds=(-5, +5))
sx = minkit.Parameter('sx', 2, bounds=(1, 3))
sy = minkit.Parameter('sy', 1, bounds=(0.5, 3))
gx = minkit.Gaussian('gx', x, c, sx)
gy = minkit.Gaussian('gy', y, c, sy)
pdf = minkit.ProdPDFs('pdf', [gx, gy])
# Generate data and fit it
data = pdf.generate(100000)
with minkit.minimizer('uml', pdf, data) as minimizer:
minimizer.migrad()
# Plot the results
values, (xedges, yedges) = minkit.data_plotting_arrays(data, bins=100)
cx = 0.5 * (xedges[1:] + xedges[:-1])
cy = 0.5 * (yedges[1:] + yedges[:-1])
vx, vy = tuple(a.flatten() for a in np.meshgrid(cx, cy))
fig, ax0, ax1, ax2 = create_figure()
# Plot the data
ax0.hist2d(vx, vy, bins=(xedges, yedges), weights=values);
# Calculate the projections of the data and plot them
xproj, _ = minkit.data_plotting_arrays(data, bins=100, projection='x')
yproj, _ = minkit.data_plotting_arrays(data, bins=100, projection='y')
ax1.hist(cx, bins=xedges, weights=xproj, histtype='step', color='k');
ax2.hist(cy, bins=yedges, weights=yproj, histtype='step', orientation='horizontal', color='k');
# Calculate the values of the PDF and plot them
gxc, xpdf = minkit.pdf_plotting_arrays(pdf, values, (xedges, yedges), projection='x')
gyc, ypdf = minkit.pdf_plotting_arrays(pdf, values, (xedges, yedges), projection='y')
ax1.plot(gxc, xpdf, color='b');
ax2.plot(ypdf, gyc, color='b');
FCN = 694289.713121555 | TOTAL NCALL = 81 | NCALLS = 81 |
EDM = 9.269661053992134e-05 | GOAL EDM = 1e-05 | UP = 1.0 |
Valid | Valid Param | Accurate Covar | PosDef | Made PosDef |
True | True | True | True | False |
Hesse Fail | HasCov | Above EDM | Reach calllim | |
False | True | False | False |
+ | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed? |
0 | c | 0.00187518 | 0.00285409 | -5 | 5 | No | ||
1 | sx | 1.99467 | 0.00544549 | 1 | 3 | No | ||
2 | sy | 0.999087 | 0.00223423 | 0.5 | 3 | No |
The amount of code needed to generate and fit is very similar to that of the 1D case. In order to plot the results, it is needed to work a bit more if one wants to get the projections of the PDF in each dimension, since it comes necessary to sum the contribution in the other. By using numpy.histogramdd one can extend the previous code to the n-dimensional case, although computation might turn slower due to the curse of dimensionality.
Binned case¶
Similarly to what has been done in the previous section, one can do the same thing on binned data sets.
[3]:
# Transform data to binned, and do a fit to it
binned_data = data.make_binned(bins=(100, 100))
xedges, yedges = binned_data['x'].as_ndarray(), binned_data['y'].as_ndarray()
values = binned_data.values.as_ndarray()
with minkit.minimizer('bml', pdf, binned_data) as minimizer:
minimizer.migrad()
# Plot the results
fig, ax0, ax1, ax2 = create_figure()
data_values, (xedges, yedges) = minkit.data_plotting_arrays(binned_data)
cx = 0.5 * (xedges[1:] + xedges[:-1])
cy = 0.5 * (yedges[1:] + yedges[:-1])
vx, vy = tuple(a.flatten() for a in np.meshgrid(cx, cy))
ax0.hist2d(vx, vy, bins=(xedges, yedges), weights=data_values);
# Calculate the projections of the data and plot them
xproj, _ = minkit.data_plotting_arrays(binned_data, projection='x')
yproj, _ = minkit.data_plotting_arrays(binned_data, projection='y')
cx = 0.5 * (xedges[1:] + xedges[:-1])
cy = 0.5 * (yedges[1:] + yedges[:-1])
ax1.hist(cx, bins=xedges, weights=xproj, histtype='step', color='k');
ax2.hist(cy, bins=yedges, weights=yproj, histtype='step', orientation='horizontal', color='k');
# Calculate the values of the PDF and plot them
gxc, xpdf = minkit.pdf_plotting_arrays(pdf, values, (xedges, yedges), projection='x')
gyc, ypdf = minkit.pdf_plotting_arrays(pdf, values, (xedges, yedges), projection='y')
ax1.plot(gxc, xpdf, color='b');
ax2.plot(ypdf, gyc, color='b');
FCN = 7115.751392859857 | TOTAL NCALL = 34 | NCALLS = 34 |
EDM = 2.6395486866473276e-09 | GOAL EDM = 1e-05 | UP = 1.0 |
Valid | Valid Param | Accurate Covar | PosDef | Made PosDef |
True | True | True | True | False |
Hesse Fail | HasCov | Above EDM | Reach calllim | |
False | True | False | False |
+ | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed? |
0 | c | 0.00191893 | 0.00285086 | -5 | 5 | No | ||
1 | sx | 1.99482 | 0.00542073 | 1 | 3 | No | ||
2 | sy | 0.999001 | 0.00223591 | 0.5 | 3 | No |
As it can be clearly seen, the result is similar to that of the unbinned case.