Working in more than one dimension

Although in general the case of having a fit to 1D data is the most frequent, sometimes it is needed to perform fits in many dimensions. This is completely supported by MinKit, for both the unbinned and binned cases, although with some limitations. These limitations are imposed by the multidimensional nature of the problem, since calculating numerical integrals might lead to enormous arrays for the grids to use. As always, we start with a few imports, and we will define a function to help us to plot the results.

[1]:
%matplotlib inline
import matplotlib.pyplot as plt
import minkit
import numpy as np

def create_figure():

    fig = plt.figure(figsize=(10, 10))

    left, width = 0.12, 0.55
    bottom, height = 0.12, 0.55
    bottom_h = left_h = left + width + 0.02

    ax0 = plt.axes([left, bottom, width, height])
    ax1 = plt.axes([left, bottom_h, width, 0.25], sharex=ax0)
    ax2 = plt.axes([left_h, bottom, 0.25, height], sharey=ax0)

    plt.setp(ax1.get_xticklabels(), visible=False)
    plt.setp(ax2.get_yticklabels(), visible=False)

    return fig, ax0, ax1, ax2

Unbinned case

Let’s create an unbinned data set in two dimensions from two different Gaussian distributions with shared mean and different standard deviation, do a fit to it, and plot the results.

[2]:
# Define the model
x = minkit.Parameter('x', bounds=(-5, +5))
y = minkit.Parameter('y', bounds=(-5, +5))
c = minkit.Parameter('c', 0, bounds=(-5, +5))

sx = minkit.Parameter('sx', 2, bounds=(1, 3))
sy = minkit.Parameter('sy', 1, bounds=(0.5, 3))

gx = minkit.Gaussian('gx', x, c, sx)
gy = minkit.Gaussian('gy', y, c, sy)

pdf = minkit.ProdPDFs('pdf', [gx, gy])

# Generate data and fit it
data = pdf.generate(100000)

with minkit.minimizer('uml', pdf, data) as minimizer:
    minimizer.migrad()

# Plot the results
values, (xedges, yedges) = minkit.data_plotting_arrays(data, bins=100)

cx = 0.5 * (xedges[1:] + xedges[:-1])
cy = 0.5 * (yedges[1:] + yedges[:-1])

vx, vy = tuple(a.flatten() for a in np.meshgrid(cx, cy))

fig, ax0, ax1, ax2 = create_figure()

# Plot the data
ax0.hist2d(vx, vy, bins=(xedges, yedges), weights=values);

# Calculate the projections of the data and plot them
xproj, _ = minkit.data_plotting_arrays(data, bins=100, projection='x')
yproj, _ = minkit.data_plotting_arrays(data, bins=100, projection='y')

ax1.hist(cx, bins=xedges, weights=xproj, histtype='step', color='k');
ax2.hist(cy, bins=yedges, weights=yproj, histtype='step', orientation='horizontal', color='k');

# Calculate the values of the PDF and plot them
gxc, xpdf = minkit.pdf_plotting_arrays(pdf, values, (xedges, yedges), projection='x')
gyc, ypdf = minkit.pdf_plotting_arrays(pdf, values, (xedges, yedges), projection='y')

ax1.plot(gxc, xpdf, color='b');
ax2.plot(ypdf, gyc, color='b');

FCN = 694289.713121555 TOTAL NCALL = 81 NCALLS = 81
EDM = 9.269661053992134e-05 GOAL EDM = 1e-05 UP = 1.0
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed?
0 c 0.00187518 0.00285409 -5 5 No
1 sx 1.99467 0.00544549 1 3 No
2 sy 0.999087 0.00223423 0.5 3 No

../_images/notebooks_multidimensional_3_4.png

The amount of code needed to generate and fit is very similar to that of the 1D case. In order to plot the results, it is needed to work a bit more if one wants to get the projections of the PDF in each dimension, since it comes necessary to sum the contribution in the other. By using numpy.histogramdd one can extend the previous code to the n-dimensional case, although computation might turn slower due to the curse of dimensionality.

Binned case

Similarly to what has been done in the previous section, one can do the same thing on binned data sets.

[3]:
# Transform data to binned, and do a fit to it
binned_data = data.make_binned(bins=(100, 100))

xedges, yedges = binned_data['x'].as_ndarray(), binned_data['y'].as_ndarray()

values = binned_data.values.as_ndarray()

with minkit.minimizer('bml', pdf, binned_data) as minimizer:
    minimizer.migrad()

# Plot the results
fig, ax0, ax1, ax2 = create_figure()

data_values, (xedges, yedges) = minkit.data_plotting_arrays(binned_data)

cx = 0.5 * (xedges[1:] + xedges[:-1])
cy = 0.5 * (yedges[1:] + yedges[:-1])

vx, vy = tuple(a.flatten() for a in np.meshgrid(cx, cy))

ax0.hist2d(vx, vy, bins=(xedges, yedges), weights=data_values);

# Calculate the projections of the data and plot them
xproj, _ = minkit.data_plotting_arrays(binned_data, projection='x')
yproj, _ = minkit.data_plotting_arrays(binned_data, projection='y')

cx = 0.5 * (xedges[1:] + xedges[:-1])
cy = 0.5 * (yedges[1:] + yedges[:-1])

ax1.hist(cx, bins=xedges, weights=xproj, histtype='step', color='k');
ax2.hist(cy, bins=yedges, weights=yproj, histtype='step', orientation='horizontal', color='k');

# Calculate the values of the PDF and plot them
gxc, xpdf = minkit.pdf_plotting_arrays(pdf, values, (xedges, yedges), projection='x')
gyc, ypdf = minkit.pdf_plotting_arrays(pdf, values, (xedges, yedges), projection='y')

ax1.plot(gxc, xpdf, color='b');
ax2.plot(ypdf, gyc, color='b');

FCN = 7115.751392859857 TOTAL NCALL = 34 NCALLS = 34
EDM = 2.6395486866473276e-09 GOAL EDM = 1e-05 UP = 1.0
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed?
0 c 0.00191893 0.00285086 -5 5 No
1 sx 1.99482 0.00542073 1 3 No
2 sy 0.999001 0.00223591 0.5 3 No

../_images/notebooks_multidimensional_6_4.png

As it can be clearly seen, the result is similar to that of the unbinned case.