Minimization with MinKit¶

The main purpose of the MinKit package is to do minimizations of PDFs to data sets, finding the set of parameter values that reproduce better the data. The user must be aware about the type of PDF is trying to fit, and select a proper minimization cuantity (FCN) accordingly. For example, if we are dealing with an unbinned dataset, we need an FCN defined for binned data sets. The same stands for extended and non-extended PDFs. The user must know whether an unbinned extended maximum likelihood FCN is needed or not. To give an idea of how the minimization is done with MinKit, let’s create a simple model, generate some data, and fit it.

[1]:

%matplotlib inline
import matplotlib.pyplot as plt
import minkit

# Build the PDF
x = minkit.Parameter('x', bounds=(10, 20))
c = minkit.Parameter('c', 15, bounds=(10, 20))
s = minkit.Parameter('s', 1, bounds=(0.1, 2))

g = minkit.Gaussian('g', x, c, s)

# Keep the initial values
initials = g.get_values()

# Generate data and do the fit
data = g.generate(10000)

with minkit.minimizer('uml', g, data) as minimizer:
    minimizer.migrad()

# Plot the results
values, edges = minkit.data_plotting_arrays(data, bins=100)

centers = 0.5 * (edges[1:] + edges[:-1])

plt.hist(centers, bins=edges, weights=values, histtype='step', color='k', label='data');

gx, pdf_values = minkit.pdf_plotting_arrays(g, values, edges)
plt.plot(gx, pdf_values, color='blue', label='PDF')
plt.legend();

# Reset values
g.set_values(**initials)

FCN = 28288.575555243537	TOTAL NCALL = 74	NCALLS = 74
EDM = 2.456083689719908e-05	GOAL EDM = 1e-05	UP = 1.0

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Hesse Error	Minos Error-	Minos Error+	Limit-	Limit+	Fixed?
0	c	14.987	0.00995622			10	20	No
1	s	0.995523	0.00704013			0.1	2	No

\begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Hesse Error & Minos Error- & Minos Error+ & Limit- & Limit+ & Fixed?\\
\hline
0 & c & 14.987 & 0.00995622 &  &  & 10.0 & 20 & No\\
\hline
1 & s & 0.995523 & 0.00704013 &  &  & 0.1 & 2 & No\\
\hline
\end{tabular}

../_images/notebooks_minimization_1_4.png

Note that we have called minkit.minimizer in order to create a context and do the minimization. The first argument is the FCN to use, an unbinned maximum likelihood in this case. It is very important that we do not change the properties of the PDF inside the with statement. This is because the PDF has enabled a cache. This means that, if the parameters have been set to constant, the evaluation of the PDF over the data values might have been kept in the cache, to reduce the execution time.

Let’s create a more complex model, adding a background-like contribution, using an exponential.

[2]:

# Make the exponential PDF
k = minkit.Parameter('k', -0.1, bounds=(-1, 0))

e = minkit.Exponential('e', x, k)

# Add the two PDFs
y = minkit.Parameter('y', 0.5, bounds=(0, 1))

pdf = minkit.AddPDFs.two_components('pdf', g, e, y)

data = pdf.generate(10000)

# Keep the initial values
initials = pdf.get_values()

# Minimize the PDF
with minkit.minimizer('uml', pdf, data) as minimizer:
    minimizer.migrad()

values, edges = minkit.data_plotting_arrays(data, bins=100)

centers = 0.5 * (edges[1:] + edges[:-1])

plt.hist(centers, bins=edges, weights=values, histtype='step', color='k', label='data');

pdf_centers, pdf_values = minkit.pdf_plotting_arrays(pdf, values, edges)
_, g_values = minkit.pdf_plotting_arrays(pdf, values, edges, component='g')
_, e_values = minkit.pdf_plotting_arrays(pdf, values, edges, component='e')

plt.plot(pdf_centers, e_values, color='orange', linestyle=':', label='Exponential')
plt.plot(pdf_centers, g_values, color='green', linestyle='--', label='Gaussian')
plt.plot(pdf_centers, pdf_values, color='blue', label='PDF')

plt.legend();

# Reset values
pdf.set_values(**initials)

FCN = 41568.87190680871	TOTAL NCALL = 99	NCALLS = 99
EDM = 5.483245260122549e-06	GOAL EDM = 1e-05	UP = 1.0

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Hesse Error	Minos Error-	Minos Error+	Limit-	Limit+	Fixed?
0	y	0.526921	0.00950927			0	1	No
1	c	14.9758	0.0202446			10	20	No
2	s	1.02638	0.020045			0.1	2	No
3	k	-0.10163	0.0057237			-1	0	No

\begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Hesse Error & Minos Error- & Minos Error+ & Limit- & Limit+ & Fixed?\\
\hline
0 & y & 0.526921 & 0.00950927 &  &  & 0.0 & 1 & No\\
\hline
1 & c & 14.9758 & 0.0202446 &  &  & 10.0 & 20 & No\\
\hline
2 & s & 1.02638 & 0.020045 &  &  & 0.1 & 2 & No\\
\hline
3 & k & -0.10163 & 0.0057237 &  &  & -1.0 & 0 & No\\
\hline
\end{tabular}

../_images/notebooks_minimization_3_4.png

Let’s now assume we want to know the two yields of the PDFs. To do this we need to modify the model, and do and use an extended likelihood during the minimization.

[3]:

# Build the extended PDF
ng = minkit.Parameter('ng', 5000, bounds=(0, 10000))
ne = minkit.Parameter('ne', 5000, bounds=(0, 10000))

pdf = minkit.AddPDFs.two_components('pdf', g, e, ng, ne)

initials = pdf.get_values()

# Generate and fit
data = pdf.generate(int(ng.value + ne.value))

ng.value = 6000 # vary the yields to flee from the minimum
ne.value = len(data) - ng.value

with minkit.minimizer('ueml', pdf, data) as minimizer:
    minimizer.migrad()

values, edges = minkit.data_plotting_arrays(data, bins=100)

centers = 0.5 * (edges[1:] + edges[:-1])

plt.hist(centers, bins=edges, weights=values, histtype='step', color='k', label='data');

pdf_centers, pdf_values = minkit.pdf_plotting_arrays(pdf, values, edges)
_, g_values = minkit.pdf_plotting_arrays(pdf, values, edges, component='g')
_, e_values = minkit.pdf_plotting_arrays(pdf, values, edges, component='e')

plt.plot(pdf_centers, e_values, color='orange', linestyle=':', label='Exponential')
plt.plot(pdf_centers, g_values, color='green', linestyle='--', label='Gaussian')
plt.plot(pdf_centers, pdf_values, color='blue', label='PDF')

plt.legend();

# Reset the values
pdf.set_values(**initials)

FCN = -122354.9230188445	TOTAL NCALL = 124	NCALLS = 124
EDM = 2.1664581983657007e-05	GOAL EDM = 1e-05	UP = 1.0

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Hesse Error	Minos Error-	Minos Error+	Limit-	Limit+	Fixed?
0	ng	5052.31	107.283			0	10000	No
1	ne	4947.74	106.773			0	10000	No
2	c	15.0023	0.0206492			10	20	No
3	s	1.01169	0.0203878			0.1	2	No
4	k	-0.0963174	0.00560069			-1	0	No

\begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Hesse Error & Minos Error- & Minos Error+ & Limit- & Limit+ & Fixed?\\
\hline
0 & ng & 5052.31 & 107.283 &  &  & 0.0 & 10000 & No\\
\hline
1 & ne & 4947.74 & 106.773 &  &  & 0.0 & 10000 & No\\
\hline
2 & c & 15.0023 & 0.0206492 &  &  & 10.0 & 20 & No\\
\hline
3 & s & 1.01169 & 0.0203878 &  &  & 0.1 & 2 & No\\
\hline
4 & k & -0.0963174 & 0.00560069 &  &  & -1.0 & 0 & No\\
\hline
\end{tabular}

../_images/notebooks_minimization_5_4.png

Note that we have specified the ueml (unbinned extended maximum likelihood) FCN. If we set it to uml, we would get wrong values. Let’s now see what happens with binned data sets. In this case, we will use the bml FCN.

[4]:

import numpy as np

values, edges = np.histogram(data['x'].as_ndarray(), bins=100, range=x.bounds)

binned_data = minkit.BinnedDataSet.from_ndarray(edges, x, values)

with minkit.minimizer('bml', pdf, binned_data) as minimizer:
    minimizer.migrad()

values, edges = minkit.data_plotting_arrays(data, bins=100)

centers = 0.5 * (edges[1:] + edges[:-1])

plt.hist(centers, bins=edges, weights=values, histtype='step', color='k', label='data');

pdf_centers, pdf_values = minkit.pdf_plotting_arrays(pdf, values, edges)
_, g_values = minkit.pdf_plotting_arrays(pdf, values, edges, component='g')
_, e_values = minkit.pdf_plotting_arrays(pdf, values, edges, component='e')

plt.plot(pdf_centers, e_values, color='orange', linestyle=':', label='Exponential')
plt.plot(pdf_centers, g_values, color='green', linestyle='--', label='Gaussian')
plt.plot(pdf_centers, pdf_values, color='blue', label='PDF')

plt.legend();

# Reset the values
pdf.set_values(**initials)

FCN = 85.809988390516	TOTAL NCALL = 100	NCALLS = 100
EDM = 2.6451594374739883e-07	GOAL EDM = 1e-05	UP = 1.0

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Hesse Error	Minos Error-	Minos Error+	Limit-	Limit+	Fixed?
0	ng	5053.04	7422.08			0	10000	No
1	ne	4945.67	7381.12			0	10000	No
2	c	15.0018	0.0206669			10	20	No
3	s	1.01206	0.0204419			0.1	2	No
4	k	-0.096293	0.00560712			-1	0	No

\begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Hesse Error & Minos Error- & Minos Error+ & Limit- & Limit+ & Fixed?\\
\hline
0 & ng & 5053.04 & 7422.08 &  &  & 0.0 & 10000 & No\\
\hline
1 & ne & 4945.67 & 7381.12 &  &  & 0.0 & 10000 & No\\
\hline
2 & c & 15.0018 & 0.0206669 &  &  & 10.0 & 20 & No\\
\hline
3 & s & 1.01206 & 0.0204419 &  &  & 0.1 & 2 & No\\
\hline
4 & k & -0.096293 & 0.00560712 &  &  & -1.0 & 0 & No\\
\hline
\end{tabular}

../_images/notebooks_minimization_7_4.png

Similarly, with the FCN chi2 we would be doing a fit to the chi-square, that is, we would be assuming that the input values, instead of following a Poisson distribution, they follow a Gaussian.