Next: Statistical Inference Process
Up: Why Is This Statistical
Previous: Why Use Any Statistical
  Contents
Having made the case for using a statistical process to obtain
information, there remains a need to explain why the results were
analysed using a new program, written by the author of this thesis,
implementing a Markov chain Monte Carlo method, rather than using
computer codes already in the public domain. Computer codes for
statistical inference already in the public domain (or at least, those
known to the author,) such as Gnuplot [87] and
Origin [88], typically perform both model comparison and
parameter estimation based on the
statistic for the badness
of fit of a model to the data,
 |
(5.17) |
where
indexes the experimental conditions where some measurements
have been made,
is the measured value obtained in conditions
,
is the value predicted by the model in conditions
, and
is the quoted random error on measurement
.
,
and sometimes
, depend on the values chosen for the model's
adjustable parameters, and on which of the distinct, discrete models
available is being considered; hence, the badness of fit
also
depends on these influences. The measure of goodness of fit used in
Bayesian statistics (and in some non-Bayesian methods) is
[89] the likelihood function
, which
is the conditional probability density distribution over the space of
values of the set
of measurements, given a model
and a set of
model parameter values
; the likelihood, like
,
depends on the models that are chosen. On the common
[90] assumption that this probability distribution,
as a function of any single measured value, is Gaussian in shape,
has a clear meaning in Bayesian statistics:
 |
(5.18) |
i.e.
is proportional to the logarithm of the likelihood.
The output given by the typical parameter estimation and model
comparison software already in the public domain consists of:
- The set
of values of a model's parameters that
minimizes
(equivalently, maximizes the likelihood,)
providing an estimate of the parameter values in which it is
reasonable to believe, given the experimental evidence; to find the
minimum, the standard programs use the Marquadt-Levenberg algorithm,
which is explained in Numerical Recipes
[91],
- the coefficients of a second-order Taylor expansion of the
statistic in the parameters of a model, about
,
describing a parabolic approximation to
(equivalently, a
Gaussian approximation to the likelihood) in parameter space,
providing an estimate of the uncertainty in the parameter estimates
,
- the maximum value
of the
statistic, as a
model's parameters are varied, providing an estimate of a model's
badness of fit for model comparison purposes, and, in the case of
Gnuplot,
- a version of
, adjusted by dividing it by the
difference between the number of measurements and the number of
adjustable model parameters, providing an alternative estimate of a
model's badness of fit for model comparison purposes, which attempts
to implement Occam's razor; the philosophical and statistical basis
for implementing Occam's razor in this way is unclear.
The first reason for deciding against using codes of this form is that
output of a best-fit goodness of fit for model comparison leaves the
user with a problem of how to implement Occam's razor; Gnuplot's
adjusted
attempts to implement Occam's razor by penalizing
models for having many adjustable parameters. However, if, like the
author, one believes that the purpose of Occam's razor is to prevent a
model from using information from experimental results to fine-tune
its parameters, then recycling the same information to act as evidence
for itself against other models, then using an Occam's razor based on
the number of adjustable parameters raises the possibility of unfairly
penalizing a model that, although it has many adjustable parameters,
does not fine-tune them very much on the basis of the experimental
data, i.e. that is a good fit to the experimental results over a wide
range of parameter values. As it will transpire, this is precisely
the situation encountered with the experiments and models central to
this thesis. Bayesian methods implement [85,52] Occam's razor automatically, by comparing models'
average likelihoods over all values of their parameters (known as
marginal likelihoods,) weighted according to the pre-experiment
plausibility function encoded in the prior probability distribution,
instead of comparing their maximum likelihoods; this directly
penalises [85,52] models for
fine-tuning their parameters, not for merely having parameters to
fine-tune.
The author is not claiming that it would be impossible to implement a
fair Occam's razor using, for example, Gnuplot. Indeed, since
completing the analysis of the data presented in this thesis, the
author has, in the course of a separate experimental project, devised
a formula for obtaining a marginal likelihood from the outputs of
Gnuplot. However, some programming is required to implement this.
The second reason for deciding against using the established,
publicly available codes is that parameter estimation by a
least-squares method has a number of features, some of which tend to
render it unsuitable for use in the present project:
- The assumption, needed to give
a clear meaning in
Bayesian (or any probabilistic) statistics, that the likelihood, as
a function of the value of an individual experimental datum, for
fixed model parameters, is a Gaussian, is reasonable, given that
electron arrival at the detectors is expected to be a Poisson
process, and that the numbers of electrons being counted are
sufficiently large [84] for a Gaussian to be an
extremely good approximation to a Poisson distribution; this
assumption has not been challenged in the development of the
author's Bayesian method, and will be discussed briefly below
(section 5.3.4.)
- The assumption that the likelihood associated with the
experimental data set as a whole, as a function of the adjustable
parameters of a model, is well approximated by a Gaussian (the
second order Taylor expansion of
,) cannot stand for this
experiment and the new, classical-field model presented in chapter
2. This is because, when equation
2.8, which predicts the measurable reflected beam
polarization
, as a function of the electrostatic potential
and Weiss field
in the sample, is inverted to give the ratio
(or
or
individually) as a function of
,
is found not to be single-valued. This means that the
likelihood function has two peaks in parameter space; a Gaussian
approximation can only hope to represent one of these peaks, and
therefore will lose crucial information. This difficulty is likely
to be surmountable; one of the standard computer codes could perhaps
be used with some function of
, which is a
single-valued function of
, as a fitting parameter, instead of
and
separately, or
itself. Nevertheless, this
introduces extra complication in configuring a standard fitting
program, which slightly undermines the case in terms of simplicity
for using a standard program.
- The parabolic approximation to
, or Gaussian
approximation to the likelihood, output by a least-squares
algorithm, even, if the approximation holds, encodes only the
goodness of fit of the model to the data as a function of the
adjustable parameters. In Bayesian statistics, this is only half
the story of parameter estimation; to obtain a Bayesian ``belief
function'' of the parameters, representing how strongly one believes
in each conceivable set of parameter values, given the experimental
results (this belief function is known as the posterior probability
distribution,) the likelihood must be modulated by a plausibility
function, or prior probability distribution, describing the level of
credence one gives to each conceivable set of parameters before
undertaking the experiment. Using the output of the least-squares
algorithm directly for parameter estimation would be equivalent to
an implicit, fixed assumption that the prior probability
distribution is uniform over parameter space. Elsewhere
[10], the author has expressed concern over the
philosophical consequences of prior probability distributions being
implicit and fixed. Here, it is possible to be more pragmatic,
simply noting that for many parameters, a uniform prior probability
distribution is inappropriate: for example, the parameter might be a
quantum efficiency, in which case it makes sense to give it a
non-zero prior probability density for parameter values between zero
and one, and a zero prior probability density outside this domain;
alternatively, standard data tables might give an estimate and
uncertainty for the parameter, based on previous experimental
evidence, which could be well represented by a Gaussian prior
probability distribution. This problem is also likely to be
surmountable: since completing the analysis of the data presented in
this thesis, the author has, in the course of a separate
experimental project, examined the possibility of transforming a
parameter
, over which a prior probability distribution
is
required, to a parameter
, which is a function of
, such that
, i.e. the prior probability distribution over
is uniform, and
can be used as a fitting parameter in a
least-squares algorithm; it transpires that, for the common forms of
that the author has examined to date, the differential
equation has a ready analytical solution, providing a simple formula
to transform between
and
. Nevertheless, this introduces
extra complication in configuring a standard fitting program, which
slightly undermines the case in terms of simplicity for using a
standard program.
To summarize, there are certain obstacles to the use of computer codes
already in the public domain to undertake a valid analysis of the
experimental data presented in this project. Although methods of
working around these obstacles could be devised, these methods entail
some mathematical and computational complication, and therefore
compromise the advantage of simplicity that would be the motivation
for choosing existing computer codes. Therefore, the author has, for
the purposes of this project, chosen instead to use a fundamentally
Bayesian Markov chain Monte Carlo method for parameter estimation and
model comparison, and to code this method anew, in the high-level
programming language Perl; the choice of such a high-level language
may recover some of the simplicity lost by choosing to write new code.
Having explained why the inference method has been chosen, it is time
to undertake parameter estimation and model comparison according to
that method.
Next: Statistical Inference Process
Up: Why Is This Statistical
Previous: Why Use Any Statistical
  Contents
Daniel Christopher Hatton
2004-11-30