The Structure of Inference
I previously described the ecological niche of statistical inference. Now let’s study the beast itself.
Kiefer, in Introduction to Statistical Inference, dissects a statistical inference as clearly as I have seen. Start with a set, the range of a random variable $X$ representing our measurements. We augment this with a class of possible distributions for the random variable, one of which (we believe) is the true distribution $\omega$ which produced our measured values.
Add a set of decisions. In the end, our inference will single out some element of this set. To measure the damage if we choose wrong, we introduce a loss function $W$.
$W$ is dictated by our circumstances. In an economic problem, it may be obvious: the amount of money lost if we make a mistake. Generally it is not so clear cut. The best approach is to construct inferences which depend only crudely on the form of the loss function. If we are estimating the mean of a distribution, then our answer shouldn’t depend on whether we took as the loss the absolute value of the difference between our guess and the true value, or its square.
Our actual inference is carried out by a function $t$, called a statistical procedure. To make an inference, we plug our measured values into $t$ and take the decision which comes out.
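The pieces so far can be sketched as ordinary Python functions. This is only an illustration under assumptions of my own: the measurements are taken to be a list of floats, a decision is a single float (a guess at the mean), and the names `absolute_loss`, `squared_loss`, `sample_mean` and `infer` are hypothetical, not Kiefer's.

```python
from statistics import fmean
from typing import Callable, Sequence

# Hypothetical types for illustration only:
# a measurement is a sequence of floats, a decision is one float.
Procedure = Callable[[Sequence[float]], float]
Loss = Callable[[float, float], float]  # (true value, decision) -> damage


def absolute_loss(truth: float, decision: float) -> float:
    """Damage measured as the absolute difference."""
    return abs(truth - decision)


def squared_loss(truth: float, decision: float) -> float:
    """Damage measured as the squared difference."""
    return (truth - decision) ** 2


def sample_mean(measurements: Sequence[float]) -> float:
    """One possible statistical procedure: measured values in, decision out."""
    return fmean(measurements)


def infer(procedure: Procedure, measurements: Sequence[float]) -> float:
    """Make an inference by plugging the measured values into the procedure."""
    return procedure(measurements)


decision = infer(sample_mean, [1.0, 2.0, 4.0, 5.0])
```

Note that the loss functions need the true value, which we never have in practice. They are a tool for judging a procedure, not something the procedure itself can call.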
The problem is to construct a logically defensible $t$. We can say we want the $t$ which minimizes the loss function as much as possible. If the loss from one procedure is always less than the loss from another, no matter what the true value we are trying to measure actually is, we would not dream of using the second. But what of procedures which minimize loss over different parts of the class of distributions? How are we to choose among them? All the issues of Bayesian analysis, maximum-likelihood methods, minimax techniques, and all the rest are attempts to choose a $t$ which is optimal in some sense.
To make precise the idea of the loss incurred by some $t$, we define the risk function $r(t,\omega)$ as

$$r(t,\omega) = E[\,W(\omega, t(X)) \mid \omega, t\,]$$

where $X$ is the random variable representing our measurement. This tells us on average how well a given $t$ will do when faced with an underlying distribution $\omega$.
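Since the risk is just an expectation, it can be approximated by simulation: draw many measurements from a chosen distribution, run the procedure on each, and average the losses. The following is a rough sketch under assumptions that are mine and not part of the definition: the measurement is ten independent Gaussian draws with unit variance, the loss is squared error, and `estimate_risk` and `gaussian_sample` are hypothetical helper names.

```python
import random
from statistics import fmean


def estimate_risk(procedure, loss, sampler, truth, n_trials=20000, seed=0):
    """Approximate r(t, omega) = E[W(omega, t(X))] by averaging simulated losses."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    return fmean(
        loss(truth, procedure(sampler(rng, truth))) for _ in range(n_trials)
    )


def gaussian_sample(rng, omega, size=10):
    """Assumed measurement model: `size` Gaussian draws with mean omega, variance 1."""
    return [rng.gauss(omega, 1.0) for _ in range(size)]


def squared_loss(omega, decision):
    """Assumed loss: squared distance between the truth and the decision."""
    return (omega - decision) ** 2


# Risk of the sample mean as a procedure when the true mean is 2.0.
risk_of_mean = estimate_risk(fmean, squared_loss, gaussian_sample, truth=2.0)
```

For this particular setup the exact risk is the variance of the sample mean, 1/10, so the simulated value should land close to 0.1.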
Consider an example. We count the number of cosmic rays above some energy passing through a detector in a fixed time. Our random variable has range $\{0, 1, 2, \ldots\}$ (we can only count nonnegative integer numbers of cosmic rays), and we take the underlying distribution to be Poisson. Our class of distributions is the set of all Poisson distributions. Since a Poisson distribution is defined by one positive real parameter, this class is isomorphic to the positive real numbers.
Our decision space will be various guesses for that parameter, so it is also the positive real numbers. We could use very different decision spaces. For instance, we might try to decide whether or not there are cosmic rays of energy greater than our detector’s threshold, in which case the decision space is just {yes, no}. We’ll take the loss function $W$ to be a function of $\omega$, the parameter of the underlying distribution, and of our guess. Then we have to construct a $t$ such that the risk $r(t,\omega)$ is minimal in some sense for our problem at hand.
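To see why "minimal in some sense" needs the qualifier, here is a small sketch that computes the risk of two procedures for the cosmic ray count. It assumes squared error as the loss, which is my choice for illustration, and the names `lam`, `use_count` and `always_one` are hypothetical. The sum over counts is truncated at `max_count`, which is harmless for the small parameters used here.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with parameter lam > 0."""
    # Work in logs so large k does not overflow the factorial.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def risk(procedure, lam, max_count=200):
    """r(t, lam) = sum over k of W(lam, t(k)) * P(X = k), with squared-error W."""
    return sum(
        (lam - procedure(k)) ** 2 * poisson_pmf(k, lam)
        for k in range(max_count + 1)
    )


def use_count(k):
    """Procedure 1: guess that the parameter equals the observed count."""
    return float(k)


def always_one(k):
    """Procedure 2: ignore the data and always guess 1."""
    return 1.0


# Risk of each procedure at two different true parameters.
risks = {
    lam: (risk(use_count, lam), risk(always_one, lam))
    for lam in (1.0, 5.0)
}
```

When the true parameter is 1, the procedure that ignores the data has the smaller risk; when it is 5, using the count is far better. Neither procedure beats the other everywhere, which is exactly the situation where some further criterion is needed to choose between them.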
madhadron :: May.09.2007 :: statistics