"Natural parameter" redirects here. For use of this term in differential geometry, see Natural parametrization.
In probability and statistics, an exponential family is a parametric set of probability distributions of a certain form, specified below. This special form is chosen for mathematical convenience, because its algebraic properties let the user calculate expectations and covariances by differentiation. It is also chosen for generality, as exponential families are in a sense very natural sets of distributions to consider. The term exponential class is sometimes used in place of "exponential family",[1] as is the older term Koopman–Darmois family. This class of distributions is sometimes loosely referred to as "the" exponential family. It is distinctive because its members all possess a variety of desirable properties, most importantly the existence of a sufficient statistic.
The concept of exponential families is credited to[2] E. J. G. Pitman,[3] G. Darmois,[4] and B. O. Koopman[5] in 1935–1936. Exponential families of distributions provide a general framework for selecting a possible alternative parameterisation of a parametric family of distributions, in terms of natural parameters, and for defining useful sample statistics, called the natural sufficient statistics of the family.
The terms "distribution" and "family" are often used loosely. Strictly, an exponential family is a set of distributions, where the specific distribution varies with the parameter;[a] however, a parametric family of distributions is often referred to as "a distribution" (like "the normal distribution", meaning "the family of normal distributions"), and the set of all exponential families is sometimes loosely referred to as "the" exponential family.
Most commonly used distributions form an exponential family or a subset of an exponential family, as listed in the subsection below. The subsections following it give a sequence of increasingly general mathematical definitions of an exponential family. A casual reader may wish to restrict attention to the first and simplest definition, which corresponds to a single-parameter family of discrete or continuous probability distributions.
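A single-parameter exponential family is a set of probability distributions whose probability density function (or probability mass function, for a discrete distribution) can be expressed in the form $f_X(x\mid\theta) = h(x)\,\exp\!\big[\eta(\theta)\,T(x) - A(\theta)\big]$, where $T(x)$, $h(x)$, $\eta(\theta)$, and $A(\theta)$ are known functions. Equivalently, with $g(\theta) = e^{-A(\theta)}$, the form is $f_X(x\mid\theta) = h(x)\,g(\theta)\,\exp\!\big[\eta(\theta)\,T(x)\big]$. Importantly, the support of $f_X(x\mid\theta)$ (all the possible values of $x$ for which $f_X(x\mid\theta)$ is greater than $0$) must not depend on $\theta$.[7] This requirement can be used to exclude a parametric family of distributions from being an exponential family.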
For example, the Pareto distribution has a pdf that is defined for $x \ge x_m$, where the minimum value $x_m$ is the scale parameter. Its support therefore has a lower limit of $x_m$. Since the support of $f_{\alpha,x_m}(x)$ depends on the value of the parameter, the family of Pareto distributions does not form an exponential family of distributions (at least when $x_m$ is unknown).
Another example is the Bernoulli-type distributions: binomial, negative binomial, geometric, and similar. These can only be included in the exponential class if the number of Bernoulli trials, $n$, is treated as a fixed constant and excluded from the free parameter(s). The reason is that the allowed number of trials sets the limits on the number of "successes" or "failures" that can be observed in a set of trials.
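A minimal Python sketch of the binomial case with a fixed number of trials $n$ follows (standard library only; the function names are illustrative, not from any library). It assumes the standard decomposition $T(x) = x$, $\eta = \log\frac{p}{1-p}$, $h(x) = \binom{n}{x}$ and $A(\eta) = n\log(1 + e^{\eta})$. Note that $n$ enters both $h$ and $A$ and also fixes the support $\{0, \ldots, n\}$, which is why it cannot be a free parameter.

```python
import math

def binomial_pmf_direct(x: int, n: int, p: float) -> float:
    """Binomial pmf written the usual way."""
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)

def binomial_pmf_expfam(x: int, n: int, p: float) -> float:
    """Same pmf in exponential-family form h(x) * exp(eta * T(x) - A(eta)).

    n is a fixed constant: it appears in h(x) and A(eta) and fixes the
    support {0, ..., n}, so it is not one of the free parameters.
    """
    eta = math.log(p / (1.0 - p))          # natural parameter (log-odds)
    h = math.comb(n, x)                    # base measure, depends on n
    T = x                                  # sufficient statistic
    A = n * math.log1p(math.exp(eta))      # log-partition function
    return h * math.exp(eta * T - A)

# Usage: both forms agree on the whole support for n = 10, p = 0.3.
n, p = 10, 0.3
pmf = [binomial_pmf_expfam(x, n, p) for x in range(n + 1)]
```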
Often $x$ is a vector of measurements, in which case $T(x)$ may be a function from the space of possible values of $x$ to the real numbers.
More generally, $\eta(\theta)$ and $T(x)$ can each be vector-valued such that $\eta(\theta)\cdot T(x)$ is real-valued. However, see the discussion below on vector parameters, regarding the curved exponential family.
If $\eta(\theta) = \theta$, then the exponential family is said to be in canonical form. By defining a transformed parameter $\eta = \eta(\theta)$, it is always possible to convert an exponential family to canonical form. The canonical form is non-unique for two reasons. First, $\eta(\theta)$ can be multiplied by any nonzero constant, provided that $T(x)$ is multiplied by that constant's reciprocal. Second, a constant $c$ can be added to $\eta(\theta)$, with $h(x)$ multiplied by $\exp[-c\,T(x)]$ to offset it. In the special case that $\eta(\theta) = \theta$ and $T(x) = x$, the family is called a natural exponential family.
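The non-uniqueness is easy to check numerically. The minimal Python sketch below assumes the usual Poisson decomposition $h(x) = 1/x!$, $T(x) = x$, $\eta(\lambda) = \log\lambda$, $A = \lambda$; the function and its `scale`/`shift` arguments are illustrative names. Two changes leave the probabilities untouched:

- multiplying $\eta$ by a nonzero constant while dividing $T$ by it;
- adding $c$ to $\eta$ while multiplying $h(x)$ by $e^{-cT(x)}$.

```python
import math

def poisson_pmf_expfam(x: int, lam: float, scale: float = 1.0, shift: float = 0.0) -> float:
    """Poisson pmf as h(x) * exp(eta * T(x) - A), under a re-parameterisation.

    Baseline: h(x) = 1/x!, T(x) = x, eta = log(lam), A = lam.
    `scale` (nonzero) multiplies eta and divides T; `shift` is added to eta
    and compensated by multiplying h(x) by exp(-shift * T(x)).
    """
    T = x / scale                                   # rescaled sufficient statistic
    eta = scale * math.log(lam) + shift             # rescaled and shifted natural parameter
    h = math.exp(-shift * T) / math.factorial(x)    # base measure absorbs the shift
    A = lam                                         # normaliser is unchanged
    return h * math.exp(eta * T - A)

# Usage: different (scale, shift) choices describe the same distribution.
baseline = poisson_pmf_expfam(4, 2.5)
reparam = poisson_pmf_expfam(4, 2.5, scale=-3.0, shift=0.7)
```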
Even when $x$ is a scalar and there is only a single parameter, the functions $\eta(\theta)$ and $T(x)$ can still be vectors, as described below.
The function $A(\theta)$, or equivalently $g(\theta)$, is automatically determined once the other functions have been chosen. It must take whatever form makes the distribution normalized (sum or integrate to one over the entire domain), that is, $A(\theta) = \log\int h(x)\exp[\eta(\theta)\,T(x)]\,dx$, with a sum in the discrete case. Furthermore, both of these functions can always be written as functions of $\eta$, even when $\eta(\theta)$ is not a one-to-one function. In that case, two or more different values of $\theta$ map to the same value of $\eta(\theta)$, so $\eta(\theta)$ cannot be inverted. All values of $\theta$ mapping to the same $\eta$ will then also have the same value for $A(\theta)$ and $g(\theta)$.
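A minimal Python sketch of the normalization argument follows (standard library only; `log_partition_numeric` is a hypothetical helper). It recovers $A(\eta)$ numerically as $\log\sum_x h(x)\,e^{\eta T(x)}$ and compares the result with the Poisson closed form $A(\eta) = e^{\eta}$. Truncating the infinite Poisson support is an approximation made for the example.

```python
import math

def log_partition_numeric(eta, log_h, T, support):
    """Solve the normalisation condition for A:
    A(eta) = log sum_x h(x) * exp(eta * T(x)) over a (truncated) discrete support.
    Uses the log-sum-exp trick so large or tiny terms do not overflow or underflow.
    """
    terms = [log_h(x) + eta * T(x) for x in support]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

# Poisson family: h(x) = 1/x!, T(x) = x; the closed form is A(eta) = exp(eta).
poisson_log_h = lambda x: -math.lgamma(x + 1)
identity_T = lambda x: x
eta = math.log(3.0)
A_numeric = log_partition_numeric(eta, poisson_log_h, identity_T, range(200))
A_closed = math.exp(eta)
```

For a continuous family the sum would become an integral. The point is only that $A$ is fixed by the other functions.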
What characterizes all exponential family variants is that the parameter(s) and the observation variable(s) must factorize. That is, they can be separated into products each of which involves only one type of variable, either directly or within either part (the base or exponent) of an exponentiation operation. Generally, this means that every factor of the density or mass function must take one of the following forms:
$$f(x),\quad c^{f(x)},\quad [f(x)]^{c},\quad [f(x)]^{g(\theta)},\quad [f(x)]^{h(x)g(\theta)},\quad g(\theta),\quad c^{g(\theta)},\quad [g(\theta)]^{c},\quad [g(\theta)]^{f(x)},\quad [g(\theta)]^{h(x)j(\theta)},$$
where $f(x)$ and $h(x)$ are arbitrary functions of $x$, the observed statistical variable; $g(\theta)$ and $j(\theta)$ are arbitrary functions of $\theta$, the fixed parameters defining the shape of the distribution; and $c$ is any arbitrary constant expression (i.e. a number or an expression that does not change with either $x$ or $\theta$).
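There are further restrictions on how many such factors can occur. For example, the two expressions
$$[f(x)\,g(\theta)]^{h(x)j(\theta)}, \qquad [f(x)]^{h(x)j(\theta)}\,[g(\theta)]^{h(x)j(\theta)}$$
are the same, i.e. a product of two "allowed" factors. However, when rewritten into the factorized form,
$$[f(x)\,g(\theta)]^{h(x)j(\theta)} = e^{h(x)j(\theta)\log f(x) + h(x)j(\theta)\log g(\theta)} = e^{[h(x)\log f(x)]\,j(\theta) + h(x)\,[j(\theta)\log g(\theta)]},$$
it can be seen that it cannot be expressed in the required form. (However, a form of this sort is a member of a curved exponential family, which allows multiple factorized terms in the exponent.[citation needed])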
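To see why an expression of the form $[f(x)]^{g(\theta)}$ qualifies, note that
$$[f(x)]^{g(\theta)} = e^{g(\theta)\log f(x)},$$
which factorizes inside the exponent. Similarly,
$$[f(x)]^{h(x)g(\theta)} = e^{h(x)g(\theta)\log f(x)} = e^{[h(x)\log f(x)]\,g(\theta)},$$
which again factorizes inside the exponent.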
A factor consisting of a sum where both types of variables are involved (e.g. a factor of the form $1 + f(x)\,g(\theta)$) cannot be factorized in this fashion, except in some cases where the sum occurs directly in an exponent. This is why, for example, the Cauchy distribution and Student's t distribution are not exponential families.
The definition in terms of one real-number parameter can be extended to one real-vector parameter $\boldsymbol\theta \equiv [\theta_1, \theta_2, \ldots, \theta_d]^{\mathsf T}$.
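A family of distributions is said to belong to a vector exponential family if the probability density function (or probability mass function, for discrete distributions) can be written as
$$f_X(x\mid\boldsymbol\theta) = h(x)\,\exp\!\Big(\sum_{i=1}^{s}\eta_i(\boldsymbol\theta)\,T_i(x) - A(\boldsymbol\theta)\Big),$$
or in a more compact form,
$$f_X(x\mid\boldsymbol\theta) = h(x)\,\exp\!\big(\boldsymbol\eta(\boldsymbol\theta)\cdot\mathbf T(x) - A(\boldsymbol\theta)\big).$$
This form writes the sum as a dot product of vector-valued functions $\boldsymbol\eta(\boldsymbol\theta)$ and $\mathbf T(x)$.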
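The compact dot-product form translates directly into code. The minimal Python sketch below uses a generic helper, `expfam_density`, which is a hypothetical name. It is checked against the beta distribution under its standard decomposition $\mathbf T(x) = (\log x, \log(1-x))$, $\boldsymbol\eta = (a-1, b-1)$, $h(x) = 1$, $A(\boldsymbol\eta) = \log B(\eta_1 + 1, \eta_2 + 1)$.

```python
import math
from typing import Callable, Sequence

def expfam_density(
    x: float,
    eta: Sequence[float],
    h: Callable[[float], float],
    T: Callable[[float], Sequence[float]],
    A: Callable[[Sequence[float]], float],
) -> float:
    """Evaluate h(x) * exp(eta . T(x) - A(eta)) with an explicit dot product."""
    dot = sum(e_i * t_i for e_i, t_i in zip(eta, T(x)))
    return h(x) * math.exp(dot - A(eta))

def beta_A(eta: Sequence[float]) -> float:
    """Log-partition of Beta(a, b): lgamma(a) + lgamma(b) - lgamma(a + b), a = eta1 + 1, b = eta2 + 1."""
    a, b = eta[0] + 1.0, eta[1] + 1.0
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

# Usage: Beta(2.5, 4) evaluated at x = 0.3 in exponential-family form.
a, b, x = 2.5, 4.0, 0.3
value = expfam_density(
    x,
    (a - 1.0, b - 1.0),
    lambda _: 1.0,                               # h(x) = 1 on the unit interval
    lambda t: (math.log(t), math.log1p(-t)),     # T(x) = (log x, log(1 - x))
    beta_A,
)
direct = x ** (a - 1.0) * (1.0 - x) ** (b - 1.0) * math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
```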
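An alternative, equivalent form often seen is
$$f_X(x\mid\boldsymbol\theta) = g(\boldsymbol\theta)\,h(x)\,\exp\!\big(\boldsymbol\eta(\boldsymbol\theta)\cdot\mathbf T(x)\big).$$
As in the scalar-valued case, the exponential family is said to be in canonical form if $\eta_i(\boldsymbol\theta) = \theta_i$ for all $i$.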
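A vector exponential family is said to be curved if the dimension of $\boldsymbol\theta \equiv [\theta_1, \theta_2, \ldots, \theta_d]^{\mathsf T}$ is less than the dimension of the vector $\boldsymbol\eta(\boldsymbol\theta) \equiv [\eta_1(\boldsymbol\theta), \eta_2(\boldsymbol\theta), \ldots, \eta_s(\boldsymbol\theta)]^{\mathsf T}$. In other words, the family is curved if the dimension $d$ of the parameter vector is less than the number $s$ of functions of the parameter vector in the above representation of the probability density function. Most common distributions in the exponential family are not curved, and many algorithms designed to work with any exponential family implicitly or explicitly assume that the distribution is not curved.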
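A standard example of a curved family is the normal distribution whose variance equals the square of its mean, $N(\mu, \mu^2)$:

- it has one free parameter ($d = 1$);
- its natural parameters $\boldsymbol\eta(\mu) = (1/\mu, -1/(2\mu^2))$ pair with $\mathbf T(x) = (x, x^2)$ ($s = 2$);
- those natural parameters are confined to the curve $\eta_2 = -\eta_1^2/2$.

The minimal Python sketch below uses $h(x) = 1/\sqrt{2\pi}$ and $A(\mu) = \tfrac12 + \log\lvert\mu\rvert$; the names are illustrative.

```python
import math
from typing import Tuple

def curved_eta(mu: float) -> Tuple[float, float]:
    """Natural parameters of N(mu, mu^2): d = 1 free parameter, s = 2 components."""
    return (1.0 / mu, -1.0 / (2.0 * mu * mu))

def curved_normal_density(x: float, mu: float) -> float:
    """h(x) * exp(eta(mu) . T(x) - A(mu)) with T(x) = (x, x^2),
    h(x) = 1/sqrt(2*pi) and A(mu) = 1/2 + log|mu|."""
    eta1, eta2 = curved_eta(mu)
    A = 0.5 + math.log(abs(mu))
    return math.exp(eta1 * x + eta2 * x * x - A) / math.sqrt(2.0 * math.pi)

def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Reference N(mean, sd^2) density."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sd * sd)) / (sd * math.sqrt(2.0 * math.pi))
```

Because $\boldsymbol\eta(\mu)$ traces a one-dimensional curve in $\mathbb R^2$, algorithms that assume the natural parameters range over an open set would not apply directly here.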
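Just as in the case of a scalar-valued parameter, the function $A(\boldsymbol\theta)$, or equivalently $g(\boldsymbol\theta)$, is automatically determined by the normalization constraint once the other functions have been chosen. Even if $\boldsymbol\eta(\boldsymbol\theta)$ is not one-to-one, functions $A(\boldsymbol\eta)$ and $g(\boldsymbol\eta)$ can be defined by requiring that the distribution is normalized for each value of the natural parameter $\boldsymbol\eta$. This yields the canonical form
$$f_X(x\mid\boldsymbol\eta) = h(x)\,\exp\!\big(\boldsymbol\eta\cdot\mathbf T(x) - A(\boldsymbol\eta)\big),$$
or equivalently
$$f_X(x\mid\boldsymbol\eta) = g(\boldsymbol\eta)\,h(x)\,\exp\!\big(\boldsymbol\eta\cdot\mathbf T(x)\big).$$
The above forms may sometimes be seen with $\boldsymbol\eta^{\mathsf T}\mathbf T(x)$ in place of $\boldsymbol\eta\cdot\mathbf T(x)$. These are exactly equivalent formulations, merely using different notation for the dot product.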
The vector-parameter form over a single scalar-valued random variable can be trivially expanded to cover a joint distribution over a vector of random variables. The resulting distribution is the same as the above distribution for a scalar-valued random variable, with each occurrence of the scalar $x$ replaced by the vector $\mathbf x = (x_1, x_2, \ldots, x_k)^{\mathsf T}$:
$$f_X(\mathbf x\mid\boldsymbol\theta) = h(\mathbf x)\,\exp\!\big(\boldsymbol\eta(\boldsymbol\theta)\cdot\mathbf T(\mathbf x) - A(\boldsymbol\theta)\big).$$
The dimension $k$ of the random variable need not match the dimension $d$ of the parameter vector. Nor, in the case of a curved exponential family, need it match the dimension $s$ of the natural parameter $\boldsymbol\eta$ and sufficient statistic $\mathbf T(\mathbf x)$.
Suppose $H$ is a non-decreasing function of a real variable. Then Lebesgue–Stieltjes integrals with respect to $dH(x)$ are integrals with respect to the reference measure of the exponential family generated by $H$.
Any member of that exponential family has cumulative distribution function
$$dF(x\mid\boldsymbol\eta) = e^{\boldsymbol\eta^{\mathsf T}\mathbf T(x) - A(\boldsymbol\eta)}\,dH(x).$$
$H(x)$ is a Lebesgue–Stieltjes integrator for the reference measure. When the reference measure is finite, it can be normalized, and $H$ is then the cumulative distribution function of a probability distribution. Suppose $F$ is absolutely continuous with a density $f(x)$ with respect to a reference measure $dx$ (typically Lebesgue measure), so that one can write $dF(x) = f(x)\,dx$. In this case, $H$ is also absolutely continuous and can be written $dH(x) = h(x)\,dx$, so the formulas reduce to those of the previous paragraphs. If $F$ is discrete, then $H$ is a step function (with steps on the support of $F$).
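Alternatively, we can write the probability measure directly as
$$P(dx\mid\boldsymbol\eta) = e^{\boldsymbol\eta^{\mathsf T}\mathbf T(x) - A(\boldsymbol\eta)}\,\mu(dx)$$
for some reference measure $\mu$.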
In the definitions above, the functions $T(x)$, $\eta(\theta)$, and $A(\eta)$ were apparently arbitrary. However, these functions have important interpretations in the resulting probability distribution.
$T(x)$ is a sufficient statistic of the distribution. For exponential families, the sufficient statistic is a function of the data that holds all the information the data $x$ provide about the unknown parameter values. This means that, for any data sets $x$ and $y$, the likelihood ratio is the same whenever $T(x) = T(y)$, that is,
$$\frac{f(x;\theta_1)}{f(x;\theta_2)} = \frac{f(y;\theta_1)}{f(y;\theta_2)}.$$
This holds even if $x$ and $y$ are not equal to each other. For a family that is not curved, the dimension of $T(x)$ equals the number of parameters of $\theta$, and $T(x)$ encompasses all of the information in the data related to the parameter $\theta$. The sufficient statistic of a set of independent identically distributed observations is simply the sum of the individual sufficient statistics. It encapsulates all the information needed to describe the posterior distribution of the parameters given the data, and hence to derive any desired estimate of the parameters. (This important property is discussed further below.)
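A minimal Python sketch of this property for i.i.d. Poisson data follows. Here $T(x) = x$, so the sample statistic is $\sum_i x_i$. The function names are illustrative, and the data sets are made up for the example; they have the same size $n$, since $n$ also enters the likelihood.

```python
import math
from typing import Sequence

def sufficient_statistic(data: Sequence[int]) -> int:
    """For i.i.d. Poisson data with T(x) = x, the sample statistic is sum_i T(x_i)."""
    return sum(data)

def poisson_log_likelihood(data: Sequence[int], lam: float) -> float:
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)

def log_likelihood_ratio(data: Sequence[int], lam1: float, lam2: float) -> float:
    """log f(data; lam1) - log f(data; lam2). The lgamma terms cancel, leaving
    sum(data) * log(lam1 / lam2) - n * (lam1 - lam2), a function of the statistic and n."""
    return poisson_log_likelihood(data, lam1) - poisson_log_likelihood(data, lam2)

x = [0, 3, 5, 1]   # n = 4, sum = 9
y = [2, 2, 2, 3]   # n = 4, sum = 9: different data, same statistic
z = [1, 1, 1, 1]   # n = 4, sum = 4: different statistic
```

Here `x` and `y` give identical likelihood ratios for any pair of rates, while `z` generally does not.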
$\eta$ is called the natural parameter. The set of values of $\eta$ for which the function $f_X(x;\eta)$ is integrable is called the natural parameter space. It can be shown that the natural parameter space is always convex.
The function $A$ is important in its own right, because the mean, variance and other moments of the sufficient statistic $T(x)$ can be derived simply by differentiating $A(\eta)$. For example, $\log(x)$ is one of the components of the sufficient statistic of the gamma distribution, so $\operatorname{E}[\log x]$ can be easily determined for this distribution using $A(\eta)$. Technically, this works because
$$K(u\mid\eta) = A(\eta + u) - A(\eta)$$
is the cumulant generating function of the sufficient statistic.
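The gamma example can be sketched in Python under the standard natural parameterisation $\mathbf T(x) = (\log x, x)$, $\boldsymbol\eta = (\alpha - 1, -\beta)$ (shape $\alpha$, rate $\beta$). For this choice, $A(\boldsymbol\eta) = \log\Gamma(\eta_1 + 1) - (\eta_1 + 1)\log(-\eta_2)$. The gradient of $A$ is approximated here by central finite differences, which is an illustrative choice; the exact first component is $\psi(\alpha) - \log\beta$, with $\psi$ the digamma function. The function names are hypothetical.

```python
import math
from typing import Tuple

def gamma_log_partition(eta1: float, eta2: float) -> float:
    """A(eta) for Gamma(shape alpha, rate beta) with T(x) = (log x, x) and h(x) = 1.

    eta1 = alpha - 1 and eta2 = -beta, so
    A = lgamma(eta1 + 1) - (eta1 + 1) * log(-eta2) = lgamma(alpha) - alpha * log(beta).
    """
    return math.lgamma(eta1 + 1.0) - (eta1 + 1.0) * math.log(-eta2)

def expected_T(alpha: float, beta: float, step: float = 1e-5) -> Tuple[float, float]:
    """Gradient of A at eta by central differences, i.e. (E[log X], E[X])."""
    eta1, eta2 = alpha - 1.0, -beta
    d1 = (gamma_log_partition(eta1 + step, eta2) - gamma_log_partition(eta1 - step, eta2)) / (2 * step)
    d2 = (gamma_log_partition(eta1, eta2 + step) - gamma_log_partition(eta1, eta2 - step)) / (2 * step)
    return d1, d2

# Usage: alpha = 3, beta = 2.
e_log_x, e_x = expected_T(3.0, 2.0)
# For integer alpha, digamma(alpha) = -euler_gamma + sum_{k=1}^{alpha-1} 1/k,
# so E[log X] = digamma(3) - log(2); also E[X] = alpha / beta = 1.5.
EULER_GAMMA = 0.5772156649015329
exact_e_log_x = -EULER_GAMMA + 1.0 + 0.5 - math.log(2.0)
```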
Exponential families have a large number of properties that make them extremely useful for statistical analysis. In many cases, it can be shown that only exponential families have these properties. Examples:
The posterior predictive distribution of an exponential-family random variable with a conjugate prior can always be written in closed form (provided that the normalizing factor of the exponential-family distribution can itself be written in closed form).[c]
In the mean-field approximation in variational Bayes (used for approximating the posterior distribution in large Bayesian networks), the best approximating posterior distribution of an exponential-family node (a node is a random variable in the context of Bayesian networks) with a conjugate prior is in the same family as the node.[8]
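Consider an exponential family defined by $f_X(x\mid\theta) = h(x)\exp[\theta\cdot T(x) - A(\theta)]$, where $\Theta$ is the parameter space and $\theta \in \Theta \subset \mathbb{R}^k$. Then: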
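If $\Theta$ has nonempty interior in $\mathbb{R}^k$, then given any IID samples $X_1, \ldots, X_n \sim f_X$, the statistic $T(X_1, \ldots, X_n) := \sum_{i=1}^{n} T(X_i)$ is a complete statistic for $\theta$.[9][10]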
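$T$ is a minimal statistic for $\theta$ iff for all $\theta_1, \theta_2 \in \Theta$ and all $x_1, x_2$ in the support of $X$, if $(\theta_1 - \theta_2)\cdot(T(x_1) - T(x_2)) = 0$, then $\theta_1 = \theta_2$ or $x_1 = x_2$.[11]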
It is critical, when considering the examples in this section, to remember the discussion above about what it means to say that a "distribution" is an exponential family, and in particular to keep in mind that the set of parameters that are allowed to vary is critical in determining whether a "distribution" is or is not an exponential family.
Some distributions are exponential families only if some of their parameters are held fixed. The family of Pareto distributions with a fixed minimum bound $x_m$ forms an exponential family. The families of binomial and multinomial distributions with fixed number of trials $n$ but unknown probability parameter(s) are exponential families. The family of negative binomial distributions with fixed number of failures (a.k.a. stopping-time parameter) $r$ is an exponential family. However, when any of these fixed parameters is allowed to vary, the resulting family is not an exponential family.
As mentioned above, as a general rule, the support of an exponential family must remain the same across all parameter settings in the family. This is why the above cases (e.g. binomial with varying number of trials, Pareto with varying minimum bound) are not exponential families: in each case, the parameter in question affects the support, in particular by changing the minimum or maximum possible value. For similar reasons, neither the discrete uniform distribution nor the continuous uniform distribution is an exponential family when one or both bounds vary.
The Weibull distribution with fixed shape parameter k is an exponential family. Unlike in the previous examples, the shape parameter does not affect the support; the fact that allowing it to vary makes the Weibull non-exponential is due rather to the particular form of the Weibull's probability density function (k appears in the exponent of an exponent).
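The fixed-shape case can be seen concretely in the minimal Python sketch below (function names illustrative). It assumes the decomposition $h(x) = k x^{k-1}$, $T(x) = x^k$, $\eta = -\lambda^{-k}$, $A(\eta) = -\log(-\eta) = k\log\lambda$ for scale $\lambda$. Because $T(x) = x^k$ itself depends on $k$, the factorization only works with $k$ held fixed.

```python
import math

def weibull_pdf_direct(x: float, k: float, lam: float) -> float:
    """Usual Weibull density with shape k and scale lam."""
    return (k / lam) * (x / lam) ** (k - 1.0) * math.exp(-((x / lam) ** k))

def weibull_pdf_expfam(x: float, k: float, lam: float) -> float:
    """Weibull with *fixed* shape k written as h(x) * exp(eta * T(x) - A(eta))."""
    eta = -(lam ** -k)            # natural parameter (negative)
    T = x ** k                    # sufficient statistic, depends on the fixed k
    h = k * x ** (k - 1.0)        # base measure, also depends on k
    A = -math.log(-eta)           # log-partition, equal to k * log(lam)
    return h * math.exp(eta * T - A)

# Usage: compare both forms for shape 1.7 and scale 2.3.
values = [(weibull_pdf_direct(x, 1.7, 2.3), weibull_pdf_expfam(x, 1.7, 2.3)) for x in (0.2, 1.0, 2.7)]
```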
As a first example, consider a random variable distributed normally with unknown mean $\mu$ and known variance $\sigma^2$. The probability density function is then
$$f_\sigma(x;\mu) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-(x-\mu)^2/(2\sigma^2)}.$$
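This is a single-parameter exponential family, as can be seen by setting
$$T_\sigma(x) = \frac{x}{\sigma}, \qquad h_\sigma(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-x^2/(2\sigma^2)}, \qquad A_\sigma(\mu) = \frac{\mu^2}{2\sigma^2}, \qquad \eta_\sigma(\mu) = \frac{\mu}{\sigma}.$$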
If σ = 1 this is in canonical form, as then η(μ) = μ.
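The factorization above can be checked numerically. The minimal Python sketch below (names illustrative) evaluates $h_\sigma(x)\exp[\eta_\sigma(\mu)\,T_\sigma(x) - A_\sigma(\mu)]$ with the four functions just given and compares it with the usual normal density.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Usual N(mu, sigma^2) density."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma**2)) / math.sqrt(2.0 * math.pi * sigma**2)

def normal_known_variance_expfam(x: float, mu: float, sigma: float) -> float:
    """h_sigma(x) * exp(eta_sigma(mu) * T_sigma(x) - A_sigma(mu)), with sigma known."""
    T = x / sigma                                                                   # sufficient statistic
    h = math.exp(-(x**2) / (2.0 * sigma**2)) / math.sqrt(2.0 * math.pi * sigma**2)  # base measure
    A = mu**2 / (2.0 * sigma**2)                                                    # log-partition function
    eta = mu / sigma                                                                # natural parameter
    return h * math.exp(eta * T - A)

# Usage: mean 1.3, known standard deviation 0.7.
pairs = [(normal_pdf(x, 1.3, 0.7), normal_known_variance_expfam(x, 1.3, 0.7)) for x in (-1.0, 0.4, 2.2)]
```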
Normal distribution: unknown mean and unknown variance
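With both parameters unknown, the normal distribution is a two-parameter vector exponential family under the standard choice $\mathbf T(x) = (x, x^2)$, $h(x) = 1/\sqrt{2\pi}$. Its pieces are:

- natural parameters: $\boldsymbol\eta = \big(\mu/\sigma^2,\ -1/(2\sigma^2)\big)$;
- log-partition function: $A(\boldsymbol\eta) = -\eta_1^2/(4\eta_2) - \tfrac12\log(-2\eta_2) = \mu^2/(2\sigma^2) + \log\sigma$.

The minimal Python sketch below (helper names are illustrative) converts between $(\mu, \sigma^2)$ and $\boldsymbol\eta$ and checks that this form reproduces the usual density.

```python
import math
from typing import Tuple

def to_natural(mu: float, var: float) -> Tuple[float, float]:
    """(mu, sigma^2) -> (eta1, eta2) = (mu / sigma^2, -1 / (2 sigma^2))."""
    return mu / var, -1.0 / (2.0 * var)

def from_natural(eta1: float, eta2: float) -> Tuple[float, float]:
    """Inverse map (eta2 must be negative): sigma^2 = -1 / (2 eta2), mu = eta1 * sigma^2."""
    var = -1.0 / (2.0 * eta2)
    return eta1 * var, var

def log_partition(eta1: float, eta2: float) -> float:
    """A(eta) = -eta1^2 / (4 eta2) - (1/2) log(-2 eta2)."""
    return -eta1 * eta1 / (4.0 * eta2) - 0.5 * math.log(-2.0 * eta2)

def density_expfam(x: float, eta1: float, eta2: float) -> float:
    """h(x) * exp(eta . T(x) - A(eta)) with T(x) = (x, x^2) and h(x) = 1/sqrt(2 pi)."""
    return math.exp(eta1 * x + eta2 * x * x - log_partition(eta1, eta2)) / math.sqrt(2.0 * math.pi)

# Usage: mu = 1.3, sigma^2 = 0.8.
mu, var = 1.3, 0.8
eta1, eta2 = to_natural(mu, var)
```

Differentiating $A$ with respect to $\eta_1$ and $\eta_2$ gives $\operatorname{E}[x] = \mu$ and $\operatorname{E}[x^2] = \mu^2 + \sigma^2$.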
The following table shows how to rewrite a number of common distributions as exponential-family distributions with natural parameters. Refer to the flashcards[12] for main exponential families.
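For a scalar variable and scalar parameter, the form is as follows:
$$f_X(x\mid\theta) = h(x)\,\exp\!\big(\eta(\theta)\,T(x) - A(\eta)\big)$$
For a scalar variable and vector parameter:
$$f_X(x\mid\boldsymbol\theta) = h(x)\,\exp\!\big(\boldsymbol\eta(\boldsymbol\theta)\cdot\mathbf T(x) - A(\boldsymbol\eta)\big)$$
$$f_X(x\mid\boldsymbol\theta) = h(x)\,g(\boldsymbol\theta)\,\exp\!\big(\boldsymbol\eta(\boldsymbol\theta)\cdot\mathbf T(x)\big)$$
For a vector variable and vector parameter:
$$f_X(\mathbf x\mid\boldsymbol\theta) = h(\mathbf x)\,\exp\!\big(\boldsymbol\eta(\boldsymbol\theta)\cdot\mathbf T(\mathbf x) - A(\boldsymbol\eta)\big)$$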
The above formulas write the exponential family with a log-partition function $A(\boldsymbol\eta)$. This choice lets the moments of the sufficient statistics be calculated easily, simply by differentiating this function. Alternative forms parameterize this function in terms of the normal parameter $\boldsymbol\theta$ instead of the natural parameter, use a factor $g(\boldsymbol\eta)$ outside of the exponential, or both. The relation between the latter and the former is:
$$A(\boldsymbol\eta) = -\log g(\boldsymbol\eta), \qquad g(\boldsymbol\eta) = e^{-A(\boldsymbol\eta)}.$$
To convert between the representations involving the two types of parameter, use the formulas below for writing one type of parameter in terms of the other.
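The small Python sketch below shows both kinds of conversion for three standard one-parameter families; the `CONVERSIONS` table and `g` helper are hypothetical names. It uses the well-known natural parameters:

- Bernoulli: $\eta = \log\frac{p}{1-p}$;
- Poisson: $\eta = \log\lambda$;
- exponential with rate $\lambda$: $\eta = -\lambda$.

It also applies the relation $g(\eta) = e^{-A(\eta)}$.

```python
import math

# Usual parameter <-> natural parameter, plus the log-partition A(eta), per family.
CONVERSIONS = {
    "bernoulli": {    # p in (0, 1); T(x) = x, h(x) = 1
        "to_eta": lambda p: math.log(p / (1.0 - p)),
        "from_eta": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
        "A": lambda eta: math.log1p(math.exp(eta)),
    },
    "poisson": {      # rate lambda; T(x) = x, h(x) = 1/x!
        "to_eta": math.log,
        "from_eta": math.exp,
        "A": math.exp,
    },
    "exponential": {  # rate lambda; T(x) = x, h(x) = 1
        "to_eta": lambda lam: -lam,
        "from_eta": lambda eta: -eta,
        "A": lambda eta: -math.log(-eta),
    },
}

def g(family: str, eta: float) -> float:
    """Factor outside the exponential: g(eta) = exp(-A(eta))."""
    return math.exp(-CONVERSIONS[family]["A"](eta))

# Usage: Bernoulli with p = 0.3 has g(eta) = 1 - p.
bern = CONVERSIONS["bernoulli"]
eta_b = bern["to_eta"](0.3)
```

For the exponential family with rate $\lambda$, $g(\eta) = \lambda$, so $g(\eta)\,e^{\eta x} = \lambda e^{-\lambda x}$ recovers the usual density.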