Mean absolute difference

The mean absolute difference (univariate) is a measure of statistical dispersion equal to the average absolute difference of two independent values drawn from a probability distribution. A related statistic is the relative mean absolute difference, which is the mean absolute difference divided by the arithmetic mean, and equal to twice the Gini coefficient. The mean absolute difference is also known as the absolute mean difference (not to be confused with the absolute value of the mean signed difference) and the Gini mean difference (GMD).^[1] The mean absolute difference is sometimes denoted by Δ or as MD.

Definition

The mean absolute difference is defined as the expected value (informally, the "average" or "mean") of the absolute difference of two independent random variables X and Y that share the same (unknown) distribution, henceforth called Q:

\mathrm {MD} :=E[|X-Y|].

Calculation

Specifically, in the discrete case:

For a random sample of size n drawn from Q, in which each sample value y_i, i = 1 to n, carries equal weight 1/n, the law of total expectation gives the (empirical) mean absolute difference as the arithmetic mean of the absolute differences over all n^2 ordered pairs of sample values:

\mathrm {MD} =E[|X-Y|]=E_{X}[E_{Y|X}[|X-Y|]]={\frac {1}{n^{2}}}\sum _{i=1}^{n}\sum _{j=1}^{n}|y_{i}-y_{j}|.
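The double sum can be sketched in Python as follows. This is a minimal illustration using only the standard library; the function names `empirical_md` and `empirical_md_sorted` are introduced here and do not come from the source. The second function relies on a standard rearrangement of the same sum over sorted values, which avoids the quadratic loop.

```python
def empirical_md(values):
    """Mean absolute difference with the 1/n^2 normalisation.

    Averages |a - b| over all n^2 ordered pairs, including pairs with i == j.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    total = sum(abs(a - b) for a in values for b in values)
    return total / n**2


def empirical_md_sorted(values):
    """Same quantity in O(n log n) time.

    For sorted values y_(0) <= ... <= y_(n-1), the sum of |y_i - y_j| over
    all ordered pairs equals 2 * sum_k (2k - n + 1) * y_(k).
    """
    ys = sorted(values)
    n = len(ys)
    if n == 0:
        raise ValueError("need at least one value")
    total = 2 * sum((2 * k - n + 1) * y for k, y in enumerate(ys))
    return total / n**2


sample = [1, 2, 4]
print(empirical_md(sample), empirical_md_sorted(sample))
```

Both functions should agree up to floating-point rounding; the sorted version is usually preferable for large samples.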

If Q has a discrete probability function f(y), where y_i, i = 1 to n, are the values with nonzero probabilities, then:

\mathrm {MD} =\sum _{i=1}^{n}\sum _{j=1}^{n}f(y_{i})f(y_{j})|y_{i}-y_{j}|.
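A minimal sketch of this weighted double sum, assuming the distribution is supplied as a dictionary from values to probabilities (the name `md_discrete` and this representation are choices made here, not part of the source):

```python
def md_discrete(pmf):
    """Mean absolute difference of a discrete distribution.

    pmf maps each value with nonzero probability to that probability;
    the probabilities are assumed to sum to 1.
    """
    return sum(
        p * q * abs(x - y)
        for x, p in pmf.items()
        for y, q in pmf.items()
    )


# Bernoulli distribution with p = 0.3: the result should be close to 2 p (1 - p).
p = 0.3
print(md_discrete({0: 1 - p, 1: p}))

# Fair six-sided die.
print(md_discrete({face: 1 / 6 for face in range(1, 7)}))
```

For the Bernoulli case the result agrees, up to rounding, with the closed form 2p(1 − p) listed under Examples.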

In the continuous case,

If Q has a probability density function f(x), then:

\mathrm {MD} =\int _{-\infty }^{\infty }\int _{-\infty }^{\infty }f(x)\,f(y)\,|x-y|\,dx\,dy.

An alternative form of the equation is given by:

\mathrm {MD} =\int _{0}^{\infty }\int _{-\infty }^{\infty }2\,f(x)\,f(x+\delta )\,\delta \,dx\,d\delta .
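This alternative form can be obtained from the double integral by using the symmetry of the integrand in x and y and then substituting y = x + δ. A short derivation sketch:

```latex
\begin{aligned}
\mathrm{MD}
  &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x)\,f(y)\,|x-y|\,dx\,dy \\
  &= 2\int_{-\infty}^{\infty}\int_{x}^{\infty} f(x)\,f(y)\,(y-x)\,dy\,dx
     && \text{(symmetry in } x \text{ and } y\text{)} \\
  &= 2\int_{-\infty}^{\infty}\int_{0}^{\infty} f(x)\,f(x+\delta)\,\delta\,d\delta\,dx
     && (y = x+\delta) \\
  &= \int_{0}^{\infty}\int_{-\infty}^{\infty} 2\,f(x)\,f(x+\delta)\,\delta\,dx\,d\delta .
\end{aligned}
```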

If Q has a cumulative distribution function F(x) with quantile function Q(F), then, since f(x)=dF(x)/dx and Q(F(x))=x, it follows that:

\mathrm {MD} =\int _{0}^{1}\int _{0}^{1}|Q(F_{1})-Q(F_{2})|\,dF_{1}\,dF_{2}.
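As a worked check of the quantile form, consider the continuous uniform distribution on [0, 1], for which Q(F) = F. The integral then reduces to an elementary one:

```latex
\begin{aligned}
\mathrm{MD}
  &= \int_{0}^{1}\int_{0}^{1} |F_{1}-F_{2}|\,dF_{1}\,dF_{2}
   = 2\int_{0}^{1}\int_{0}^{F_{1}} (F_{1}-F_{2})\,dF_{2}\,dF_{1} \\
  &= 2\int_{0}^{1} \frac{F_{1}^{2}}{2}\,dF_{1}
   = \frac{1}{3}.
\end{aligned}
```

This matches the value 1/3 listed for the continuous uniform distribution under Examples.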

Relative mean absolute difference

When the probability distribution has a finite and nonzero arithmetic mean AM, the relative mean absolute difference, sometimes denoted by Δ or RMD, is defined by

\mathrm {RMD} ={\frac {\mathrm {MD} }{\mathrm {AM} }}.

The relative mean absolute difference quantifies the mean absolute difference in comparison to the size of the mean and is a dimensionless quantity. The relative mean absolute difference is equal to twice the Gini coefficient which is defined in terms of the Lorenz curve. This relationship gives complementary perspectives to both the relative mean absolute difference and the Gini coefficient, including alternative ways of calculating their values.
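The relationship with the Gini coefficient can be checked numerically. The sketch below assumes a sample of non-negative values with a positive sum, uses the 1/n^2 normalisation of the empirical mean absolute difference, and computes the Gini coefficient as one minus twice the area under the piecewise-linear Lorenz curve. The function names and the example data are illustrative only.

```python
def rmd(values):
    """Relative mean absolute difference with the 1/n^2 normalisation."""
    n = len(values)
    mean = sum(values) / n
    md = sum(abs(a - b) for a in values for b in values) / n**2
    return md / mean


def gini_lorenz(values):
    """Gini coefficient as 1 - 2 * (area under the empirical Lorenz curve).

    The Lorenz curve is taken as piecewise linear through the points
    (k / n, cumulative share of the k smallest values); the area is
    accumulated with the trapezoid rule.
    """
    ys = sorted(values)
    n = len(ys)
    total = sum(ys)
    area = 0.0
    cumulative = 0.0
    previous_share = 0.0
    for y in ys:
        cumulative += y
        share = cumulative / total
        area += (previous_share + share) / (2 * n)
        previous_share = share
    return 1 - 2 * area


incomes = [10, 20, 30, 40, 100]  # made-up data for illustration
print(rmd(incomes), 2 * gini_lorenz(incomes))
```

The two printed numbers should agree up to floating-point rounding, which illustrates how either definition can be used to compute the other.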

Properties

The mean absolute difference is invariant to translations and negation, and varies proportionally to positive scaling. That is to say, if X is a random variable and c is a constant:

MD(X + c) = MD(X),
MD(−X) = MD(X), and
MD(c X) = |c| MD(X).
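These three identities also hold for the empirical mean absolute difference of a sample, which makes them easy to check numerically. A minimal sketch, in which the helper `md`, the simulated data, and the constant c = −2.5 are arbitrary choices made for illustration:

```python
import math
import random


def md(values):
    """Empirical mean absolute difference (1/n^2 normalisation)."""
    n = len(values)
    return sum(abs(a - b) for a in values for b in values) / n**2


rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.gauss(0, 1) for _ in range(200)]
c = -2.5

base = md(xs)
shifted = md([x + c for x in xs])   # translation
negated = md([-x for x in xs])      # negation
scaled = md([c * x for x in xs])    # scaling by c

print(math.isclose(shifted, base, rel_tol=1e-9))
print(math.isclose(negated, base, rel_tol=1e-9))
print(math.isclose(scaled, abs(c) * base, rel_tol=1e-9))
```

The comparisons use a relative tolerance because adding or multiplying by c introduces small floating-point rounding differences.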

The relative mean absolute difference is invariant to positive scaling, commutes with negation, and varies under translation in proportion to the ratio of the original and translated arithmetic means. That is to say, if X is a random variable and c is a constant:

RMD(X + c) = RMD(X) · mean(X)/(mean(X) + c) = RMD(X) / (1 + c / mean(X)) for c ≠ −mean(X),
RMD(−X) = −RMD(X), and
RMD(c X) = RMD(X) for c > 0.
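The translation rule follows in one step from the translation invariance of MD and the linearity of the mean. A short derivation, writing μ for mean(X) (a symbol introduced here for brevity) and assuming μ ≠ 0 and c ≠ −μ:

```latex
\mathrm{RMD}(X+c)
  = \frac{\mathrm{MD}(X+c)}{\mu+c}
  = \frac{\mathrm{MD}(X)}{\mu+c}
  = \frac{\mathrm{MD}(X)}{\mu}\cdot\frac{\mu}{\mu+c}
  = \mathrm{RMD}(X)\,\frac{\mu}{\mu+c}.
```

For instance, the exponential distribution with λ = 1 has mean 1 and RMD 1 (see Examples), so shifting it by c = 1 halves the RMD to 1/2.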

If a random variable has a positive mean, then its relative mean absolute difference will always be greater than or equal to zero. If, additionally, the random variable can only take on values that are greater than or equal to zero, then its relative mean absolute difference will be less than 2.

Compared to standard deviation

The mean absolute difference is twice the L-scale (the second L-moment), while the standard deviation is the square root of the variance about the mean (the second conventional central moment). The differences between L-moments and conventional moments are first seen in comparing the mean absolute difference and the standard deviation (the first L-moment and first conventional moment are both the mean).

Both the standard deviation and the mean absolute difference measure dispersion, that is, how spread out the values of a population or the probabilities of a distribution are. The mean absolute difference is not defined in terms of a specific measure of central tendency, whereas the standard deviation is defined in terms of the deviation from the arithmetic mean. Because the standard deviation squares its differences, it tends to give more weight to larger differences and less weight to smaller differences compared to the mean absolute difference. When the arithmetic mean is finite, the mean absolute difference will also be finite, even when the standard deviation is infinite. See the examples for some specific comparisons.
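The different weighting of large differences can be illustrated on a small dataset. The sketch below uses made-up numbers and the population forms of both statistics; it shows a single example rather than a general result.

```python
import statistics


def md(values):
    """Empirical mean absolute difference (1/n^2 normalisation)."""
    n = len(values)
    return sum(abs(a - b) for a in values for b in values) / n**2


baseline = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 50]  # same data with one value made extreme

# Factor by which each dispersion measure grows when the outlier is introduced.
sd_ratio = statistics.pstdev(with_outlier) / statistics.pstdev(baseline)
md_ratio = md(with_outlier) / md(baseline)

print(f"standard deviation grows by a factor of {sd_ratio:.2f}")
print(f"mean absolute difference grows by a factor of {md_ratio:.2f}")
```

In this example the standard deviation grows by a factor of about 13.5, while the mean absolute difference grows by a factor of 10.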

The recently introduced distance standard deviation plays a similar role to the mean absolute difference, but the distance standard deviation works with centered distances. See also E-statistics.

Sample estimators

For a random sample S from a random variable X, consisting of n values y_i, the statistic

\mathrm {MD} (S)={\frac {\sum _{i=1}^{n}\sum _{j=1}^{n}|y_{i}-y_{j}|}{n(n-1)}}

is a consistent and unbiased estimator of MD(X). The statistic:

\mathrm {RMD} (S)={\frac {\sum _{i=1}^{n}\sum _{j=1}^{n}|y_{i}-y_{j}|}{(n-1)\sum _{i=1}^{n}y_{i}}}

is a consistent estimator of RMD(X), but is not, in general, unbiased.
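Both statistics can be sketched in a few lines of Python. The function names are illustrative; the pairwise sum is computed from the sorted sample, which is a rearrangement of the double sum rather than a different estimator.

```python
def pairwise_abs_sum(sample):
    """Sum of |y_i - y_j| over all ordered pairs, computed from sorted values."""
    ys = sorted(sample)
    n = len(ys)
    return 2 * sum((2 * k - n + 1) * y for k, y in enumerate(ys))


def md_estimate(sample):
    """MD(S): pairwise sum divided by n (n - 1)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two values")
    return pairwise_abs_sum(sample) / (n * (n - 1))


def rmd_estimate(sample):
    """RMD(S): pairwise sum divided by (n - 1) times the sample total."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two values")
    return pairwise_abs_sum(sample) / ((n - 1) * sum(sample))


print(md_estimate([1, 2, 4]))   # 12 / 6
print(rmd_estimate([1, 2, 4]))  # 12 / (2 * 7)
```

Note the divisor n(n − 1) rather than n^2: the n terms with i = j contribute zero to the sum, so dividing by the number of pairs with i ≠ j removes the downward bias of the 1/n^2 form.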

Confidence intervals for RMD(X) can be calculated using bootstrap sampling techniques.
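One common variant is the percentile bootstrap, sketched below. The choice of the percentile method, the number of resamples, the simulated exponential data, and all names are assumptions made for illustration; the source does not specify a particular bootstrap procedure.

```python
import random


def rmd_estimate(sample):
    """RMD(S): pairwise absolute sum divided by (n - 1) times the sample total."""
    ys = sorted(sample)
    n = len(ys)
    pair_sum = 2 * sum((2 * k - n + 1) * y for k, y in enumerate(ys))
    return pair_sum / ((n - 1) * sum(ys))


def bootstrap_ci(sample, statistic, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for statistic(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    # Recompute the statistic on resamples drawn with replacement.
    stats = sorted(
        statistic([rng.choice(sample) for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower_index = round((1 - level) / 2 * n_resamples)
    upper_index = round((1 + level) / 2 * n_resamples) - 1
    return stats[lower_index], stats[upper_index]


# Simulated positive data (exponential with rate 1), for illustration only.
data_rng = random.Random(1)
data = [data_rng.expovariate(1.0) for _ in range(100)]

lower, upper = bootstrap_ci(data, rmd_estimate)
print(f"RMD estimate {rmd_estimate(data):.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

The sketch assumes strictly positive data, so that no resample has a zero total. Other bootstrap intervals (for example bias-corrected ones) may behave better for small samples.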

There does not exist, in general, an unbiased estimator for RMD(X), in part because of the difficulty of finding an unbiased estimator for multiplication by the inverse of the mean. For example, even where the sample is known to be taken from a random variable X(p) for an unknown p, and $X(p)-1$ has the Bernoulli distribution, so that $\Pr(X(p)=1)=1-p$ and $\Pr(X(p)=2)=p$, then

\mathrm {RMD} (X(p))={\frac {2p(1-p)}{1+p}}.

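But the expected value of any estimator R(S) of RMD(X(p)) will be of the form:

\operatorname {E} (R(S))=\sum _{i=0}^{n}p^{i}(1-p)^{n-i}r_{i},

where the r_i are constants. This expression is a polynomial in p of degree at most n, whereas RMD(X(p)) = 2p(1 − p)/(1 + p) is a rational function of p that is not a polynomial. So E(R(S)) can never equal RMD(X(p)) for all p between 0 and 1.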

Examples

Examples of mean absolute difference and relative mean absolute difference
Distribution	Parameters	Mean	Standard deviation	Mean absolute difference	Relative mean absolute difference
Continuous uniform	$a=0;b=1$	$1/2=0.5$	${\frac {1}{\sqrt {12}}}\approx 0.2887$	${\frac {1}{3}}\approx 0.3333$	${\frac {2}{3}}\approx 0.6667$
Normal	$\mu =0$ ; $\sigma =1$	$0$	$1$	${\frac {2}{\sqrt {\pi }}}\approx 1.1284$	undefined
Exponential	$\lambda =1$	$1$	$1$	$1$	$1$
Pareto	$k>1$ ; $x_{m}=1$	${\frac {k}{k-1}}$	${\frac {1}{k-1}}\,{\sqrt {\frac {k}{k-2}}}{\text{ for }}k>2$	${\frac {2k}{(k-1)(2k-1)}}\,$	${\frac {2}{2k-1}}\,$
Gamma	$k$ ; $\theta$	$k\theta$	${\sqrt {k}}\,\theta$	${\frac {2\theta }{\mathrm {B} (0.5,k)}}\,$ †	${\frac {2}{k\mathrm {B} (0.5,k)}}\,$ †
Gamma	$k=1$ ; $\theta =1$	$1$	$1$	$1$	$1$
Gamma	$k=2$ ; $\theta =1$	$2$	${\sqrt {2}}\approx 1.4142$	$3/2=1.5$	$3/4=0.75$
Gamma	$k=3$ ; $\theta =1$	$3$	${\sqrt {3}}\approx 1.7321$	$15/8=1.875$	$5/8=0.625$
Gamma	$k=4$ ; $\theta =1$	$4$	$2$	$35/16=2.1875$	$35/64=0.546875$
Bernoulli	$0\leq p\leq 1$	$p$	${\sqrt {p(1-p)}}$	$2p(1-p)$	$2(1-p){\text{ for }}p>0$
Student's t, 2 d.f.	$\nu =2$	$0$	$\infty$	${\frac {\pi }{\sqrt {2}}}\approx 2.2214$	undefined

† $\mathrm {B} (x,y)$ is the Beta function.
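The Gamma rows of the table can be reproduced from the general formula. A minimal sketch, in which the Beta function is built from `math.gamma` and the function names are illustrative:

```python
import math


def beta(x, y):
    """Beta function via the Gamma function: B(x, y) = Γ(x) Γ(y) / Γ(x + y)."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)


def gamma_md(k, theta):
    """Mean absolute difference of a Gamma(k, theta) distribution: 2 θ / B(0.5, k)."""
    return 2 * theta / beta(0.5, k)


def gamma_rmd(k):
    """Relative mean absolute difference of a Gamma distribution: 2 / (k B(0.5, k))."""
    return 2 / (k * beta(0.5, k))


for k in (1, 2, 3, 4):
    print(k, round(gamma_md(k, 1.0), 6), round(gamma_rmd(k), 6))
```

Up to floating-point rounding, the printed values should match the tabulated entries 1, 3/2, 15/8, 35/16 for the mean absolute difference and 1, 3/4, 5/8, 35/64 for the relative mean absolute difference.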

References

[1] Yitzhaki, Shlomo (2003). "Gini's Mean Difference: A Superior Measure of Variability for Non-Normal Distributions" (PDF). Metron International Journal of Statistics. 61 (2). Springer Verlag: 285–316.

Sources

Xu, Kuan (January 2004). "How Has the Literature on Gini's Index Evolved in the Past 80 Years?" (PDF). Department of Economics, Dalhousie University. Retrieved 2006-06-01.
Gini, Corrado (1912). Variabilità e Mutabilità. Bologna: Tipografia di Paolo Cuppini. Bibcode:1912vamu.book.....G.
Gini, Corrado (1921). "Measurement of Inequality and Incomes". The Economic Journal. 31 (121): 124–126. doi:10.2307/2223319. JSTOR 2223319.
Chakravarty, S. R. (1990). Ethical Social Index Numbers. New York: Springer-Verlag.
Mills, Jeffrey A.; Zandvakili, Sourushe (1997). "Statistical Inference via Bootstrapping for Measures of Inequality". Journal of Applied Econometrics. 12 (2): 133–150. CiteSeerX 10.1.1.172.5003. doi:10.1002/(SICI)1099-1255(199703)12:2<133::AID-JAE433>3.0.CO;2-H.
Lomnicki, Z. A. (1952). "The Standard Error of Gini's Mean Difference". Annals of Mathematical Statistics. 23 (4): 635–637. doi:10.1214/aoms/1177729346.
Nair, U. S. (1936). "Standard Error of Gini's Mean Difference". Biometrika. 28 (3–4): 428–436. doi:10.1093/biomet/28.3-4.428.
Yitzhaki, Shlomo (2003). "Gini's Mean difference: a superior measure of variability for non-normal distributions" (PDF). Metron – International Journal of Statistics. 61: 285–316.
