Quantiles

Quantiles are points in a distribution that relate to the rank order of values in that distribution.

For a sample, you can find any quantile by sorting the sample. The middle value of the sorted sample (middle quantile, 50th percentile) is known as the median. The limits are the minimum and maximum values. Any other locations between these points can be described in terms of centiles/percentiles.

Centiles/percentiles are descriptions of quantiles relative to 100; so the 75th percentile (upper quartile) is 75% or three quarters of the way up an ascending list of sorted values of a sample. The 25th percentile (lower quartile) is one quarter of the way up this rank order.

Percentile rank is the proportion of values in a distribution that a particular value is greater than or equal to. For example, if a pupil is taller than or as tall as 79% of his classmates then the percentile rank of his height is 79, i.e. he is in the 79th percentile of heights in his class.
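The percentile rank described above can be sketched in a few lines of Python. This is only an illustration: the function name `percentile_rank` and the height values are hypothetical, and the sketch assumes that the value itself is counted among the values it is "greater than or equal to".

```python
def percentile_rank(sample, value):
    """Percentage of observations in `sample` that `value` is >= to.

    Hypothetical helper for illustration; assumes `sample` is non-empty
    and that ties (and the value itself, if present) count as "equal to".
    """
    at_or_below = sum(1 for x in sample if x <= value)
    return 100.0 * at_or_below / len(sample)


# Made-up heights (cm) for a class of ten pupils.
heights = [150, 152, 155, 155, 160, 162, 165, 168, 170, 175]
rank_of_165 = percentile_rank(heights, 165)
```

Here seven of the ten heights are less than or equal to 165, so `rank_of_165` is 70.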

Definition

StatsDirect gives the option of two different methods for calculating quantiles; only the first method can be used with observation weights:-

Method 1: This is a common method that emulates the inverse of the empirical probability distribution, with averaging where there are discontinuities (Hyndman and Fan, 1996). This is also the universal default method in Stata:-

Take a sorted vector of observations u(i=1 to n) with weights w(i = 1 to n) or weights each equal to one if the sample is unweighted.

Then find i, the first index point that is larger than pn, where p is the proportion of the quantile (e.g. 0.5 for the median) and n is the sample size.
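The steps of Method 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not StatsDirect's implementation. The function name `quantile_method1` is hypothetical. The sketch assumes that "larger than pn" is tested against the running sum of the weights (which equals the index i when every weight is one), that pn is taken as p times the total weight, and that at a discontinuity the quantile is the average of the two neighbouring observations, as in the inverse empirical distribution with averaging described by Hyndman and Fan.

```python
import math


def quantile_method1(values, p, weights=None):
    """Inverse empirical distribution with averaging at discontinuities.

    Hypothetical helper for illustration. Assumes 0 <= p <= 1, a non-empty
    sample, and positive weights; unweighted samples use weights of one.
    """
    if weights is None:
        weights = [1.0] * len(values)
    # Sort observations (carrying their weights along) into ascending order.
    pairs = sorted(zip(values, weights))
    u = [value for value, _ in pairs]
    w = [weight for _, weight in pairs]

    # Assumption: the threshold "pn" is p times the sum of the weights,
    # which is p * n for an unweighted sample.
    target = p * sum(w)

    cumulative = 0.0
    for i in range(len(u)):
        previous = cumulative
        cumulative += w[i]
        # First index whose cumulative weight is strictly larger than pn
        # (a tolerance guards against floating-point round-off).
        if cumulative > target and not math.isclose(cumulative, target):
            # Discontinuity: the previous cumulative weight equals pn exactly,
            # so average the two neighbouring observations.
            if i > 0 and math.isclose(previous, target):
                return (u[i - 1] + u[i]) / 2
            return u[i]
    # p = 1: no cumulative weight exceeds the total, so return the maximum.
    return u[-1]


median_even = quantile_method1([4, 1, 3, 2], 0.5)               # averages 2 and 3
median_odd = quantile_method1([5, 1, 4, 2, 3], 0.5)             # middle value
weighted = quantile_method1([10, 20, 30], 0.5, weights=[1, 2, 1])
```

With an even sample size the median falls on a discontinuity and is averaged (2.5 here); with an odd sample size it is simply the middle observation (3). In the weighted call, the observation 20 carries half of the total weight and is returned as the median.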

Method 2: This is the conventional definition used when calculating confidence intervals for quantiles (Mood and Graybill, 1973). This is also the universal default method in SPSS and Minitab:-

In this definition, p is a proportion, Q(p) is the pth quantile (e.g. the median is Q(0.5)), i is the order statistic, h is the fractional part of the order statistic (0 or 0.5), u is an observation from the sample after it has been ordered from smallest to largest value, and n is the sample size.
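A possible reading of Method 2 is sketched below in Python. It is an assumption rather than a transcription of the StatsDirect formula: it takes the position of the quantile to be p(n + 1), splits that position into an integer part i and a fractional part h, and interpolates as Q(p) = (1 - h)u(i) + h u(i + 1), which is the interpolation commonly associated with the SPSS and Minitab defaults. The function name `quantile_method2` and the clamping to the minimum and maximum at the extremes are also hypothetical choices.

```python
import math


def quantile_method2(values, p):
    """Quantile by interpolating between order statistics at p * (n + 1).

    Hypothetical helper for illustration. Assumes 0 <= p <= 1 and a
    non-empty, unweighted sample.
    """
    u = sorted(values)            # order from smallest to largest
    n = len(u)
    position = p * (n + 1)        # assumed position of the quantile
    i = math.floor(position)      # integer part: the order statistic
    h = position - i              # fractional part
    # Assumption: positions outside 1..n are clamped to the sample limits.
    if i < 1:
        return u[0]
    if i >= n:
        return u[-1]
    # u[i - 1] is the i-th order statistic because Python lists are 0-based.
    return (1 - h) * u[i - 1] + h * u[i]


median_even = quantile_method2([4, 1, 3, 2], 0.5)         # position 2.5
lower_quartile = quantile_method2([5, 1, 4, 2, 3], 0.25)  # position 1.5
```

For the median, p(n + 1) is either a whole number or ends in .5, so h is 0 or 0.5: the even-sized sample above gives 2.5, halfway between the second and third ordered values.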