Percentile 1 method
The percentile method finds a percentile value in a dataset. Microsoft Excel uses the same algorithm in its PERCENTILE and PERCENTILE.INC functions, and Google Sheets uses it in its PERCENTILE function. It is the 7th sample quantile method from the Hyndman and Fan paper (1996).
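The formula below restates the 7th Hyndman and Fan method using the same steps as this page. The notation is introduced here for convenience: n is the count, p is the percentile, and x_(k) is the k-th smallest value.

$$
h = (n - 1)\,p + 1, \qquad
Q(p) = x_{(\lfloor h \rfloor)} + \bigl(h - \lfloor h \rfloor\bigr)\,\bigl(x_{(\lfloor h \rfloor + 1)} - x_{(\lfloor h \rfloor)}\bigr)
$$

Here h is the rank from Equation 1. When p = 1, h = n and the fractional part is 0, so the second term vanishes and Q(1) = x_(n).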
The function receives two parameters:
- values: array of values in the dataset.
- percentile: percentile between 0 and 1 inclusive.
For example, suppose we want to calculate the 40th percentile of the following measurements: 35, 20, 50, 40, 15. We call the percentile method and pass it the measurements and the percentile as the decimal value 0.4.
Sigma.percentile([35, 20, 50, 40, 15], percentile: 0.4)
// Result: 29
We use the following algorithm to calculate the percentile value:
First, we sort the dataset from the lowest to the highest value.
sortedValues = [15, 20, 35, 40, 50]
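In Swift, the sorting step might look like this. This is a minimal sketch that assumes the measurements are held in a [Double] named values. The standard library's sorted() returns a new ascending array and leaves the input untouched:

```swift
// The measurements from the example above.
let values: [Double] = [35, 20, 50, 40, 15]

// sorted() returns a new array in ascending order; `values` itself is unchanged.
let sortedValues = values.sorted()
print(sortedValues) // [15.0, 20.0, 35.0, 40.0, 50.0]
```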
Second, we find the rank of the 40th percentile. The rank is the position of an element in the sorted dataset. Rank 1 is the first element, rank 2 is the second, and so on. The rank can be fractional. For example, a rank of 3.35 points to a value between the third and fourth elements.
Equation 1
rank = percentile * (count - 1) + 1
Where:
- percentile is the percentile argument of the function.
- count is the size of the dataset, which equals the size of the array passed as the values argument.
We substitute the arguments into Equation 1:
rank = 0.4 * (5 - 1) + 1 = 2.6
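Equation 1 translates directly into a small function. The helper name rank(percentile:count:) is hypothetical and is used only for this illustration:

```swift
/// Hypothetical helper implementing Equation 1:
/// rank = percentile * (count - 1) + 1, a 1-based and possibly fractional position.
func rank(percentile: Double, count: Int) -> Double {
    return percentile * Double(count - 1) + 1
}

// Rank of the 40th percentile in a five-element dataset (about 2.6).
let fortiethRank = rank(percentile: 0.4, count: 5)
print(fortiethRank)
```

A percentile of 0 always gives rank 1, the first element, and a percentile of 1 always gives rank count, the last element.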
Next, we calculate the integer and fractional parts of the rank. The integer part of 2.6 is 2, and the fractional part is 0.6.
rankInteger = 2
rankFraction = 0.6
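The following sketch shows one way to split the rank into its two parts in Swift. It assumes rank is a Double that is at least 1, which Equation 1 guarantees for percentiles between 0 and 1. The binary format cannot represent 0.6 exactly, so the printed fraction may differ from 0.6 in the last digits:

```swift
// Rank of the 40th percentile from the previous step.
let rank = 2.6

// The rank is always at least 1, so rounding down yields its integer part.
let rankInteger = Int(rank.rounded(.down))

// The remainder is the fractional part used for interpolation.
let rankFraction = rank - Double(rankInteger)

print(rankInteger)  // 2
print(rankFraction) // very close to 0.6; 0.6 has no exact binary representation
```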
Then, we look at the sorted dataset [15, 20, 35, 40, 50] and find the elements at positions rankInteger and rankInteger + 1. In our example, rankInteger is 2, so we need the second and third elements, which are 20 and 35.
elementValue = 20
elementPlusOneValue = 35
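Ranks are counted from 1, but Swift arrays are indexed from 0. The element at rank k is therefore at index k - 1. Here is a minimal sketch of the lookup for the example values:

```swift
let sortedValues: [Double] = [15, 20, 35, 40, 50]
let rankInteger = 2

// Rank k is stored at array index k - 1 because Swift arrays start at index 0.
let elementValue = sortedValues[rankInteger - 1]    // rank 2 -> 20.0
let elementPlusOneValue = sortedValues[rankInteger] // rank 3 -> 35.0
```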
Finally, we calculate the resulting percentile value by interpolating between elementValue and elementPlusOneValue according to rankFraction.
percentileValue = elementValue + rankFraction * (elementPlusOneValue - elementValue)
percentileValue = 20 + 0.6 * (35 - 20) = 29
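Combining all of the steps gives a stand-alone Swift sketch of the whole algorithm. The function name percentile1 is hypothetical, and this is not the Sigma library's source; it only follows the steps described above. The walkthrough leaves out one detail. When the percentile is 1, rankInteger equals the count, so there is no element at rankInteger + 1. In that case the sketch returns the last element directly, since the fraction is 0 anyway.

```swift
/// Hypothetical stand-alone sketch of the "Percentile 1" steps
/// (7th sample quantile method of Hyndman and Fan). Not the library's source.
func percentile1(_ values: [Double], percentile: Double) -> Double? {
    // Undefined cases: empty dataset, or percentile outside 0...1 (NaN is rejected too).
    guard !values.isEmpty, (0.0...1.0).contains(percentile) else { return nil }

    // Step 1: sort from lowest to highest.
    let sortedValues = values.sorted()
    let count = sortedValues.count

    // Step 2: 1-based, possibly fractional rank (Equation 1).
    let rank = percentile * Double(count - 1) + 1

    // Step 3: integer and fractional parts of the rank.
    let rankInteger = Int(rank.rounded(.down))
    let rankFraction = rank - Double(rankInteger)

    // Step 4: elements at ranks rankInteger and rankInteger + 1
    // (array indices rankInteger - 1 and rankInteger).
    let elementValue = sortedValues[rankInteger - 1]
    // With percentile == 1, rankInteger == count and there is no next element.
    guard rankInteger < count else { return elementValue }
    let elementPlusOneValue = sortedValues[rankInteger]

    // Step 5: linear interpolation between the two neighboring elements.
    return elementValue + rankFraction * (elementPlusOneValue - elementValue)
}

if let value = percentile1([35, 20, 50, 40, 15], percentile: 0.4) {
    print(value) // approximately 29
}
```

Rounding in rank and rankFraction can make such results differ from the exact decimal answer in the last few bits. When comparing them in tests, a small tolerance is safer than exact equality.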
The algorithm is undefined and returns nil in the following situations:
- The supplied values array is empty.
- The supplied percentile value is negative or greater than 1.
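Both undefined cases can be checked before any computation. The helper isPercentileDefined is a hypothetical name used only in this sketch. It uses ClosedRange.contains, which also rejects NaN because every comparison with NaN is false:

```swift
/// Hypothetical helper: true only when the dataset is non-empty
/// and 0 <= percentile <= 1.
func isPercentileDefined(_ values: [Double], percentile: Double) -> Bool {
    return !values.isEmpty && (0.0...1.0).contains(percentile)
}

print(isPercentileDefined([], percentile: 0.5))         // false: empty dataset
print(isPercentileDefined([1, 2, 3], percentile: -0.1)) // false: negative percentile
print(isPercentileDefined([1, 2, 3], percentile: 1.2))  // false: percentile above 1
print(isPercentileDefined([1, 2, 3], percentile: 1))    // true
```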
Some percentile values have a special meaning:
- A percentile argument of 0 returns the minimum value in the dataset.
- A percentile argument of 0.5 returns the median value.
- A percentile argument of 1 returns the maximum value in the dataset.
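One quick way to confirm these properties is to evaluate a compact version of the algorithm at 0, 0.5 and 1. The function percentile1 is a hypothetical stand-alone sketch, not the library's method:

```swift
/// Hypothetical compact sketch of the steps on this page.
func percentile1(_ values: [Double], percentile p: Double) -> Double? {
    guard !values.isEmpty, (0.0...1.0).contains(p) else { return nil }
    let s = values.sorted()
    let rank = p * Double(s.count - 1) + 1      // Equation 1
    let i = Int(rank.rounded(.down))            // rankInteger (1-based)
    let fraction = rank - Double(i)             // rankFraction
    guard i < s.count else { return s[i - 1] } // percentile 1: no next element
    return s[i - 1] + fraction * (s[i] - s[i - 1])
}

let data: [Double] = [35, 20, 50, 40, 15]
print(percentile1(data, percentile: 0)!)   // 15.0, the minimum
print(percentile1(data, percentile: 0.5)!) // 35.0, the median
print(percentile1(data, percentile: 1)!)   // 50.0, the maximum
```

For an even number of values, a percentile of 0.5 gives a rank halfway between the two middle elements. The result is then their average, which matches the usual definition of the median.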
The algorithm described above found the 40th percentile to be 29 for the set of measurements: 35, 20, 50, 40, 15.