Percentile 1 method
The percentile method finds a percentile value in a dataset. The same algorithm is used by the PERCENTILE and PERCENTILE.INC functions in Microsoft Excel and by the PERCENTILE function in Google Sheets.
The function receives two parameters:
- values: array of values in the dataset.
- percentile: percentile between 0 and 1 inclusive.
For example, suppose we want to calculate the 40th percentile for the following measurements: 35, 20, 50, 40, 15. We call the percentile method and pass the measurements and the percentile as the decimal value 0.4.
Sigma.percentile([35, 20, 50, 40, 15], percentile: 0.4)
// Result: 29
We use the following algorithm to calculate the percentile value:
Firstly, we sort the dataset from lowest to highest values.
sortedValues = [15, 20, 35, 40, 50]
Secondly, we find the rank of the 40th percentile. The rank is the position of an element in the sorted dataset: rank 1 is the first element, rank 2 is the second, and so on. The rank can be a decimal. For example, a rank of 3.35 refers to a value between the third and fourth elements.
Equation 1
rank = percentile * (count - 1) + 1
Where:
- percentile is the percentile argument of the function.
- count is the size of the dataset, which is equal to the size of the array passed as the values argument.
We substitute the arguments into Equation 1:
rank = 0.4 * (5 - 1) + 1 = 2.6
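Equation 1 can be written as a small function. This is only an illustrative sketch: `percentileRank` is a hypothetical helper name chosen for this example and is not part of the library.

```swift
// Hypothetical helper (not part of the Sigma API): Equation 1 as a function.
func percentileRank(percentile: Double, count: Int) -> Double {
    // The rank is 1-based: 1 is the first element, `count` is the last.
    return percentile * Double(count - 1) + 1
}

let exampleRank = percentileRank(percentile: 0.4, count: 5)
print(exampleRank) // approximately 2.6, subject to floating-point rounding
```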
Next we calculate the integer and fractional parts of the rank. The integer part of 2.6 is 2 and the fractional part of 2.6 is 0.6.
rankInteger = 2
rankFraction = 0.6
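One possible way to split a rank into its two parts in Swift is to round down and subtract. The helper name `splitRank` is an assumption made for this sketch.

```swift
// Hypothetical helper: split a rank into its integer and fractional parts.
func splitRank(_ rank: Double) -> (integer: Int, fraction: Double) {
    let integerPart = rank.rounded(.down)
    return (Int(integerPart), rank - integerPart)
}

let parts = splitRank(2.6)
print(parts.integer)  // 2
print(parts.fraction) // approximately 0.6, subject to floating-point rounding
```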
Then, we look at the ordered dataset {15, 20, 35, 40, 50} and find the elements at positions rankInteger and rankInteger + 1. In our example, rankInteger is 2, so we need the second and third elements, which are 20 and 35.
elementValue = 20
elementPlusOneValue = 35
Finally, we calculate the resulting percentile value by interpolating between elementValue and elementPlusOneValue according to rankFraction.
percentileValue = elementValue + rankFraction * (elementPlusOneValue - elementValue)
percentileValue = 20 + 0.6 * (35 - 20) = 29
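The steps above can be combined into one function. The following is a minimal sketch written from the description in this document, not the library's actual source; the name `percentileSketch` is hypothetical. It assumes that an empty array or a percentile outside the range 0 to 1 should produce nil, and it returns the last element directly when the rank equals the dataset size, since there is no following element to interpolate toward.

```swift
// A sketch of the described algorithm; `percentileSketch` is a hypothetical
// name and this is not the library's implementation.
func percentileSketch(_ values: [Double], percentile: Double) -> Double? {
    // Assumption: undefined for an empty dataset or a percentile outside 0...1.
    guard !values.isEmpty, percentile >= 0, percentile <= 1 else { return nil }

    // Step 1: sort from lowest to highest.
    let sortedValues = values.sorted()

    // Step 2: Equation 1 gives the 1-based rank.
    let rank = percentile * Double(sortedValues.count - 1) + 1

    // Step 3: integer and fractional parts of the rank.
    let rankInteger = Int(rank.rounded(.down))
    let rankFraction = rank - Double(rankInteger)

    // Step 4: ranks are 1-based, array indices are 0-based.
    let elementValue = sortedValues[rankInteger - 1]

    // At the maximum rank there is no next element to interpolate toward.
    guard rankInteger < sortedValues.count else { return elementValue }
    let elementPlusOneValue = sortedValues[rankInteger]

    // Step 5: linear interpolation between the two neighbours.
    return elementValue + rankFraction * (elementPlusOneValue - elementValue)
}

if let result = percentileSketch([35, 20, 50, 40, 15], percentile: 0.4) {
    print(result) // approximately 29, subject to floating-point rounding
}
```

Because the rank arithmetic uses floating-point numbers, comparing the result against an expected value with a small tolerance is safer than testing for exact equality.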
The algorithm is undefined and returns nil in the following situations:
- The supplied values array is empty.
- The supplied percentile value is negative or greater than 1.
- A percentile argument of 0 returns the minimum value in the dataset.
- A percentile argument of 0.5 returns the median value.
- A percentile argument of 1 returns the maximum value in the dataset.
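These three cases follow directly from Equation 1. As a short worked derivation, using the same rank, percentile and count symbols as above:

```latex
\begin{align*}
\text{percentile} = 0:   &\quad \text{rank} = 0 \cdot (\text{count} - 1) + 1 = 1 \\
\text{percentile} = 1:   &\quad \text{rank} = 1 \cdot (\text{count} - 1) + 1 = \text{count} \\
\text{percentile} = 0.5: &\quad \text{rank} = 0.5 \cdot (\text{count} - 1) + 1 = \frac{\text{count} + 1}{2}
\end{align*}
```

Rank 1 is the first element of the sorted dataset (the minimum) and rank count is the last (the maximum). A rank of (count + 1) / 2 is the middle element when count is odd, and lies halfway between the two middle elements when count is even, which matches the usual definition of the median.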
The algorithm described above found the 40th percentile to be 29 for the set of measurements: 35, 20, 50, 40, 15.
- Percentiles from NIST/SEMATECH e-Handbook of Statistical Methods.
- Percentile Wikipedia article.