Percentile 1 method
The percentile1 method finds a percentile value in a dataset. It uses the same method as the PERCENTILE and PERCENTILE.INC functions in Microsoft Excel and the PERCENTILE function in Google Sheets.
The function receives two parameters:
- values: array of values in the dataset.
- percentile: percentile value between 0 and 1 inclusive.
For example, suppose we want to calculate the 40th percentile for the following measurements: 35, 20, 50, 40, 15. We call the percentile1 method and pass the measurements and the percentile as the decimal value 0.4.
Sigma.percentile1(values: [35, 20, 50, 40, 15], percentile: 0.4)
// Result: 29
We use the following algorithm to calculate the percentile value:
Firstly, we sort the dataset from lowest to highest values.
sorted_values = [15, 20, 35, 40, 50]
Secondly, we find the rank of the 40th percentile. Rank is the position of an element in the sorted dataset. For example, rank 1 is the first element, 2 is the second and N is the last element. The rank can be a decimal. For example, a rank of 3.35 refers to a value between the third and fourth elements.
Equation 1
rank = percentile * (N - 1) + 1
Where:
- percentile is the percentile argument of the function.
- N is the size of the dataset, which is equal to the size of the array passed as the values argument.
We substitute the arguments into Equation 1:
rank = 0.4 * (5 - 1) + 1 = 2.6
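As a quick check of Equation 1, the rank calculation can be sketched in Swift. The helper name `rank(percentile:count:)` is hypothetical and used only for illustration; it is not part of the library.

```swift
// Hypothetical helper: 1-based rank from Equation 1.
// `count` plays the role of N, the size of the dataset.
func rank(percentile: Double, count: Int) -> Double {
    return percentile * Double(count - 1) + 1
}

// 40th percentile of a five-element dataset.
let exampleRank = rank(percentile: 0.4, count: 5)
print(exampleRank) // approximately 2.6
```

Note that a percentile of 0 gives rank 1 (the first element) and a percentile of 1 gives rank N (the last element).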
Next we calculate the integer and fractional parts of the rank. The integer part of 2.6 is 2 and the fractional part of 2.6 is 0.6.
rank_integer = 2
rank_fraction = 0.6
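A minimal Swift sketch of this split is shown here, assuming the rank is non-negative. The function name `splitRank` is a hypothetical one chosen for this example.

```swift
// Hypothetical helper: split a rank into its whole and fractional parts.
func splitRank(_ rank: Double) -> (integer: Int, fraction: Double) {
    let whole = rank.rounded(.down) // largest whole number not above rank
    return (Int(whole), rank - whole)
}

let parts = splitRank(2.6)
print(parts.integer)  // 2
print(parts.fraction) // close to 0.6, subject to floating-point rounding
```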
Then, we look at the ordered dataset {15, 20, 35, 40, 50} and find the elements at positions rank_integer and rank_integer + 1. In our example, rank_integer is 2, so we need the second and third elements, which are 20 and 35.
element_value = 20
element_plus_one_value = 35
Finally, we calculate the resulting percentile value by interpolating between element_value and element_plus_one_value according to rank_fraction.
percentile_value = element_value + rank_fraction * (element_plus_one_value - element_value)
percentile_value = 20 + 0.6 * (35 - 20) = 29
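The interpolation step on its own can be sketched in Swift as follows. The function name `interpolate(from:to:fraction:)` is an assumption made for this example and does not come from the library.

```swift
// Hypothetical helper: linear interpolation between two neighboring elements.
// A fraction of 0 returns `lower`, a fraction of 1 returns `upper`.
func interpolate(from lower: Double, to upper: Double, fraction: Double) -> Double {
    return lower + fraction * (upper - lower)
}

let value = interpolate(from: 20, to: 35, fraction: 0.6)
print(value) // approximately 29
```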
The algorithm is undefined and returns nil in the following situations:
- The supplied values array is empty.
- The supplied percentile value is less than 0 or greater than 1.
Some percentile values have a direct interpretation:
- A percentile argument of 0 returns the minimum value in the dataset.
- A percentile argument of 0.5 returns the median value.
- A percentile argument of 1 returns the maximum value in the dataset.
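Putting the steps together, the whole procedure might look like the following Swift sketch. It is an illustration under stated assumptions, not the library's actual source: the name `percentile1Sketch` is hypothetical, and the early return at the last element is one possible way to handle a rank equal to N, where there is no next element to interpolate with.

```swift
// Illustrative sketch only; `percentile1Sketch` is a hypothetical name,
// not the library's own implementation.
func percentile1Sketch(values: [Double], percentile: Double) -> Double? {
    // Undefined cases: empty dataset or percentile outside [0, 1].
    guard !values.isEmpty, percentile >= 0, percentile <= 1 else { return nil }

    let sorted = values.sorted()

    // Equation 1: 1-based rank, possibly fractional.
    let rank = percentile * Double(sorted.count - 1) + 1
    let rankInteger = Int(rank.rounded(.down))
    let rankFraction = rank - Double(rankInteger)

    // Ranks are 1-based, array indices are 0-based.
    let lower = sorted[rankInteger - 1]

    // Assumption: at the last element there is no neighbor, so return it as is.
    guard rankInteger < sorted.count else { return lower }
    let upper = sorted[rankInteger]

    return lower + rankFraction * (upper - lower)
}

let data: [Double] = [35, 20, 50, 40, 15]
for p in [0.0, 0.4, 0.5, 1.0] {
    if let result = percentile1Sketch(values: data, percentile: p) {
        print("percentile \(p): \(result)")
    }
}
```

With the example measurements, this sketch is expected to return the minimum for 0, the median for 0.5, the maximum for 1, and nil for an empty array or an out-of-range percentile.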
The algorithm described above found the 40th percentile to be 29 for the set of measurements: 35, 20, 50, 40, 15.
- Percentiles from NIST/SEMATECH e-Handbook of Statistical Methods.
- Percentile Wikipedia article.