
Percentile 1 method

Evgenii Neumerzhitckii edited this page Apr 14, 2016 · 53 revisions

The percentile1 method finds a percentile value in a dataset. The same method is used by the PERCENTILE and PERCENTILE.INC functions in Microsoft Excel and by the PERCENTILE function in Google Sheets.

The function receives two parameters:

  • values: array of values in the dataset.
  • percentile: percentile value between 0 and 1 inclusive.

For example, suppose we want to calculate the 40th percentile for the following measurements: 35, 20, 50, 40, 15. We call percentile1 and pass the measurements as values and the percentile as the decimal value 0.4.

Sigma.percentile1(values: [35, 20, 50, 40, 15], percentile: 0.4)
// Result: 29

We use the following algorithm to calculate the percentile value:

1) Sort the dataset

First, we sort the dataset from the lowest to the highest value.

sorted_values = [15, 20, 35, 40, 50]

2) Find the rank

Second, we find the rank of the 40th percentile. The rank is the position of an element in the sorted dataset: rank 1 is the first element, rank 2 is the second, and rank N is the last. The rank can be a decimal. For example, a rank of 3.35 refers to a value between the third and fourth elements.

Equation 1

rank = percentile * (N - 1) + 1    

Where:

  • percentile is the percentile argument of the function.
  • N is the size of the dataset, which is equal to the size of the array passed as the values argument.

We substitute the arguments into Equation 1:

rank = 0.4 * (5 - 1) + 1 = 2.6

3) Get the integer and fractional part of the rank

Next we calculate the integer and fractional parts of the rank. The integer part of 2.6 is 2 and the fractional part of 2.6 is 0.6.

rank_integer = 2
rank_fraction = 0.6
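
The rank calculation and the split into integer and fractional parts can be sketched in Swift as follows. This is a minimal illustration only: the helper name percentileRank and the variable names are assumptions made for this example and are not part of the library.

```swift
// Hypothetical helper illustrating Equation 1; not part of the library.
func percentileRank(percentile: Double, count: Int) -> Double {
    // rank = percentile * (N - 1) + 1
    return percentile * Double(count - 1) + 1
}

// 40th percentile of a dataset with five elements.
let rank = percentileRank(percentile: 0.4, count: 5)

// Integer part: position of the lower element (1-based).
let rankInteger = Int(rank)

// Fractional part: how far to move towards the next element.
let rankFraction = rank - Double(rankInteger)

print(rank, rankInteger, rankFraction)
```

Because the rank is computed with floating-point arithmetic, the printed fractional part may differ from 0.6 by a tiny rounding error.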

4) Find the element at rank

Then, we look at the sorted dataset {15, 20, 35, 40, 50} and find the elements at positions rank_integer and rank_integer + 1. In our example, rank_integer is 2, so we need the second and third elements, which are 20 and 35.

element_value = 20
element_plus_one_value = 35

5) Calculate the percentile value

Finally, we calculate the resulting percentile value by interpolating between element_value and element_plus_one_value according to rank_fraction.

percentile_value = element_value + rank_fraction * (element_plus_one_value - element_value)
percentile_value = 20 + 0.6 * (35 - 20) = 29
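
The steps above can be put together in a short Swift sketch. It is not the library's source code: the function name percentile1Sketch is hypothetical, and reusing the last element when the rank points at the end of the dataset is an assumption of this sketch (the fractional part is zero in that case, so the choice does not change the result). Returning nil for an empty array or an out-of-range percentile follows the edge cases described on this page.

```swift
// Hypothetical sketch of the steps described on this page; not the library's code.
func percentile1Sketch(values: [Double], percentile: Double) -> Double? {
    // Edge cases: empty dataset, or percentile outside 0...1.
    guard !values.isEmpty, percentile >= 0, percentile <= 1 else { return nil }

    // 1) Sort the dataset from lowest to highest.
    let sortedValues = values.sorted()

    // 2) Find the rank (1-based, may be fractional).
    let rank = percentile * Double(sortedValues.count - 1) + 1

    // 3) Split the rank into integer and fractional parts.
    let rankInteger = Int(rank)
    let rankFraction = rank - Double(rankInteger)

    // 4) Find the element at the rank and the element after it.
    let elementValue = sortedValues[rankInteger - 1]
    // Assumption: when the rank is the last position there is no next element,
    // so the last element is reused (rankFraction is 0 in that case).
    let nextIndex = min(rankInteger, sortedValues.count - 1)
    let elementPlusOneValue = sortedValues[nextIndex]

    // 5) Interpolate between the two elements.
    return elementValue + rankFraction * (elementPlusOneValue - elementValue)
}

// Usage: the 40th percentile of the example measurements, approximately 29.
if let result = percentile1Sketch(values: [35, 20, 50, 40, 15], percentile: 0.4) {
    print(result)
}
```

The result may differ from 29 by a tiny floating-point rounding error, so compare it with a tolerance rather than with exact equality.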

Edge cases

The result is undefined, and the function returns nil, in the following situations:

  1. The supplied values array is empty.
  2. The supplied percentile value is less than 0 or greater than 1.

0th, 50th and 100th percentiles

  • A percentile argument of 0 returns the minimum value in the dataset.
  • A percentile argument of 0.5 returns the median value.
  • A percentile argument of 1 returns the maximum value in the dataset.
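
These three results follow from Equation 1. Substituting each percentile gives the ranks below, shown as a short worked derivation, where N is the size of the dataset:

```latex
\begin{aligned}
\text{percentile} = 0:   &\quad \text{rank} = 0 \cdot (N - 1) + 1 = 1 \\
\text{percentile} = 0.5: &\quad \text{rank} = 0.5 \cdot (N - 1) + 1 = \frac{N + 1}{2} \\
\text{percentile} = 1:   &\quad \text{rank} = 1 \cdot (N - 1) + 1 = N
\end{aligned}
```

Rank 1 is the first element of the sorted dataset (the minimum) and rank N is the last (the maximum). Rank (N + 1) / 2 is the middle element when N is odd. When N is even, it falls halfway between the two middle elements, so the interpolation returns their mean, which is the median.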

The result

The algorithm described above found the 40th percentile to be 29 for the set of measurements: 35, 20, 50, 40, 15.
