Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Inverse CDF of distributions #262

Open
ghost opened this issue Apr 24, 2023 · 5 comments
Open

[Feature Request] Inverse CDF of distributions #262

ghost opened this issue Apr 24, 2023 · 5 comments

Comments

@ghost
Copy link

ghost commented Apr 24, 2023

Is your feature request related to a problem? Please describe.
Inverse CDFs are useful for calculating credible intervals for a given distribution, among other things.

Describe the solution you'd like
It would be great to have inverse CDFs for all distributions. But starting from normal distribution would be great.

@bvenn
Copy link
Member

bvenn commented Apr 24, 2023

The normal InvCDF for mean = 0 and sigma = 1 is already implemented at an inproper position:

let inverseCDF x =
sqrt 2. * Errorfunction.inverf (2. * x - 1.)

I agree, we should add quantile functions as InvCDF for all distributions 👍

@bvenn
Copy link
Member

bvenn commented Apr 25, 2023

I've added an InvCDF member to all distributions by 3d6a220.
I noticed the approximation of the inverse error function leads to some discrepancies when extreme values are chosen and compared to the R qnorm procedure.

// Testing FSharp.Stats
(Distributions.Continuous.Normal.InvCDF 0. 1. 0.5)  //0.
# Testing R
qnorm(0.5,0,1)
# Testing Python
from scipy.stats import norm
norm.ppf(0.5, loc=0, scale=1)
Mean StDev X result FSharp.Stats result R result Python
0 1 0.5 1.253321755e-09 0 0
0 1 0 -infinity -infinity -infinity
0 1 1 infinity infinity infinity
3 0.01 0.01 2.97673652985179 2.97673652126 2.97673652125959
-300000 5000 0.99 -288368.2649258 -288368.2606298 -288368.2606297958

While the deviation is small and just occurs at extreme values, it would be worth checking if the approximation presented in Wichura, “Algorithm AS 241: The Percentage Points of the Normal Distribution.”, 1988 should be implemented.

  • add tests
  • check accuracy

References

bvenn pushed a commit that referenced this issue Apr 30, 2023
bvenn pushed a commit that referenced this issue Apr 30, 2023
@bvenn
Copy link
Member

bvenn commented May 2, 2023

I've implemented the quantile function of the normal distribution as described in Wichura et al.. Its accurate for
15 decimal places.

DoganCK pushed a commit to DoganCK/FSharp.Stats that referenced this issue Jun 12, 2023
DoganCK pushed a commit to DoganCK/FSharp.Stats that referenced this issue Jun 12, 2023
@DoganCK DoganCK mentioned this issue Jun 12, 2023
2 tasks
@bvenn
Copy link
Member

bvenn commented Jul 4, 2023

Progress of adding InvCDF/quantile functions t continuous distributions

  • Normal
  • Gamma
  • Beta
  • F
  • StudentT
  • Studentized Range
  • LogNormal
  • MultivariateNrmal
  • Exponential
  • Uniform <- should be easy
  • Chi
  • ChiSquared

@bvenn
Copy link
Member

bvenn commented Jul 20, 2023

Many distributions have no closed form of the quantile function. Besides published approximations would be beneficial to add a member for each Distribution that approximates the correct x for a given p. The CDF is continuously increasing and therefore a root finding approach should work just fine. I propose the following:

type MyDistribution =

    static member PDF a b x = ...

    static member CDF a b x = ...

    static member InvCDF a b x = //possible no closed form exists

    static member InvCDFApprox a b x accuracy = 
        ///parameters: function (float -> float); accuracy (float); minimum (float); maximum (float); maxIterations
        let tmp = Optimization.Bisection.tryFindRoot (fun x -> MyDistribution.CDF a b x - p) accuracy 0. 1. 1000
        match tmp with 
        | Some x -> x
        | None -> failwith "no InvCDF found to satisfy the given conditions"

Drawbacks

  • While this should be feasible for any distribution, the optimization step may be quite slow.
  • If the CDF itself is an approximation, an error propagation would inflate the InvCDF error.

To discuss:

  • should the InvCDFApprox fail or result in nan when no root can be identified?
  • can the maxIterations be determined by accuracy? In my understanding, the range between min and max is divided into two sections during each iteration. Therefore, the accuracy should be coupled to the number of maximum iterations by: $$accuracy = 0.5^{maxIterations}$$ If this is correct, maxIterations could be set to $$System.Math.Log(accuracy,0.5)$$
  • should the accuracy be given as float, or maybe model it as type like in Expecto.Accuracy.
  • Is there a better alternative for Optimization.Bisection? Like e.g. NelderMead with (fun x -> abs (MyDistribution.CDF a b x - p)) as objective function.
  • Are there any better naming options as InvCDFApprox?
  • Are there more concerns, that I missed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant