Implement KeyedDistribution and KeyedSampleable #1
I feel like that would likely just return the same thing as
To subtype `Sampleable`, I believe you only have to implement
Does it have anything to do with using
Yeah, I think this means extracting the relevant elements by calling the keyed distribution like a function (as AxisKeys can do):
julia> A = KeyedArray([0.1, 0.2, 0.3, 0.4, 0.5], :obj=>[:a, :b, :c, :d, :e])
julia> d = MvNormal(A)
julia> kd = KeyedDistribution(d)
julia> kd(obj=[:a, :c, :d])  # returns another KD marginalised on [a, c, d]
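For a multivariate normal, marginalising over a subset of keys amounts to keeping the matching entries of the mean and the matching rows and columns of the covariance. A minimal sketch of the proposed call syntax, assuming a hypothetical `ToyKeyedMvNormal` that stores keys, mean and covariance directly (it is not `KeyedDistribution`, and it uses only Base Julia):

```julia
# Hypothetical stand-in for a keyed multivariate normal:
# one key per variate, plus the mean vector μ and covariance matrix Σ.
struct ToyKeyedMvNormal
    keys::Vector{Symbol}
    μ::Vector{Float64}
    Σ::Matrix{Float64}
end

# Calling the object with `obj = [...]` returns the marginal over those keys.
# For a Gaussian this is the selected entries of μ and the selected rows and
# columns of Σ; other distributions need not marginalise this way.
function (kd::ToyKeyedMvNormal)(; obj::AbstractVector{Symbol})
    idx = Int[]
    for k in obj
        i = findfirst(==(k), kd.keys)
        i === nothing && throw(ArgumentError("unknown key $k"))
        push!(idx, i)
    end
    return ToyKeyedMvNormal(collect(obj), kd.μ[idx], kd.Σ[idx, idx])
end

keys5 = [:a, :b, :c, :d, :e]
Σ5 = [i == j ? 1.0 : 0.5 for i in 1:5, j in 1:5]  # arbitrary positive-definite covariance
kd = ToyKeyedMvNormal(keys5, collect(1.0:5.0), Σ5)
m = kd(obj=[:a, :c, :d])  # m.μ == [1.0, 3.0, 4.0]
```

The marginal carries its own keys, so it can be indexed the same way again; an unknown key raises an `ArgumentError`.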
I think we can open issues for the remaining tasks and, in this PR, at least nail down the Distributions API. What's your sense of the time/effort this would take?
Co-authored-by: Glenn Moynihan <glenn.moynihan@invenialabs.co.uk>
I thought so; it's just that IndexedDistributions implemented the statistical functions for both, so I wonder whether they're expected somewhere in our code. Also, I believe we have to overload
That makes sense. Just note that it only corresponds to marginalising for some distributions (I think e.g. the multivariate normal and t distributions, where selecting the corresponding parameter entries gives the marginal). With this syntax, we could have a
I estimate at most a day's work on this.
Just those specified in the Distributions.jl docs.
EDIT: the fix was to specify
If I do
ERROR: type KeyedDistribution has no field KeyedDistribution
Stacktrace:
[1] getproperty(::KeyedDistribution{Multivariate,Continuous,MvNormal{Float64,PDMats.PDMat{Float64,Array{Float64,2}},Array{Float64,1}}}, ::Symbol) at ./Base.jl:33
[2] ==(::KeyedDistribution{Multivariate,Continuous,MvNormal{Float64,PDMats.PDMat{Float64,Array{Float64,2}},Array{Float64,1}}}, ::KeyedDistribution{Multivariate,Continuous,MvNormal{Float64,PDMats.PDMat{Float64,Array{Float64,2}},Array{Float64,1}}}) at /Users/bencottier/.julia/packages/KeyedDistributions/i1TQS/src/KeyedDistributions.jl:15
[3] top-level scope at REPL[36]:1
Whereas
The fields and types are all equal. I assume it relates to the way
Fixed by specifying the type of `F` and `S`
- Increase coverage
- Separate Distribution-only methods
Codecov Report
@@            Coverage Diff            @@
##             main       #1     +/-   ##
=========================================
+ Coverage       0   98.00%   +98.00%
=========================================
  Files          0        1        +1
  Lines          0       50       +50
=========================================
+ Hits           0       49       +49
- Misses         0        1        +1
Expected to fix Pkg bug
Had to move it out of the `struct`
Consistent with AxisKeys.jl and allows Matrix-variate Distributions
Also a bit of clean-up and styling
I misunderstood `@ref`; I guess it cannot reference external things.
@mcabbott We're interested in what you think of this package and this PR. It applies the idea of a
I think this is missing a few methods from the
If you want to push these to another PR, that's fine.
Univariate
These are recommended
Multivariate
Matrix-variate
Test no longer broken
key_lengths = map(length, keys)
key_lengths == _size(d) || throw(ArgumentError(
    "lengths of key vectors $key_lengths must match " *
    "size of distribution $(_size(d))"))
We may not be pirating `size`, but "size" in this message may be misleading. I did it for the sake of generalising to one error message.
Maybe remove the reference to `size` in the error message?
"Dimensions of key vectors $key_lengths must match the distribution $(_size(d))"))
Unless you think referring to dimensions is even more confusing?
I'm not a fan of "dimensions", because it's ambiguous whether it means the number of dimensions or the size of each dimension. But you know what, this is testing both of those things, so maybe it's just right.
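Comparing the tuple of key lengths with the size tuple checks both the number of dimensions and the length along each one in a single `==`, which is why one message can cover every variate form. A minimal sketch with a hypothetical `check_keys` helper, in which `sz` stands in for `_size(d)` (assumed here to be `()` for univariate, `(n,)` for multivariate and `(n, p)` for matrix-variate distributions):

```julia
# Hypothetical helper mirroring the constructor check; `sz` stands in for _size(d).
function check_keys(keys::Tuple, sz::Tuple)
    key_lengths = map(length, keys)
    # Tuple equality fails if the number of dimensions differs OR any
    # per-dimension length differs, so one comparison covers both cases.
    key_lengths == sz || throw(ArgumentError(
        "lengths of key vectors $key_lengths must match " *
        "size of distribution $sz"))
    return nothing
end

check_keys((), ())                            # univariate: no key vectors
check_keys(([:a, :b, :c],), (3,))             # multivariate: one key vector
check_keys(([:a, :b], [:x, :y, :z]), (2, 3))  # matrix-variate: one per dimension
```

Both a wrong length, `check_keys(([:a, :b],), (3,))`, and a wrong number of dimensions, `check_keys(([:a, :b, :c],), (3, 3))`, throw an `ArgumentError`.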
@eval Distributions.$f(d::KeyedDistribution{<:Univariate}) = $f(distribution(d))
end
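The `@eval` loop generates one forwarding method per function name, each unwrapping with `distribution(d)`. A minimal sketch of the same pattern in plain Julia, assuming hypothetical `ToyNormal` and `ToyKeyed` types and forwarding `Statistics.mean` and `Statistics.var` (the package forwards Distributions functions instead):

```julia
import Statistics

# Hypothetical wrapped distribution with closed-form moments.
struct ToyNormal
    μ::Float64
    σ::Float64
end
Statistics.mean(d::ToyNormal) = d.μ
Statistics.var(d::ToyNormal) = d.σ^2

# Hypothetical thin wrapper: the distribution plus one key per variate.
struct ToyKeyed{D}
    d::D
    keys::Vector{Symbol}
end
distribution(kd::ToyKeyed) = kd.d

# Generate one forwarding method per function name.
for f in (:mean, :var)
    @eval Statistics.$f(kd::ToyKeyed) = Statistics.$f(distribution(kd))
end

kd = ToyKeyed(ToyNormal(1.0, 2.0), [:x])
Statistics.mean(kd), Statistics.var(kd)  # (1.0, 4.0)
```

Adding a function to the tuple adds a forwarding method without repeating the boilerplate, at the cost of the generated signatures being easy to make too loose, as the ambiguity errors in this thread show.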
# Needed to avoid method ambiguity
These are the method ambiguity errors from the tests, from old code when I put the functions below in the first `@eval` loop:
MethodError: cdf(::KeyedDistribution{Univariate,Continuous,Normal{Float64}}, ::Int64) is ambiguous. Candidates:
cdf(d::Distribution{Univariate,Continuous}, x::Real) in Distributions at /Users/bencottier/.julia/packages/Distributions/cNe2C/src/univariates.jl:367
cdf(d::KeyedDistribution{var"#s12",S,D} where D<:Distribution{var"#s12",S} where S<:ValueSupport where var"#s12"<:Univariate, args...) in KeyedDistributions at /Users/bencottier/JuliaEnvs/KeyedDistributions/src/KeyedDistributions.jl:173
Possible fix, define
cdf(::KeyedDistribution{Univariate,Continuous,D} where D<:Distribution{Univariate,Continuous}, ::Real)
MethodError: insupport(::KeyedDistribution{Univariate,Continuous,Normal{Float64}}, ::Int64) is ambiguous. Candidates:
insupport(d::Union{Type{D}, D}, x::Real) where D<:Distribution{Univariate,Continuous} in Distributions at /Users/bencottier/.julia/packages/Distributions/cNe2C/src/univariates.jl:127
insupport(d::KeyedDistribution{var"#s12",S,D} where D<:Distribution{var"#s12",S} where S<:ValueSupport where var"#s12"<:Univariate, b) in KeyedDistributions at /Users/bencottier/JuliaEnvs/KeyedDistributions/src/KeyedDistributions.jl:179
Possible fix, define
insupport(::D, ::Real) where D<:(KeyedDistribution{Univariate,Continuous,D} where D<:Distribution{Univariate,Continuous})
I think it's because we haven't provided enough type information to the function signature to restrict it to using our method.
We've only specified that it needs to be `Univariate`, but the Distributions methods also specify the `ValueSupport`. So when it sees `KeyedDistribution{Univariate, Continuous}`, it's conflicted about which version to use.
FWIW, I don't understand why it doesn't break outside the `@eval` loop, but we can/should fix it with the following:
Distributions.cdf(d::KeyedDistribution{F, S, <:Distribution{F, S}}, x) where {F, S} = cdf(distribution(d), x)
Same with `insupport`.
Yeah I don't know why it worked this way.
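The ambiguity can be reproduced without Distributions.jl. A minimal sketch, assuming a hypothetical mini hierarchy (`Dist`, `ToyUnivariate`, `ToyKeyedDist`, `toycdf`) that mirrors the signatures in the error messages above; it then adds the intersection method that Julia's own "Possible fix" hint suggests. This is an illustration, not the package's code:

```julia
# Hypothetical mini hierarchy mirroring Distribution{F,S} from Distributions.jl.
abstract type VariateForm end
abstract type Univariate <: VariateForm end
abstract type ValueSupport end
abstract type Continuous <: ValueSupport end
abstract type Dist{F<:VariateForm,S<:ValueSupport} end

struct ToyUnivariate <: Dist{Univariate,Continuous} end

# Hypothetical wrapper carrying the same type parameters as what it wraps.
struct ToyKeyedDist{F<:VariateForm,S<:ValueSupport,D<:Dist{F,S}} <: Dist{F,S}
    d::D
end

# Generic "library" method: fixes both F and S, and restricts x to Real.
toycdf(d::Dist{Univariate,Continuous}, x::Real) = :library
# Loose forwarding method: more specific in the wrapper type, less so in S and x.
toycdf(d::ToyKeyedDist{<:Univariate}, args...) = :wrapper

kd = ToyKeyedDist{Univariate,Continuous,ToyUnivariate}(ToyUnivariate())

# Neither method is more specific for this call, so Julia refuses to pick one.
was_ambiguous = try
    toycdf(kd, 1)
    false
catch err
    err isa MethodError
end

# A method on the intersection of both signatures is more specific than
# either, so it resolves the ambiguity for Univariate/Continuous wrappers.
toycdf(d::ToyKeyedDist{Univariate,Continuous}, x::Real) = :wrapper
```

After the intersection method is defined, `toycdf(kd, 1)` returns `:wrapper`, while a bare `ToyUnivariate()` still reaches `:library`. In this sketch a `Discrete` wrapper would need its own intersection method.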
end
@testset "Distributions types" begin
    @testset "Univariate" begin
Just to note: we only test on `Continuous` distributions. We should probably test on `Discrete` at some point.
#7
This proof of concept for KeyedDistributions extends the AxisKeys.jl ecosystem with `KeyedSampleable` and `KeyedDistribution` types. The types thinly wrap `Sampleable`s with a vector of keys, corresponding to the variates of the `Sampleable`. This is analogous to `KeyedArray` wrapping an array.
Not implemented:
- `marginalize` method for certain distributions #4
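A minimal sketch of the "thin wrapper" idea, assuming hypothetical `ToyMvSampler` and `ToyKeyedSampler` types: the wrapper stores one key per variate, checks that the counts match, and forwards sampling to the wrapped object. To stay within Base Julia and Random, it attaches keys to each draw with a `NamedTuple`; that return type is an assumption of this sketch, not the package's:

```julia
using Random

# Hypothetical multivariate sampler standing in for a Sampleable{Multivariate}.
struct ToyMvSampler
    μ::Vector{Float64}
end
Base.rand(rng::AbstractRNG, s::ToyMvSampler) = s.μ .+ randn(rng, length(s.μ))

# Hypothetical thin wrapper: the sampler plus one key per variate.
struct ToyKeyedSampler
    s::ToyMvSampler
    keys::Vector{Symbol}
    function ToyKeyedSampler(s::ToyMvSampler, keys::Vector{Symbol})
        length(keys) == length(s.μ) ||
            throw(ArgumentError("need exactly one key per variate"))
        return new(s, keys)
    end
end

# Sampling forwards to the wrapped sampler, then pairs each draw with its key.
function Base.rand(rng::AbstractRNG, ks::ToyKeyedSampler)
    x = rand(rng, ks.s)
    return NamedTuple{Tuple(ks.keys)}(Tuple(x))
end

ks = ToyKeyedSampler(ToyMvSampler([0.0, 10.0, 20.0]), [:a, :b, :c])
draw = rand(MersenneTwister(42), ks)  # draw.b is the variate keyed :b
```

The wrapper adds no sampling logic of its own: with the same seed, the keyed draw holds exactly the values of an unkeyed draw from the wrapped sampler.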