-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an <epidemic>
class
#159
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's the right approach to solve the reproducibility issue. This is not something that should be addressed at the package-level, but rather in trainings, such as the one Andree produced on research compendia.
This gives users a false sense of security and could incentivize them to compare outputs in situations where they should not be compared. If the time frame between two simulations is such that the package has been updated, it is much safer to encourage users to re-run all simulations, as other elements in the whole stack may have changed.
Additionally, this leads to a more complex return object. This PR moves away from the nice simple data.frame, which is immediately clear to e.g., excel users, to have a more complex list. The change in the output type is also is step back in terms of interoperability IMO since many generic data science packages work directly with data.frames.
As stated on slack, there could be a case for including some metadata in the output if this greatly facilitates the downstream analysis via / the integration with scenarios. But this is a very slippery slope that comes at the cost of increased complexity and a steeper learning curve for the package. Doing it for reproducibility reasons is not correct IMO. As such:
- the included info should be limited to the bare minimum. I don't think the package version belongs here
- this needs to be synced with a more precise design plan for scenarios
|
||
# get package version or hash | ||
# NOTE: is NULL when installed from local version | ||
hash <- utils::packageDescription("epidemics")[["RemoteSha"]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't exist for CRAN packages. Having worked on R packages for reproducibility in the past, I can say with certainty this whole topic is much more complex than one might expect, and full of edge cases.
Thanks - this is still a draft and there will be an oppotunity to discuss this among other things in the upcoming P3 development meeting. I'll update this PR to reflect any decisions we make. |
Thanks for the feedback @Bisaloo - this was discussed in the (just concluded) meeting with @rozeggo, @BlackEdder, @TimTaylor and @bahadzie. In summary we have decided against the structure of A discussion of, and decisions about, the new structure of |
We have decided against including any versioning and agree that environment management is a user task.
I agree and think users should be responsible for parameter management. Keeping in mind that not all users might be able to do this successfully, we have decided to structure model outputs as a class with encapsulated parameters, but hopefully one that can be handled by methods for data.frames - possibly a nested data.frame, data.table, or tibble. Feedback on this is welcome.
I agree - but see above that we have decided that we should better facilitate parameter management and scenario comparability for users. Again any feedback on having/the eventual structure of |
This draft PR fixes #156 by adding an
<epidemic>
class. This is scheduled to be merged after #158.<epidemic>
is a list-based class with four elements: the function name, the model data, the model inputs (parameters and composable elements), and the version hash for reproducibility or comparability.Changes:
<epidemic>
as output;<epidemic>
(e.g.epidemic_size()
);get_parameter()
supports<epidemic>
and allows accessing the model data ("data") and the parameters and composable inputs ("parameters");<epidemic>
outputs.