Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an <epidemic> class #159

Closed
wants to merge 32 commits into from
Closed

Add an <epidemic> class #159

wants to merge 32 commits into from

Conversation

pratikunterwegs
Copy link
Collaborator

@pratikunterwegs pratikunterwegs commented Jan 29, 2024

This draft PR fixes #156 by adding an <epidemic> class. This is scheduled to be merged after #158.

<epidemic> is a list-based class with four elements: the function name, the model data, the model inputs (parameters and composable elements), and the version hash for reproducibility or comparability.

Changes:

  1. All model functions now return <epidemic> as output;
  2. All helper functions previously operating on data.frame model outputs now operate on <epidemic> (e.g. epidemic_size());
  3. get_parameter() supports <epidemic> and allows accessing the model data ("data") and the parameters and composable inputs ("parameters");
  4. All vignettes, examples, and tests now return, expect, and use <epidemic> outputs.

@pratikunterwegs pratikunterwegs self-assigned this Jan 30, 2024
Copy link
Member

@Bisaloo Bisaloo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's the right approach to solve the reproducibility issue. This is not something that should be addressed at the package-level, but rather in trainings, such as the one Andree produced on research compendia.
This gives users a false sense of security and could incentivize them to compare outputs in situations where they should not be compared. If the time frame between two simulations is such that the package has been updated, it is much safer to encourage users to re-run all simulations, as other elements in the whole stack may have changed.

Additionally, this leads to a more complex return object. This PR moves away from the nice simple data.frame, which is immediately clear to e.g., excel users, to have a more complex list. The change in the output type is also is step back in terms of interoperability IMO since many generic data science packages work directly with data.frames.

As stated on slack, there could be a case for including some metadata in the output if this greatly facilitates the downstream analysis via / the integration with scenarios. But this is a very slippery slope that comes at the cost of increased complexity and a steeper learning curve for the package. Doing it for reproducibility reasons is not correct IMO. As such:

  • the included info should be limited to the bare minimum. I don't think the package version belongs here
  • this needs to be synced with a more precise design plan for scenarios


# get package version or hash
# NOTE: is NULL when installed from local version
hash <- utils::packageDescription("epidemics")[["RemoteSha"]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't exist for CRAN packages. Having worked on R packages for reproducibility in the past, I can say with certainty this whole topic is much more complex than one might expect, and full of edge cases.

@pratikunterwegs
Copy link
Collaborator Author

Thanks - this is still a draft and there will be an oppotunity to discuss this among other things in the upcoming P3 development meeting. I'll update this PR to reflect any decisions we make.

@pratikunterwegs
Copy link
Collaborator Author

Thanks for the feedback @Bisaloo - this was discussed in the (just concluded) meeting with @rozeggo, @BlackEdder, @TimTaylor and @bahadzie.

In summary we have decided against the structure of <epidemic> shown in this draft, but an <epidemic> class is likely to be implemented in the foreseeable future.

A discussion of, and decisions about, the new structure of <epidemic> and {epidemics} are available as a link on Slack - any feedback there is welcome. I will open issues based on meeting decisions by mid-day tomorrow, leaving time for any feedback (and I can edit them based on any feedback given later too).

@pratikunterwegs
Copy link
Collaborator Author

pratikunterwegs commented Jan 31, 2024

I don't think it's the right approach to solve the reproducibility issue. This is not something that should be addressed at the package-level, but rather in trainings, such as the one Andree produced on research compendia.
This gives users a false sense of security and could incentivize them to compare outputs in situations where they should not be compared. If the time frame between two simulations is such that the package has been updated, it is much safer to encourage users to re-run all simulations, as other elements in the whole stack may have changed.

We have decided against including any versioning and agree that environment management is a user task.

Additionally, this leads to a more complex return object. This PR moves away from the nice simple data.frame, which is immediately clear to e.g., excel users, to have a more complex list. The change in the output type is also is step back in terms of interoperability IMO since many generic data science packages work directly with data.frames.

I agree and think users should be responsible for parameter management. Keeping in mind that not all users might be able to do this successfully, we have decided to structure model outputs as a class with encapsulated parameters, but hopefully one that can be handled by methods for data.frames - possibly a nested data.frame, data.table, or tibble. Feedback on this is welcome.

As stated on slack, there could be a case for including some metadata in the output if this greatly facilitates the downstream analysis via / the integration with scenarios. But this is a very slippery slope that comes at the cost of increased complexity and a steeper learning curve for the package. Doing it for reproducibility reasons is not correct IMO. As such:

I agree - but see above that we have decided that we should better facilitate parameter management and scenario comparability for users. Again any feedback on having/the eventual structure of <epidemic> are welcome.

@pratikunterwegs pratikunterwegs deleted the feature/epidemic-class branch February 13, 2024 11:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add <epidemic> class
3 participants