
Missing Data Constraints #11

Open
jan-gerling opened this issue Mar 19, 2020 · 1 comment

@jan-gerling (Contributor)

As of today, we have very few constraints on the data in our database. A data constraint is an assertion over the data, e.g. the process metrics of later refactorings on the same file have to be higher than or equal to those of earlier ones.
We do simple sanity checks in the integration tests, especially on the toy projects, but the stress tests (#146 95) and canary tests showed that we missed many edge cases.
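
For illustration, a minimal sketch of such a constraint check, assuming a pandas DataFrame with hypothetical `file_path`, `commit_date`, and `commits_count` columns (the real schema may differ):

```python
import pandas as pd

def check_process_metric_constraint(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows violating the constraint that, for refactorings
    on the same file, later refactorings must have process metrics
    greater than or equal to those of earlier ones.

    Column names (file_path, commit_date, commits_count) are
    illustrative assumptions, not the project's actual schema.
    """
    ordered = df.sort_values(["file_path", "commit_date"])
    # Within each file, diff() is negative wherever the metric decreased
    # between consecutive refactorings, i.e. where the constraint fails.
    decreased = ordered.groupby("file_path")["commits_count"].diff() < 0
    return ordered[decreased]
```

A check like this could run as an assertion after each import batch, e.g. `assert check_process_metric_constraint(refactorings).empty`.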

Advantages:

  1. Increased confidence in the data.

For more inspiration, see: https://fontysblogt.nl/testing-machine-learning-applications/

@mauricioaniche (Contributor)

I think checking the constraints is something we can do in the ML pipeline: for example, after every transformation, we verify that the dataset is still in the state we expect.
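
As a rough sketch of what such a post-transformation check could look like (the invariants, column names, and pipeline steps below are assumptions for illustration, not the project's actual code):

```python
import pandas as pd

def assert_invariants(df: pd.DataFrame) -> pd.DataFrame:
    """Sanity-check the dataset after a transformation; the checked
    invariants and column names are illustrative assumptions."""
    assert not df.empty, "transformation dropped all rows"
    assert df["refactoring"].notna().all(), "missing labels"
    assert not df.duplicated().any(), "duplicate rows introduced"
    return df

# Re-run the checks after every transformation step, e.g.:
# df = assert_invariants(scale_features(df))
# df = assert_invariants(balance_classes(df))
```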

I'm adding the label here.

@jan-gerling jan-gerling transferred this issue from refactoring-ai/predicting-refactoring-ml Aug 4, 2020