
Missing Data Constraints #11

Open
jan-gerling opened this issue Mar 19, 2020 · 1 comment

@jan-gerling (Contributor)

As of today, we have very few constraints on the data in our database. A data constraint is an assertion over the data, e.g. the process metrics of later refactorings on the same file have to be higher than or equal to those of earlier ones.
We do simple sanity checks in the integration tests, especially on the toy projects, but the stress tests (#146 95) and canary tests showed that we missed many edge cases.
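
For illustration, a minimal sketch of such a constraint check, assuming a pandas DataFrame with hypothetical `file_path`, `commit_date`, and `commits_count` columns (the real schema may differ):

```python
import pandas as pd

def check_process_metric_constraint(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows violating the constraint that, for refactorings
    on the same file, later refactorings must have process metrics
    greater than or equal to those of earlier ones.

    Column names (file_path, commit_date, commits_count) are
    illustrative assumptions, not the project's actual schema.
    """
    ordered = df.sort_values(["file_path", "commit_date"])
    # Within each file, diff() is negative wherever the metric decreased
    # between consecutive refactorings, i.e. where the constraint fails.
    decreased = ordered.groupby("file_path")["commits_count"].diff() < 0
    return ordered[decreased]
```

A check like this could run as an assertion after each import batch, e.g. `assert check_process_metric_constraint(refactorings).empty`.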

Advantages:

  1. Increased confidence in the data.

For more inspiration, see: https://fontysblogt.nl/testing-machine-learning-applications/

@mauricioaniche (Contributor)

I think checking the constraints is something we can do in the ML pipeline: for example, after every transformation, we verify that the dataset is still in the state we expect.
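
As a rough sketch of what such a post-transformation check could look like (the invariants, column names, and pipeline steps below are assumptions for illustration, not the project's actual code):

```python
import pandas as pd

def assert_invariants(df: pd.DataFrame) -> pd.DataFrame:
    """Sanity-check the dataset after a transformation; the checked
    invariants and column names are illustrative assumptions."""
    assert not df.empty, "transformation dropped all rows"
    assert df["refactoring"].notna().all(), "missing labels"
    assert not df.duplicated().any(), "duplicate rows introduced"
    return df

# Re-run the checks after every transformation step, e.g.:
# df = assert_invariants(scale_features(df))
# df = assert_invariants(balance_classes(df))
```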

I'm adding the label here.

@jan-gerling jan-gerling transferred this issue from refactoring-ai/predicting-refactoring-ml Aug 4, 2020