Project Description

snadi edited this page Apr 20, 2018 · 1 revision

Project Overview

The goal of the project is to gather a data set of bugs related to non-functional requirements, such as security, performance, or memory use. To do so, we will mine the commit history of open-source projects to identify commits that appear to fix or improve the code with respect to one of these non-functional requirements. Once a suitable commit is identified, we want to store the following:

  • a minimal version of what the code looked like before the fix
  • a minimal version of what the code looks like after the fix
  • a description of the problem and the line number at which it occurs
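
To make the stored items concrete, here is a minimal sketch of what one record and a first-pass commit filter might look like. Everything in it is an assumption for illustration: the page does not say how commits are identified, so the keyword lists, the `BugRecord` fields, and the `classify_commit_message` helper are hypothetical. A keyword match only flags a candidate commit; it would still need manual inspection.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword prefixes per category (an assumption, not the
# project's actual selection criteria).
NFR_KEYWORDS = {
    "security": ["security", "vulnerab", "injection", "xss"],
    "performance": ["performance", "faster", "speed up", "slow", "optimiz"],
    "memory": ["memory leak", "memory usage", "out of memory", "oom"],
}


@dataclass
class BugRecord:
    """One data set entry, mirroring the three items listed above."""
    category: str      # e.g. "security", "performance", "memory"
    code_before: str   # minimal version of the code before the fix
    code_after: str    # minimal version of the code after the fix
    description: str   # description of the problem
    line_number: int   # line in code_before where the problem occurs


def classify_commit_message(message: str) -> Optional[str]:
    """Return the first category whose keyword starts a word in the message,
    or None if no keyword matches."""
    text = message.lower()
    for category, keywords in NFR_KEYWORDS.items():
        for keyword in keywords:
            # \b anchors the keyword at the start of a word, so "oom"
            # does not match inside "room".
            if re.search(r"\b" + re.escape(keyword), text):
                return category
    return None
```

A commit message such as "Fix memory leak in cache" would be flagged as a memory candidate, while "Update README" would be skipped. A real pipeline would read messages and diffs from the repository history and would likely need a more careful filter than keyword matching.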

This data set is valuable on its own and could be published as a data paper, for example at MSR. Ideally, we should also come up with a taxonomy that describes the problems we find.

The data set can be used in the following ways (some of which we may try out depending on time and progress):

  • code recommender systems: imagine you are working in your IDE and type up a code snippet that the recommender knows has performance issues. The recommender can then tell you "Hey, your code is correct but there's a faster way to do this"
  • explaining differences between code snippets: if you search StackOverflow for a solution to a task, you often find several ways to accomplish it, and the differences between them are often unclear. With the above data set, we can try matching the StackOverflow snippets to our data and either label them with known problems or explain the differences and trade-offs to the user
  • benchmarking API misuse detectors: the data set can serve as a benchmark for API misuse detectors, to see whether they can identify the problems