
005. May 28 to June 1

aradu12 edited this page Jul 4, 2018 · 1 revision

Planned tasks for this week

  • task 1: mine more repos using filtering and linking to issues ✔️
  • task 2: mine some python repos ✔️

Progress

task 1

  • linking commits to issues definitely gives more true positives
  • restricting results to commits that touch .java files and filtering out messages containing "typo" or "NPE" reduced the number of false positives
  • stemming of keywords hasn't yet resulted in significant improvements, but probably will in the long run
  • gathering data for the 50 most-starred Java repos on GitHub that haven't already been mined
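A minimal sketch of the commit-message side of task 1, assuming messages are already available as plain strings. It drops messages containing "typo" or "NPE", matches stemmed keywords, and pulls out GitHub-style "#123" issue references for linking. The misuse keyword list and the crude suffix-stripping stemmer are hypothetical stand-ins (a real stemmer such as Porter would behave differently), not the project's actual filter.

```python
import re

# "typo" and "NPE" come from the notes above; the misuse stems are hypothetical.
EXCLUDE_TERMS = ("typo", "npe")
MISUSE_STEMS = ("fix", "misus", "crash", "leak")

# GitHub-style issue references such as "#123" or "fixes #45".
ISSUE_REF = re.compile(r"#(\d+)")


def crude_stem(word: str) -> str:
    """Very rough suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def linked_issues(message: str) -> list:
    """Return the issue numbers referenced in a commit message."""
    return [int(n) for n in ISSUE_REF.findall(message)]


def is_candidate(message: str) -> bool:
    """Keep messages that mention a misuse keyword and none of the excluded terms."""
    words = re.findall(r"[a-z]+", message.lower())
    if any(term in words for term in EXCLUDE_TERMS):
        return False
    stems = {crude_stem(w) for w in words}
    # Prefix match so that e.g. "misused" and "misuses" both hit "misus".
    return any(s.startswith(stem) for s in stems for stem in MISUSE_STEMS)
```

For example, "Fixes crash when closing stream (#42)" passes `is_candidate` and links to issue 42, while "Fix typo in README" is rejected.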

task 2

  • filtered to commits touching .py files and kept the "typo"/"NPE" exclusion filter to reduce false positives
  • have looked at 8 repos so far
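The file-level filter used for both Java and Python can be sketched as one function parameterized by language. This assumes the list of changed file paths per commit is already available (for instance from `git show --name-only <sha>`); the `EXTENSIONS` mapping and function name are hypothetical.

```python
from typing import Iterable

# Hypothetical mapping from project language to the source-file extension to keep.
EXTENSIONS = {"java": ".java", "python": ".py"}


def touches_language(changed_files: Iterable[str], lang: str) -> bool:
    """True if any file changed by the commit has the language's extension."""
    ext = EXTENSIONS[lang]
    return any(path.endswith(ext) for path in changed_files)
```

Keeping the extension in a lookup table means adding another language later only needs one new entry.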

Open Problems

  • an interesting situation keeps coming up: some repos have lots of stars but very few commits (<30), and these repos don't yield many hits
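One possible way to handle this is to skip low-activity repos before mining. This is a sketch assuming commit counts are already known per repo (e.g. via `git rev-list --count HEAD` on a clone); the threshold of 30 is taken from the observation above, and the function name is hypothetical.

```python
# Hypothetical cut-off based on the observation that repos with <30 commits give few hits.
MIN_COMMITS = 30


def worth_mining(repos: dict, min_commits: int = MIN_COMMITS) -> list:
    """Return repo names whose commit count meets the threshold, in input order."""
    return [name for name, commits in repos.items() if commits >= min_commits]
```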

Things we discussed/agreed on

-- from last week:

  • filtering commit messages to those affecting .java files and using stemming
  • using issues to find misuses
  • separating project-specific misuses from project-independent ones
  • adding API and 'correct usage' info to data
  • adding a general 'rule' message to data
  • removing the irrelevant data from the 'plaid' repo
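The agreed data additions (API, 'correct usage', a general 'rule' message, and the project-specific vs. project-independent split) could be captured in a record like the one below. This is a sketch only: the class and field names are hypothetical, not the project's actual data format, and the example values are illustrative.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MisuseRecord:
    """One mined misuse; field names are hypothetical, not the real data format."""
    repo: str
    commit: str
    api: str                       # the API that is misused
    correct_usage: str             # short description of the correct usage
    rule: str                      # general 'rule' message
    project_specific: bool         # True if the misuse only applies to this project
    issue: Optional[int] = None    # linked issue number, if one was found


record = MisuseRecord(
    repo="example/repo",
    commit="abc123",
    api="java.io.FileInputStream",
    correct_usage="close the stream in a finally block or try-with-resources",
    rule="resources must be closed after use",
    project_specific=False,
)
```

`asdict(record)` turns the record into a plain dict, which is convenient for writing entries out as JSON.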

Next steps

  • considering adding a 'lang' field to the data to indicate whether each entry comes from a Java or a Python project
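If the 'lang' field is added, one option is to fill it automatically from the files a commit changed. This is a minimal sketch assuming only .java and .py files matter; the mapping and the majority-vote rule are assumptions, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical mapping from file extension to the 'lang' value stored in the data.
LANG_BY_EXT = {".java": "java", ".py": "python"}


def infer_lang(paths: Iterable[str]) -> Optional[str]:
    """Return the most common known language among the given paths, or None."""
    counts = Counter()
    for path in paths:
        for ext, lang in LANG_BY_EXT.items():
            if path.endswith(ext):
                counts[lang] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Setting the field once per repo (rather than per commit) would avoid mixed labels in projects that contain a few helper scripts in another language.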