006. June 4 to June 8

aradu12 edited this page Jul 4, 2018 · 1 revision

Planned tasks for this week

  • task 1: mine more repos (Java and Python) ✔️

Progress

task 1

  • realized I had missed an obvious case: searching for a keyword such as "bug" misses commits with different capitalization, such as "Bug". Changed the mining script to convert all commit messages to lowercase before matching.

  • tried another way of picking repos to mine: searched GitHub for commits containing "memory leak", "performance", etc. to find candidate repos, then mined those with PyDriller

    • pros: we know the repo has at least one true positive, and potentially more when examined fully with PyDriller
    • cons: GitHub doesn't let you filter commits by language, so I have to look for .java or .py files manually
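
A minimal sketch of the two checks described above: case-insensitive keyword matching on commit messages, and a check for .java or .py files among the changed files. The actual mining script is not shown on this page, so the function names and the keyword list (only the three keywords mentioned in these notes) are assumptions. The `mine` helper assumes PyDriller 2.x, where the entry point is `Repository`; in PyDriller 1.x it was `RepositoryMining` and the changed files were exposed as `commit.modifications`.

```python
# Keywords mentioned in these notes; the script's full list is not shown (assumption).
KEYWORDS = ["bug", "memory leak", "performance"]
SOURCE_EXTENSIONS = (".java", ".py")


def matching_keywords(message, keywords=KEYWORDS):
    """Return the keywords found in a commit message, ignoring case."""
    lowered = message.lower()  # "Bug" and "BUG" now match "bug"
    return [kw for kw in keywords if kw in lowered]


def touches_source(filenames, extensions=SOURCE_EXTENSIONS):
    """Return True if any changed file has one of the target extensions."""
    return any(name.lower().endswith(extensions) for name in filenames)


def mine(repo_path):
    """Yield (commit hash, matched keywords) pairs. Requires PyDriller 2.x."""
    from pydriller import Repository  # imported here so the helpers work without it

    for commit in Repository(repo_path).traverse_commits():
        hits = matching_keywords(commit.msg)
        files = [m.filename for m in commit.modified_files]
        if hits and touches_source(files):
            yield commit.hash, hits
```

Note that this is plain substring matching, so "bug" also matches "debug"; a word-boundary regex would be stricter, at the cost of missing forms such as "bugfix".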

Other

  • separated Python data from Java data on GitHub
  • also cleaned up the repo a little by removing old, irrelevant data files
  • added a list of all repos that have been examined on GitHub 📋
  • added a working list of commits that are interesting but not related to this project here
  • current tally of TOTAL problems documented = 78 (as of June 8)

Open Problems

  • a situation that keeps coming up: some repos have lots of stars but very few commits (<30), and these repos yield few hits
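
One possible way to screen such repos out before mining, sketched here as an idea rather than something already done: count the commits in a local clone and skip anything under the threshold. The 30-commit cutoff comes from the note above; the function names and the dictionary layout of `candidates` are hypothetical. `commit_count` shells out to `git rev-list --count HEAD`, so it needs git installed and a cloned repo.

```python
import subprocess

MIN_COMMITS = 30  # threshold taken from the note above


def commit_count(repo_path):
    """Count commits reachable from HEAD in a local clone (requires git)."""
    result = subprocess.run(
        ["git", "rev-list", "--count", "HEAD"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


def worth_mining(candidates, min_commits=MIN_COMMITS):
    """Keep the names of candidates with at least min_commits commits.

    candidates is a list of dicts with "name" and "commits" keys
    (a hypothetical layout used only for this sketch).
    """
    return [c["name"] for c in candidates if c["commits"] >= min_commits]
```

Star count is deliberately ignored here, since the observation is that stars alone do not predict how many hits a repo gives.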

Things we discussed/agreed on

  • continue collecting data this week, then try the methods suggested in the paper
  • document patterns and recurring themes

Next steps

  • use GumTree API to look for patterns in code revisions