006. June 4 to June 8
aradu12 edited this page Jul 4, 2018 · 1 revision
- task 1: mine more repos -- Java and Python ✔️
- it just occurred to me that I missed an obvious case: searching for a keyword, e.g. "bug", will miss commits with different capitalization, e.g. "Bug". Therefore, changed the mining script to convert all commit messages to lowercase first.
- tried out another method of picking repos to mine: went to GitHub and searched for commits containing "memory leak", "performance", etc., to find candidate repos, then mined those with PyDriller
  - pros: we know the repo has at least one true positive, and potentially more when examined fully with PyDriller
  - cons: GitHub doesn't let you filter commits by language, so have to look for .java or .py files manually
- separated Python data from Java data on GitHub
- also cleaned up the repo a little by removing old irrelevant data files
- added a list of all repos that have been examined on GitHub 📋
- added a working list of commits that are interesting but not related to this project
- current tally of TOTAL problems documented = 78 (as of June 8)
- interesting situation that keeps coming up: some repos have lots of stars but very few commits (<30), and these repos don't give many hits
- continue looking for data for this week; then try methods suggested in paper
- document patterns and recurring themes
- use GumTree API to look for patterns in code revisions
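
A minimal sketch of the two filtering ideas from this week's notes: lowercasing commit messages before the keyword search, and checking changed files for .java or .py extensions since GitHub's commit search cannot filter by language. This is not the actual mining script; the keyword list and the helper names `matched_keywords` and `languages_touched` are assumptions made up for illustration.

```python
import os

# Hypothetical keyword list; the notes mention "bug", "memory leak" and "performance".
KEYWORDS = ("bug", "memory leak", "performance")

# File extensions for the two languages being mined.
LANGUAGE_EXTENSIONS = {".java": "java", ".py": "python"}


def matched_keywords(message, keywords=KEYWORDS):
    """Return the keywords found in a commit message, ignoring capitalization."""
    lowered = message.lower()  # "Bug" and "BUG" now match the keyword "bug"
    return [kw for kw in keywords if kw.lower() in lowered]


def languages_touched(filenames):
    """Return the set of languages (java/python) among a commit's changed files."""
    langs = set()
    for name in filenames:
        ext = os.path.splitext(name)[1].lower()
        if ext in LANGUAGE_EXTENSIONS:
            langs.add(LANGUAGE_EXTENSIONS[ext])
    return langs


if __name__ == "__main__":
    print(matched_keywords("Fix Bug in parser"))
    print(languages_touched(["src/Main.java", "README.md"]))
```

With PyDriller, the message would presumably come from each commit's `msg` attribute and the file names from its list of modified files. Note that this is plain substring matching, so "bug" also hits words like "debug"; a word-boundary regex would be stricter at the cost of missing forms like "bugfix".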