community-graph/github-import

Neo4j GitHub Import for Community Graph (and other uses)

Currently uses Python and an IPython Notebook, and calls the GitHub GraphQL API via the requests library.

Run the script and notebook server with these environment variables:

nb.sh
cat ../nb.sh
export NEO4J_URL=bolt://localhost
export NEO4J_USER=neo4j
export NEO4J_PASSWORD=****
export GITHUB_TOKEN=****

ipython notebook
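
Inside the notebook, the four variables exported above can be read back from the environment. The following is a minimal sketch, not the notebook's actual code; the helper name load_settings and the fail-early behaviour are assumptions made for illustration.

```python
import os

# The variable names come from nb.sh above; everything else is illustrative.
REQUIRED_VARS = ("NEO4J_URL", "NEO4J_USER", "NEO4J_PASSWORD", "GITHUB_TOKEN")


def load_settings(env=os.environ):
    """Return the required settings as a dict, failing early if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling load_settings() once at the top of the notebook surfaces a forgotten export before any request is sent.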

Approach

  • Use the GitHub GraphQL API with a pre-authorized URL to retrieve non-fork repositories that have neo4j in their name or description (or perhaps readme?)

  • Use an idempotent Cypher statement to merge Repositories, Users, Issues, Pull Requests, Tags, Languages

  • Page over the data until everything is imported
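
The three steps above can be sketched as a single loop. This is a hedged illustration, not the notebook's implementation: fetch_page and write_batch are hypothetical callables (one wrapping the GraphQL search, one wrapping a Neo4j session), the id merge key is an assumption, and the Cypher covers only repositories and their owners. The pageInfo, hasNextPage and endCursor fields follow the Relay connection convention linked below.

```python
# Illustrative MERGE statement; re-running it with the same data creates no duplicates.
# The property names title/url/description and the CREATED relationship mirror the
# queries later in this README; using `id` as the merge key is an assumption.
MERGE_REPOS = """
UNWIND $repos AS r
MERGE (repo:Repository {id: r.id})
SET repo.title = r.name, repo.url = r.url, repo.description = r.description
MERGE (user:User {name: r.owner.login})
MERGE (user)-[:CREATED]->(repo)
"""


def import_all(fetch_page, write_batch):
    """Page through results until hasNextPage is false; return the number of nodes written.

    fetch_page(cursor) must return {"nodes": [...], "pageInfo": {...}}.
    write_batch(statement, params) must execute the Cypher statement.
    """
    cursor = None  # None requests the first page
    total = 0
    while True:
        page = fetch_page(cursor)
        nodes = page["nodes"]
        if nodes:
            write_batch(MERGE_REPOS, {"repos": nodes})
            total += len(nodes)
        info = page["pageInfo"]
        if not info["hasNextPage"]:
            return total
        cursor = info["endCursor"]
```

Passing the fetch and write steps in as functions keeps the loop testable without network access or a running database.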

https://platform.github.community/t/access-graphql-api-via-python-client/746/2
curl -H "Authorization: token $GITHUB_TOKEN" -X POST https://api.github.com/graphql -d '{"query": "query { viewer { login } }"}'
GITHUB_TOKEN is a valid OAuth token: https://developer.github.com/v3/oauth/
https://developer.github.com/v3/oauth_authorizations/#create-a-new-authorization
https://github.com/settings/tokens
https://platform.github.community/t/how-to-get-any-useful-out-of-a-search/285
https://platform.github.community/t/add-your-schema-requests/25/50
https://platform.github.community/t/get-repository-languages/570
https://facebook.github.io/relay/docs/graphql-connections.html
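
The curl call above translates to Python with requests roughly as follows. This is a sketch under assumptions: the function names are made up for illustration, and the error handling is a suggestion rather than what the notebook does.

```python
GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"


def build_headers(token):
    """Same Authorization header as the curl example."""
    return {"Authorization": "token " + token}


def build_payload(query, variables=None):
    """GraphQL request body: the query text plus optional variables."""
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    return payload


def run_query(token, query, variables=None):
    """POST the query and return the "data" part of the response."""
    # Imported here so the two helpers above work without requests installed.
    import requests

    response = requests.post(
        GITHUB_GRAPHQL_URL,
        headers=build_headers(token),
        json=build_payload(query, variables),
        timeout=30,
    )
    response.raise_for_status()
    body = response.json()
    # GraphQL reports query errors in the body, even on an HTTP 200.
    if "errors" in body:
        raise RuntimeError(body["errors"])
    return body["data"]
```

For example, run_query(os.environ["GITHUB_TOKEN"], "query { viewer { login } }") sends the same request as the curl line.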

query {
  repository(owner:"neo4j",name:"neo4j") {
    id, name,
    owner {
      id, login
    }
  }
}
query {
  search(query:"neo4j",type:REPOSITORY) {
    repositoryCount
  }
}

https://developer.github.com/early-access/graphql/explorer/

query {
  search(query:"neo4j created:>2016-01-01",type:REPOSITORY,first:10) {
    nodes {
      ... on Repository {
        id,
        name,
        description,
        createdAt,
        updatedAt,
        stargazers {totalCount},
        watchers {totalCount},
        pullRequests{totalCount},
        license,
        releases{totalCount},
        url,
        languages(first:100) {edges {node{name}}}
#        labels(first:100) {edges {node{name}}}
        mentionableUsers(first:100) {edges {node{name,login,company}}}
#        languages{nodes{name}},
#        labels(first:100){nodes{name}},
#        mentionableUsers(first:100){nodes{name,login,company}},
        primaryLanguage{name},
        homepageURL,
        issues {totalCount},
        owner {id,login}
      }
    }
  }
}
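
Each node returned by the search query above is nested (totalCount wrappers, edges/node lists), so it helps to flatten it before passing it to Cypher. The sketch below is an assumption about how that could look, not the notebook's code: the function names are hypothetical, and mapping stargazers to a favorites property is inferred from the queries further down.

```python
def unwrap(connection):
    """Turn {"edges": [{"node": {...}}, ...]} into a plain list of node dicts."""
    if not connection:
        return []
    return [edge["node"] for edge in connection.get("edges", [])]


def count(node, field):
    """Read node[field]["totalCount"], defaulting to 0 when the field is absent."""
    return (node.get(field) or {}).get("totalCount", 0)


def flatten_repository(node):
    """Map one search result node to a flat dict of repository properties."""
    primary = node.get("primaryLanguage") or {}
    return {
        "id": node["id"],
        "title": node["name"],
        "description": node.get("description"),
        "url": node.get("url"),
        "created": node.get("createdAt"),
        "updated": node.get("updatedAt"),
        "favorites": count(node, "stargazers"),  # assumed mapping: stars -> favorites
        "watchers": count(node, "watchers"),
        "owner": (node.get("owner") or {}).get("login"),
        "primaryLanguage": primary.get("name"),
        "languages": [lang["name"] for lang in unwrap(node.get("languages"))],
    }
```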

Data Model

Uses the GitHub part of this data model:

community graph

Queries

The 100 most recently created repositories, ordered by stars.

MATCH (n:Repository)
WITH n ORDER BY n.created DESC LIMIT 100
RETURN n.title, n.url, n.created, n.favorites, substring(n.description,0,250) AS description
ORDER BY n.favorites DESC

The 100 most recently updated repositories not created by neo4j or neo4j-contrib, ordered by stars plus watchers.

MATCH (n:Repository)
WITH n
ORDER BY apoc.date.parse(n.updated,'ms',"yyyy-MM-dd'T'HH:mm:ss'Z'") DESC
LIMIT 100
MATCH (n)<-[:CREATED]-(user) WHERE NOT user.name IN ["neo4j", "neo4j-contrib"]
RETURN n.title, n.url, n.created, n.favorites, n.description, n.updated, user.name
ORDER BY n.favorites + n.watchers DESC
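
The apoc.date.parse call above converts the stored timestamp string to milliseconds on every query. One alternative, sketched here as an assumption rather than something the import currently does, is to do the same conversion once in Python at import time and store the number next to the string. The function name is hypothetical, and the trailing Z is treated as UTC.

```python
from datetime import datetime, timezone

# Python equivalent of the pattern yyyy-MM-dd'T'HH:mm:ss'Z' used in the query.
GITHUB_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_epoch_millis(timestamp):
    """Convert a timestamp such as 2016-01-01T00:00:00Z to epoch milliseconds (UTC)."""
    parsed = datetime.strptime(timestamp, GITHUB_TIME_FORMAT)
    parsed = parsed.replace(tzinfo=timezone.utc)  # strptime yields a naive datetime
    return int(parsed.timestamp() * 1000)
```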

TODO

  • store responses in json files and then import those
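
One possible shape for this TODO, offered as a sketch only: write each fetched page to its own JSON file, then run the import from those files. The file naming scheme and the function names are assumptions.

```python
import json
from pathlib import Path


def save_page(directory, index, page):
    """Write one API response page to <directory>/page-00000.json and return the path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "page-{:05d}.json".format(index)
    path.write_text(json.dumps(page, indent=2), encoding="utf-8")
    return path


def load_pages(directory):
    """Yield the stored pages in the order they were written."""
    for path in sorted(Path(directory).glob("page-*.json")):
        yield json.loads(path.read_text(encoding="utf-8"))
```

Zero-padding the index keeps alphabetical file order identical to fetch order, so a re-import replays the pages in the original sequence.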

About

Import GitHub repositories, owners, members, activity
