Building a large graph database with Python and Neo4j.
The MS Movie Graph is a large Neo4j database that combines some of the largest resources available on the Internet for the movie domain. Because we cannot redistribute these resources, we provide the code to assemble the database once they have been obtained from their official sources. This version cross-references IMDB, Wikidata, Inspired and Movielens.
To get a local copy up and running, follow these steps.
- Clone the repo
git clone https://github.com/antori82/MS_MovieGraph.git
- Download the IMDB data at https://www.imdb.com/interfaces/ and put them in a folder named "IMDB"
- Download the Movielens dataset and put it in a folder named "Movielens"
- Download the Inspired dataset and put it in a folder named "Inspired"
- Run the tsvProcess.py script to pre-process the IMDB data
- Set up the Neo4j connection variables and run the CreateDatabase.py script to import the IMDB data
- Run the ImportAwards.py script to import data concerning awards from Wikidata
- Run the Import MovielensRatings.py script to import Movielens data
- Run the ImportWikiNames.py script to import, from Wikidata, alternative names of the movies rated in Movielens
- Run the Inspired_neo4j.py script to import Inspired into the database
- Run the ConnectDatasets.py script to create references from Inspired to the knowledge domain graph
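The steps above can be sketched as a small helper that checks the data folders and then walks the scripts in order. This is only an illustrative sketch, not part of the repository: the names `missing_folders` and `run_pipeline` are hypothetical, and it assumes that the "IMDB", "Movielens" and "Inspired" folders and all scripts sit in the root of the cloned repo. Script names are copied as written in the list above, so check them against the actual files before running. The Neo4j connection variables still have to be set by hand before CreateDatabase.py is executed.

```python
import subprocess
import sys
from pathlib import Path

# Folder names taken from the download steps above.
DATA_FOLDERS = ["IMDB", "Movielens", "Inspired"]

# Script names in the order listed above, copied verbatim from this README.
# Assumption: they live in the repository root; verify the exact file names.
PIPELINE = [
    "tsvProcess.py",
    "CreateDatabase.py",
    "ImportAwards.py",
    "Import MovielensRatings.py",
    "ImportWikiNames.py",
    "Inspired_neo4j.py",
    "ConnectDatasets.py",
]


def missing_folders(root):
    """Return the expected data folders that do not exist under root."""
    root = Path(root)
    return [name for name in DATA_FOLDERS if not (root / name).is_dir()]


def run_pipeline(root, dry_run=True):
    """Run each script in order, stopping at the first failure.

    With dry_run=True (the default) the commands are only printed.
    """
    root = Path(root)
    missing = missing_folders(root)
    if missing:
        raise FileNotFoundError(f"Missing data folders: {', '.join(missing)}")
    for script in PIPELINE:
        command = [sys.executable, str(root / script)]
        if dry_run:
            print("would run:", " ".join(command))
        else:
            # check=True raises CalledProcessError if a script exits non-zero.
            subprocess.run(command, check=True, cwd=root)
```

Calling `run_pipeline(".")` from the repository root prints the commands without executing them; pass `dry_run=False` to actually run the scripts once the datasets are in place and the Neo4j instance is reachable.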
Distributed under the MIT License. See LICENSE for more information.