soccerdata-scraper scrapes soccer data from Wikipedia for top-tier European football leagues and generates interactive data visualizations from it.
The leagues currently available for scraping and visualization are listed below.
League | Seasons | Source |
--- | --- | --- |
English Premier League | 1992-93 to present | https://en.wikipedia.org/wiki/Category:Premier_League_seasons |
Spanish La Liga | 1929-30 to present | https://en.wikipedia.org/wiki/Category:La_Liga_seasons |
Italian Serie A | 1929-30 to present | https://en.wikipedia.org/wiki/Category:Serie_A_seasons |
German Bundesliga | 1963-64 to present | https://en.wikipedia.org/wiki/Category:Bundesliga_seasons |
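To illustrate how season pages might be discovered from one of the category pages above, here is a minimal sketch. It is not the project's actual scraper: it uses only the standard library's `html.parser` so it runs without extra dependencies, and the names `SeasonLinkParser`, `extract_season_links`, and the season-title pattern are assumptions made for this example. The HTML itself would be fetched separately, for example with Requests.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://en.wikipedia.org"
# Assumed pattern for season titles such as "1992-93" or "1999-2000",
# written with either a hyphen or an en dash (\u2013).
SEASON_RE = re.compile(r"\b\d{4}[\u2013-]\d{2,4}\b")


class SeasonLinkParser(HTMLParser):
    """Collects (link text, absolute URL) pairs for links that look like seasons."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently being read
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            # Keep only article links whose text contains a season range.
            if self._href.startswith("/wiki/") and SEASON_RE.search(title):
                self.links.append((title, urljoin(BASE_URL, self._href)))
            self._href = None


def extract_season_links(html):
    """Return season links found in the HTML of a category page."""
    parser = SeasonLinkParser()
    parser.feed(html)
    return parser.links
```

Passing the text of a downloaded category page to `extract_season_links` returns a list of `(title, url)` tuples. Links whose text does not contain a season range are skipped.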
Install the dependencies listed below manually, or install them all at once from requirements.txt:
pip install -r requirements.txt
The following third-party libraries are required for soccerdata-scraper to work correctly. Python 3.7.x or higher and the most recent stable release of each library are recommended.
Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.
Requests is an elegant and simple HTTP library for Python, built for human beings.
NumPy is the fundamental package for array computing with Python.
pandas provides powerful data structures for data analysis, time series, and statistics.
Plotly is an open-source, interactive graphing library for Python.
CEF Python (cefpython3) is a GUI toolkit for embedding a Chromium widget in desktop applications.
Pillow is the maintained fork of the Python Imaging Library (PIL).
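Before running the application, it can help to confirm that the interpreter and libraries described above are available. The following is a small sketch, not part of the project. The import names in `REQUIRED_MODULES` are assumptions inferred from the library descriptions, so check them against requirements.txt.

```python
import importlib.util
import sys

# Assumed top-level import names for the libraries described above;
# verify against requirements.txt before relying on this list.
REQUIRED_MODULES = ["bs4", "requests", "numpy", "pandas", "plotly", "cefpython3", "PIL"]


def missing_modules(names):
    """Return the names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(min_version=(3, 7), modules=REQUIRED_MODULES):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_version:
        problems.append("Python %d.%d or higher is recommended" % min_version)
    for name in missing_modules(modules):
        problems.append("missing module: " + name)
    return problems
```

Calling `check_environment()` returns an empty list when everything is importable. Otherwise each entry names one problem to fix before executing main.py.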
After making sure all dependencies are installed correctly, execute main.py. If everything is set up correctly, a graphical interface window should appear.
A new window should open containing interactive visualizations for the selected season's data. Click a subheading in this window to expand it and view the visualizations inside. All generated graphs can be interacted with in this window. A complete sample of the interactive visualization report shown here can be seen here.
All generated visualization reports are also stored as HTML files and can be reopened in a web browser. If only some visualizations are needed, each one is also stored separately as its own HTML file and can be retrieved individually. In addition, all scraped data is parsed into a JSON file and stored, should you need only the data and not the visualizations.
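As a sketch of how the stored HTML files could be reopened later, the helpers below search a folder for `.html` files and open one in the default browser using only the standard library. The function names are hypothetical. The actual file names and folder depth depend on what the application writes, so none are assumed here.

```python
import webbrowser
from pathlib import Path


def find_html_files(root):
    """Return all .html files under root (searched recursively), sorted by path."""
    return sorted(Path(root).rglob("*.html"))


def open_in_browser(html_path):
    """Open a stored report or graph in the default web browser."""
    # file:// URIs must be absolute, so resolve the path first.
    return webbrowser.open(Path(html_path).resolve().as_uri())
```

For example, `find_html_files("dumps")` lists every stored report and graph once the application has generated some, and any returned path can be passed to `open_in_browser`.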
A new folder called dumps should appear in the soccerdata-scraper directory (or whatever you have named the current directory). Its contents will look something like this.
All three folders (graphs, json, and reports) contain four subfolders, one for each league.
The contents of the graphs folder look something like this after selecting a league.
After opening the respective season folder, the individual visualizations can be opened and interacted with.
The contents of the json folder look something like this after selecting a league. All the data used for the visualizations can be obtained from these files.
The reports folder contains the complete season-wise interactive visualization reports for each league, as seen through the interface. Its contents should look something like this after selecting a league.
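If only the data is needed, the JSON files can be read directly. The sketch below loads every `.json` file under a league folder into a dictionary keyed by relative path. The function name is hypothetical, and the structure inside each file is not assumed; the parsed content is returned as is.

```python
import json
from pathlib import Path


def load_json_dumps(league_dir):
    """Return {relative path: parsed JSON} for every .json file under league_dir."""
    league_dir = Path(league_dir)
    data = {}
    for path in sorted(league_dir.rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            # Key by path relative to league_dir, using forward slashes on all platforms.
            data[path.relative_to(league_dir).as_posix()] = json.load(handle)
    return data
```

Point `load_json_dumps` at one of the league subfolders inside the json folder and inspect the returned keys to see which seasons are available.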
While this has been extensively tested, specific visualizations for some seasons might fail because of changes to the source pages or for other reasons. Even then, visualizations should still work for whatever data was scraped and parsed without issues.
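One common way to get this kind of behavior is to build each visualization independently and skip the ones that raise an error. The sketch below illustrates that pattern only. It is an assumption about how such isolation could be done, not a description of the project's code, and `build_all` and the builder functions are hypothetical names.

```python
import logging

logger = logging.getLogger("soccerdata_sketch")


def build_all(builders, data):
    """Run every builder on data; collect results and record failures separately.

    builders maps a visualization name to a function that takes the parsed data.
    """
    results, failures = {}, {}
    for name, builder in builders.items():
        try:
            results[name] = builder(data)
        except Exception as exc:  # one failing visualization must not stop the rest
            logger.warning("Skipping %s: %s", name, exc)
            failures[name] = exc
    return results, failures
```

With this structure, a builder that hits missing or unexpected data ends up in `failures`, while every other visualization is still produced from the data that parsed correctly.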