Scraping esports betting information from the website www.pinacle.com using Scrapy and Selenium.
Note: this script was created for educational purposes, to demonstrate the use of Scrapy Pipelines, LinkExtractors, Rules, generic spiders, Items, and XPath selectors.
What exactly does this spider do (general algorithm):
- Gather links to the betting page of each esports event (using an appropriate set of rules).
- Follow each extracted link and scrape the esports data.
- Filter the gathered data in the pipeline.
When all processes have finished, you get information about every upcoming esports event. Events that have already passed (or are in progress) are excluded, as is betting data for secondary markets (such as bets on "first blood", "second map winner", etc.). Event/game times are also converted to UTC. (To include all events and keep the original site time, comment out the code in "pipelines.py" or disable the pipelines in "settings.py".)
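The filtering and UTC conversion described above happen in "pipelines.py". The sketch below is not the project's code; it shows one way a Scrapy item pipeline could do the same job. It assumes the spider yields 'date' as a naive datetime in site time, that TIME_DIFFERENCE is the number of hours to subtract from site time to get UTC (the real variable's sign convention may differ), and that secondary markets can be recognized by keywords in the participant names. EsportFilterPipeline and SECONDARY_MARKET_KEYWORDS are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

try:
    from scrapy.exceptions import DropItem
except ImportError:  # fallback so the sketch also runs without Scrapy installed
    class DropItem(Exception):
        """Stand-in for scrapy.exceptions.DropItem."""

# Hypothetical value: hours to subtract from the site time to get UTC.
TIME_DIFFERENCE = 3

# Hypothetical keywords assumed to mark non-primary markets in participant names.
SECONDARY_MARKET_KEYWORDS = ("first blood", "second map")


class EsportFilterPipeline:
    """Sketch: convert event times to UTC and drop unwanted items."""

    def process_item(self, item, spider):
        # Shift the naive site time by TIME_DIFFERENCE hours and mark it as UTC.
        utc_date = (item["date"] - timedelta(hours=TIME_DIFFERENCE)).replace(tzinfo=timezone.utc)

        # Drop events that have already started or finished.
        if utc_date <= datetime.now(timezone.utc):
            raise DropItem("Event already started: %s vs %s" % (item["player1"], item["player2"]))

        # Drop secondary markets such as "first blood" or "second map winner".
        names = "%s %s" % (item["player1"], item["player2"])
        if any(keyword in names.lower() for keyword in SECONDARY_MARKET_KEYWORDS):
            raise DropItem("Secondary market: %s" % names)

        item["date"] = utc_date
        return item
```

Raising DropItem is how a Scrapy pipeline discards an item; returning the item passes it on to the next pipeline or the feed exporter. To use a class like this, it would be registered in ITEM_PIPELINES in "settings.py", e.g. `{"Pinnacle.pipelines.EsportFilterPipeline": 300}` (the module path is an assumption based on the folder name).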
Keys and descriptions for each returned item:
- 'date' - date and time of the event/game as a datetime, converted to UTC (best effort);
- 'game' - name of the game (CS:GO, League of Legends, Dota 2, etc.);
- 'player1' - name of the first participant (or team name, e.g. "Fnatic" or "Team Liquid");
- 'player2' - name of the second participant;
- 'odds1' - betting odds on the first player (float value, e.g. 1.862);
- 'odds2' - betting odds on the second player (float value).
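The keys above can be mirrored in a small typed record that makes the expected types explicit. This is a sketch, not the project's scrapy.Item class: EsportEvent and from_item are hypothetical names, and it assumes the raw odds may arrive as text (XPath selectors return strings) and need converting to float. typing.NamedTuple is used so it also runs on Python 3.6.

```python
from datetime import datetime
from typing import NamedTuple


class EsportEvent(NamedTuple):
    """Plain-Python mirror of the returned keys (not the project's scrapy.Item)."""
    date: datetime  # start time, converted to UTC where possible
    game: str       # e.g. "CS:GO", "League of Legends", "Dota 2"
    player1: str    # first participant or team, e.g. "Fnatic"
    player2: str    # second participant or team
    odds1: float    # decimal odds on player1, e.g. 1.862
    odds2: float    # decimal odds on player2

    @classmethod
    def from_item(cls, item):
        """Build an event from a scraped dict, coercing odds text such as "1.862" to float."""
        return cls(
            date=item["date"],
            game=item["game"].strip(),
            player1=item["player1"].strip(),
            player2=item["player2"].strip(),
            odds1=float(item["odds1"]),
            odds2=float(item["odds2"]),
        )
```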
This script was written in Python 3.6 (for Scrapy 1.5) and tested on a Windows machine. Before running it, you'll need to install:
- Scrapy (on Windows you'll need the appropriate C++ SDK to build Twisted; check their docs);
- Selenium (with geckodriver on Windows);
- Firefox browser.
After installing all requirements, copy the "Pinnacle" folder to your machine. Open "pipelines.py" and set the "TIME_DIFFERENCE" variable to your own value (if needed).
To run the spider, change to the Scrapy project folder in your terminal and type:
scrapy crawl pinnacle
To save the data to a .json file (for example), type:
scrapy crawl pinnacle -o yourfile.json
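Once exported, the file can be processed with the standard library alone. A minimal sketch, assuming the feed is a JSON array of items with the keys listed above and that 'date' was serialized as "YYYY-MM-DD HH:MM:SS" (the format Scrapy's JSON encoder uses for datetime objects; adjust DATE_FORMAT if your pipeline stores dates differently). load_events, events_by_game and DATE_FORMAT are hypothetical names.

```python
import json
from collections import defaultdict
from datetime import datetime

# Assumed format of the exported 'date' field.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def load_events(path):
    """Load the items written by `scrapy crawl pinnacle -o yourfile.json`."""
    with open(path, encoding="utf-8") as f:
        events = json.load(f)  # the JSON feed is a single array of objects
    for event in events:
        event["date"] = datetime.strptime(event["date"], DATE_FORMAT)
    return events


def events_by_game(events):
    """Group events by game, each group sorted by start time."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["game"]].append(event)
    for group in grouped.values():
        group.sort(key=lambda event: event["date"])
    return dict(grouped)
```

For example, `for game, events in events_by_game(load_events("yourfile.json")).items(): print(game, len(events))` prints how many upcoming events were scraped for each game.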