Full text search for pinboard.in pins
You should have Docker and docker-compose installed. We built this with Python 3, but things may work on earlier versions. You should be using a virtualenv anyway.
To install the Scrapy project's Python dependencies:
$ pip install -r requirements.txt
$ scrapy crawl --logfile=data/spider.log --loglevel=INFO -o data/data.json -t json -a user=[PINBOARD_USERNAME] -a after=[NUMBER] pinboard
Where PINBOARD_USERNAME is the username you registered with and NUMBER is the timestamp from which to start fetching Pinboard links (use 1 to start from the oldest).
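For reference, this is roughly the shape of a Scrapy spider for such a crawl; the start URL, selectors, and field names below are illustrative assumptions, not the actual implementation in this repo.

import scrapy


class PinboardSpider(scrapy.Spider):
    # Name matches the "scrapy crawl pinboard" command above.
    name = "pinboard"

    def __init__(self, user=None, after="1", *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.user = user    # passed via -a user=[PINBOARD_USERNAME]
        self.after = after  # passed via -a after=[NUMBER]
        # Hypothetical start URL; the real spider may build its URLs differently.
        self.start_urls = [f"https://pinboard.in/u:{self.user}/"]

    def parse(self, response):
        # Placeholder selectors; yield one item per bookmark on the page.
        for bookmark in response.css("div.bookmark"):
            yield {
                "title": bookmark.css("a.bookmark_title::text").get(),
                "url": bookmark.css("a.bookmark_title::attr(href)").get(),
                "description": bookmark.css("div.description::text").get(),
                "tags": bookmark.css("a.tag::text").getall(),
            }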
After scraping the data into the data folder, start the Solr container with:
$ docker-compose up -d
Solr will start and precreate a core named 'pinboard' by default.
The ./data folder will also be available inside the container at /var/data.
If you want to explore the container, log in with:
$ docker exec -ti pinboogle_solr_1 /bin/bash
You can also access the admin site at http://{DOCKER_HOST}:8983/solr
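If you prefer to check from Python, here is a small sanity check against Solr's ping handler. It assumes DOCKER_HOST resolves to localhost; adjust the host if yours differs.

import requests

# Assumes DOCKER_HOST resolves to localhost; adjust the host if yours differs.
resp = requests.get(
    "http://localhost:8983/solr/pinboard/admin/ping",
    params={"wt": "json"},
    timeout=5,
)
resp.raise_for_status()
print(resp.json().get("status"))  # prints "OK" when the core is healthy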
With the Solr container running, use the following command to create all the fields needed for the Pinboard JSON produced by the scraping task:
$ docker exec -ti pinboogle_solr_1 /var/data/schema_migration.sh
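The real field definitions live in data/schema_migration.sh; as a rough illustration, a migration like this typically posts add-field commands to Solr's Schema API. The field names and types below are assumptions, not the repo's actual schema.

import requests

# Adjust the host if DOCKER_HOST is not localhost.
SCHEMA_URL = "http://localhost:8983/solr/pinboard/schema"

# Example field definitions only; the real list lives in data/schema_migration.sh.
fields = [
    {"name": "title", "type": "text_general", "stored": True},
    {"name": "description", "type": "text_general", "stored": True},
    {"name": "url", "type": "string", "stored": True},
    {"name": "tags", "type": "string", "multiValued": True, "stored": True},
]

for field in fields:
    resp = requests.post(SCHEMA_URL, json={"add-field": field}, timeout=10)
    resp.raise_for_status()
    print("added field:", field["name"])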
Then use the next command to import the JSON file:
$ docker exec -it --user=solr pinboogle_solr_1 bin/post -c pinboard /var/data/[JSON_FILE]
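bin/post wraps Solr's update handlers, so a rough Python equivalent, assuming the scraped file is data/data.json and contains a JSON array of documents whose fields match the schema, looks like this:

import requests

# Assumes data/data.json is a JSON array of documents matching the schema.
with open("data/data.json", "rb") as f:
    resp = requests.post(
        "http://localhost:8983/solr/pinboard/update",
        params={"commit": "true"},
        data=f.read(),
        headers={"Content-Type": "application/json"},
        timeout=60,
    )
resp.raise_for_status()
print(resp.json()["responseHeader"]["status"])  # 0 means the import succeeded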
The search interface is implemented with Flask and lives in the ./web folder.
When you run docker-compose up, you will see that a second container is built. To access it, open your browser at:
http://{DOCKER_HOST}:5000
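The actual app in ./web may be organized differently, but a minimal Flask view that proxies a query to Solr looks roughly like this (the "solr" hostname is an assumption about the docker-compose service name):

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

# The "solr" hostname assumes the docker-compose service is named solr;
# adjust it if your compose file uses a different service name.
SOLR_SELECT = "http://solr:8983/solr/pinboard/select"


@app.route("/search")
def search():
    # Pass the user's query straight to Solr and return the matching documents.
    q = request.args.get("q", "*:*")
    resp = requests.get(SOLR_SELECT, params={"q": q, "wt": "json"}, timeout=10)
    resp.raise_for_status()
    return jsonify(resp.json()["response"]["docs"])


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)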
Any change you make in the ./web folder is reflected in the container. If you change or add a dependency, rebuild the container with:
$ docker-compose down
$ docker-compose up -d --build
We built this to play with Scrapy, Solr, Python, and Docker. Pinboard offers a paid subscription that includes full-text search on your links; if you want high-quality results, subscribe to it.