Currently uses Python and IPython Notebook, and calls the Twitter Search API via the requests library.
Run the script and notebook server with these environment variables:
nb.sh
cat ../nb.sh
export NEO4J_URL=bolt://localhost
export NEO4J_USER=neo4j
export NEO4J_PASSWORD=****
export TWITTER_BEARER='...'
# export TWITTER_SEARCH='#neo4j'
ipython notebook
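On the Python side, the variables exported by nb.sh could be picked up roughly as follows. This is a minimal sketch, not the project's actual code: the function name `load_settings`, the dictionary keys, and the fallback defaults (including `#neo4j` as the default search, inferred from the commented-out line in the script) are assumptions.

```python
import os


def load_settings(env=os.environ):
    """Collect the settings exported by nb.sh (hypothetical helper)."""
    return {
        # Fallbacks below are assumptions that mirror the values in nb.sh.
        "neo4j_url": env.get("NEO4J_URL", "bolt://localhost"),
        "neo4j_user": env.get("NEO4J_USER", "neo4j"),
        # Secrets have no fallback: a missing variable raises KeyError early.
        "neo4j_password": env["NEO4J_PASSWORD"],
        "twitter_bearer": env["TWITTER_BEARER"],
        # Optional search term; the default here is an assumption.
        "twitter_search": env.get("TWITTER_SEARCH", "#neo4j"),
    }
```

Passing a plain dict instead of `os.environ` makes the helper easy to try out without touching the real environment.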
-
Use Twitter search API
-
Control direction of ingest with catchUp:
catchUp: False → backward in history using max_id
catchUp: True → newer tweets using since_id
-
Optionally provide the Twitter search term via the TWITTER_SEARCH environment variable
-
Use an idempotent Cypher statement to merge Tweets, Users, and Tags
-
Store the search results in JSON files and then import those
-
Save a "hash" of the search query with each tweet, so we can compute "maxId" separately for different queries
-
https://dev.twitter.com/rest/public/timelines (max_id and since_id explained)
-
Note: watch out for the maxId of retweeted / replied-to tweets, which can be much older
-
Neo4j & Twitter
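The ingest notes above can be sketched in Python as follows. This is an illustration under stated assumptions, not the notebook's actual code: the helper names (`query_hash`, `search_params`, `id_bounds`), the truncated SHA-1 used as the "hash", and the labels, property names, and relationship types in the Cypher statement are all hypothetical. The request parameters `q`, `count`, `result_type`, `since_id`, and `max_id` are those of the Twitter search API.

```python
import hashlib


def query_hash(query):
    """Short, stable fingerprint of a search query (hash choice is an assumption)."""
    return hashlib.sha1(query.encode("utf-8")).hexdigest()[:12]


def search_params(query, catch_up, since_id=None, max_id=None, count=100):
    """Build search parameters for one direction of ingest."""
    params = {"q": query, "count": count, "result_type": "recent"}
    if catch_up:
        # catchUp: True -> only tweets newer than the newest one stored.
        if since_id is not None:
            params["since_id"] = since_id
    elif max_id is not None:
        # catchUp: False -> walk backward in history. max_id is inclusive,
        # so ask for ids strictly below the oldest tweet already stored.
        params["max_id"] = max_id - 1
    return params


def id_bounds(tweets):
    """Oldest and newest id among the tweets themselves.

    Only the top-level id is used; the id of an embedded retweeted or
    replied-to tweet can be much older and would skew the bounds.
    """
    ids = [t["id"] for t in tweets]
    return (min(ids), max(ids)) if ids else (None, None)


# Idempotent import: MERGE creates a node or relationship only if it is missing,
# so importing the same tweets twice leaves the graph unchanged.
# Labels, properties, and relationship types are assumptions for illustration.
MERGE_TWEETS = """
UNWIND $tweets AS t
MERGE (tweet:Tweet {id: t.id})
SET tweet.text = t.text, tweet.query_hash = $query_hash
MERGE (user:User {screen_name: t.user.screen_name})
MERGE (user)-[:POSTS]->(tweet)
FOREACH (h IN t.entities.hashtags |
  MERGE (tag:Tag {name: h.text})
  MERGE (tag)-[:TAGS]->(tweet))
"""
```

Storing `query_hash` on each tweet lets the bounds be computed per query, so a second search term can start its own backward or forward walk without being affected by tweets found through another query.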