Data Puller

This repository is a collection of scripts I made to conveniently pull my personal data from the internet services I use the most.
The goal is to get everything about me in one place for further analysis (data science with R, full-text search with Elastic, ...).

These scripts pull every bit of interesting data about you that is available from web service APIs into plain JSON files.
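As a rough illustration of the "plain JSON files" idea, here is a minimal sketch of how pulled items could be written to disk and read back for analysis. The helper names (`saveItems`, `loadItems`) and the one-file-per-service layout are assumptions made for this example, not necessarily what the scripts in `src/pullers` do.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: write the items pulled from one service
// to <dir>/<service>.json as pretty-printed JSON.
function saveItems(dir, service, items) {
  fs.mkdirSync(dir, { recursive: true }); // create the output folder if needed
  const file = path.join(dir, `${service}.json`);
  fs.writeFileSync(file, JSON.stringify(items, null, 2));
  return file;
}

// Hypothetical helper: read a previously saved file back into memory.
function loadItems(file) {
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}
```

Because the output is plain JSON, the same files can be loaded from R or indexed into Elastic without any custom parser.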

Currently supported:

Pocket: unread, archived & favorites
Twitter: likes, tweets, retweets
YouTube: likes, favorites, history (via manual import & parsing)
Reddit: upvoted, saved
GitHub: stars

🏥 Have a look at The Data Detox Kit.

Run

# A specific puller (for setup or debug), e.g. twitter
node src/pullers/twitter_pull.js
# All pullers at once
npm run start

# Stats
node src/stats.js
# Specific report
node src/reports/twitter_report.js
node src/reports/pocket_readnext.js

Setup

Run npm install
Provide your API credentials via environment variables or a ./config.json file (have a look at ./src/config_manager.js)
Go through the auth procedure of every configured puller by launching each one separately (with something like node ./src/pullers/pocket_pull.js)
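To make the "environment variables or ./config.json" option more concrete, here is a minimal sketch of a loader that reads both sources. It is not the code in ./src/config_manager.js: the function name `loadConfig`, the key names, and the rule that environment variables win over the file are all assumptions made for this example.

```javascript
const fs = require('node:fs');

// Hypothetical loader: returns an object holding the requested keys,
// taken from the environment first and from the JSON file otherwise.
function loadConfig(keys, { env = process.env, file = './config.json' } = {}) {
  let fromFile = {};
  if (fs.existsSync(file)) {
    fromFile = JSON.parse(fs.readFileSync(file, 'utf8'));
  }

  const config = {};
  for (const key of keys) {
    // Assumption: an environment variable overrides the file value.
    const value = env[key] ?? fromFile[key];
    if (value !== undefined) config[key] = value;
  }
  return config;
}
```

A call such as `loadConfig(['POCKET_CONSUMER_KEY'])` (a made-up key name) would return the value from the environment if it is set, and fall back to ./config.json otherwise. Check ./src/config_manager.js for the key names the pullers actually expect.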

More on this project

YouTube Restrictions

The watch history and the watch later playlist are not accessible through the YouTube API for privacy reasons.
To get around this, you can obtain a watch-history.html file via the Google Takeout page. Then put this file in the drop_zone folder so it can be parsed by the YouTube puller on the next run.
As for the watch later playlist, the Google Takeout export is already a JSON file.

Late 2019 update: the watch history is now available in JSON, but it still requires pulling the video details separately.
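For the HTML variant of the export, parsing can be as simple as collecting the video links. The sketch below assumes that watch-history.html contains anchors of the form `<a href="https://www.youtube.com/watch?v=ID">Title</a>`. The exact Takeout markup may differ or change over time, and the function name `extractWatchedVideos` is made up for this example, so treat it as an illustration rather than the puller's actual parser.

```javascript
// Hypothetical parser: extract { id, title } pairs from a Takeout
// watch-history.html string. Only links to /watch?v=... are kept,
// so channel links in the same file are ignored.
function extractWatchedVideos(html) {
  const linkPattern =
    /<a href="https:\/\/www\.youtube\.com\/watch\?v=([\w-]+)"[^>]*>([^<]*)<\/a>/g;
  const videos = [];
  for (const match of html.matchAll(linkPattern)) {
    // HTML entities in the title are left as-is in this sketch.
    videos.push({ id: match[1], title: match[2] });
  }
  return videos;
}
```

The resulting IDs can then be used to request details such as duration and category from the YouTube API, which the export itself does not contain.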

Why this project? Are website data exports not enough?

Websites' export features have shortcomings (as of late 2019):

Pocket export is in HTML and does not differentiate favorites from other items
GitHub export does not include starred repos
YouTube export does not include any video metadata such as duration and category

As for Facebook, Reddit and Twitter, their exports do a great job, so my scripts may be unnecessary for them.

austil/datapuller

Folders and files

Latest commit

History

Repository files navigation

Data Puller

Run

Setup

More on this project

Youtube Restrictions

Why this project, Are website data exports not enough ?

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages