A project to visualize metadata from https://arxiv.org/. Just for the fun of it.
We can get XML data from the arXiv API:
wget "http://export.arxiv.org/api/query?search_query=cat:math.SG+AND+au:eliashberg&max_results=80" -O eliashberg.xml
To convert the XML to JSON, install xml2json:
pip install https://github.com/hay/xml2json/zipball/master
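If you would rather skip the extra dependency, here is a rough sketch of the conversion using only the standard library; it keeps only a few fields (id, title, published, authors), which is an arbitrary choice, and assumes the eliashberg.xml file downloaded above:

import json
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # namespace used by the arXiv Atom feed

tree = ET.parse("eliashberg.xml")
entries = []
for entry in tree.getroot().iter(ATOM + "entry"):
    entries.append({
        "id": entry.findtext(ATOM + "id"),
        "title": " ".join(entry.findtext(ATOM + "title").split()),  # collapse line breaks in titles
        "published": entry.findtext(ATOM + "published"),
        "authors": [a.findtext(ATOM + "name") for a in entry.iter(ATOM + "author")],
    })

with open("eliashberg.json", "w") as f:
    json.dump(entries, f)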
2.5 years later, another attempt.
There are interesting projects out there by now, like:
https://github.com/ranihorev/arxiv-network-graph
IPython: http://betatim.github.io/posts/analysing-the-arxiv/
Unclear if interesting:
https://github.com/eitanrich/arxiv-hot-topics
https://www.kaggle.com/neelshah18/arxivdataset
On a Mac where the default python is still Python 2, set up a Python 3 virtual environment:
pip3 install virtualenv
virtualenv -p python3 env
source env/bin/activate
python3
This will create one big (or giant, depending on your query) JSON document. To get one JSON line per paper, do e.g.
cat arxiv_cat_math_SG_AND_au_eliashberg_20190814-172937.json | jq -c '.[]' > arxiv_cat_math_SG_AND_au_eliashberg_20190814-172937_lines.json
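As a quick sanity check, the line-delimited file can be read back in Python like this (the field names depend on how the XML was converted to JSON, so 'title' below is only an assumption):

import json

papers = []
with open("arxiv_cat_math_SG_AND_au_eliashberg_20190814-172937_lines.json") as f:
    for line in f:
        papers.append(json.loads(line))

print(len(papers), "papers")
# 'title' is an assumption; the actual key depends on the XML-to-JSON conversion.
print(papers[0].get("title"))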
From here, pump it into Elasticsearch, Spark, or whatever your choice of JSON store might be.
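For the Elasticsearch route, a minimal sketch with the official Python client's bulk helper (the index name "arxiv" and the localhost URL are assumptions, and the client version has to match your server):

import json
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # assumes a local Elasticsearch instance

def actions(path, index="arxiv"):
    # One JSON document per line, as produced by the jq command above.
    with open(path) as f:
        for line in f:
            yield {"_index": index, "_source": json.loads(line)}

helpers.bulk(es, actions("arxiv_cat_math_SG_AND_au_eliashberg_20190814-172937_lines.json"))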