Added extraction of url_pdf from right hand side [PDF] link. #95

pmdscully · 2017-04-26T18:13:12Z

This change extracts the [PDF] href value from the right-hand side of a Google Scholar article entry. It records the URL as url_pdf only if the article's url_pdf has not already been filled and Google Scholar labels the link as a PDF (i.e. the element's text is [PDF]).
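For readers who want to see the rule in isolation, here is a minimal sketch of the logic described above. It is not the code from this commit: the names `PdfLinkFinder` and `fill_url_pdf`, the dict-based article, and the sample markup are all hypothetical, and it uses the standard-library `html.parser` so it runs without extra dependencies. It also assumes the anchor text may carry a host name after the label, so it checks that the text starts with `[PDF]`.

```python
from html.parser import HTMLParser


class PdfLinkFinder(HTMLParser):
    """Record the href of the first anchor whose text is labelled [PDF]."""

    def __init__(self):
        super().__init__()
        self.pdf_href = None
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = "".join(self._text).strip()
            # Assumption: the label may be followed by a host name,
            # so test the prefix rather than the whole string.
            if self.pdf_href is None and label.startswith("[PDF]"):
                self.pdf_href = self._href
            self._href = None


def fill_url_pdf(article, entry_html):
    """Set article['url_pdf'] from a [PDF] anchor unless it is already filled."""
    if article.get("url_pdf"):
        return article  # never overwrite an existing value
    finder = PdfLinkFinder()
    finder.feed(entry_html)
    finder.close()
    if finder.pdf_href:
        article["url_pdf"] = finder.pdf_href
    return article
```

Calling `fill_url_pdf({"url_pdf": None}, entry_html)` on an entry that contains a `[PDF]` anchor fills the field, while an article that already has a url_pdf is returned unchanged.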

Test: scholar.py -c 10 --txt --author "einstein" --phrase "quantum"

Pre-change: 0/4 PDF links extracted
Post-change: 4/4 PDF links extracted

As far as I am aware, Google Scholar's [PDF] label is the best readily available indicator of whether the (optional) right-hand side anchor refers to a PDF file.

Added extraction of url_pdf from right hand side [PDF] link.

282cecb
