WorldsEndless/gc-to-orgmode-scraper

gc-scraper

2024 September edition. A quick-and-dirty web scraper for LDS General Conference that uses Pandoc to convert the talks to Org mode files.

Prerequisites

You will need Leiningen 2.0.0 or above installed.

You will also need Pandoc, which is free and available through most Linux package repositories. Pandoc performs the format conversion from HTML.

The output is in Emacs Org mode format, which ships with Emacs and is supported by Pandoc. Org syntax works like a fuller version of Markdown (Org is older and not inherently compatible with Markdown, although exporting to Markdown is possible).
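
To confirm that Pandoc is installed and to preview the kind of HTML-to-Org conversion described above, something like the following can be run in a shell. This is only a sketch and not the scraper's own code: the `html_to_org` function name and the sample HTML are made up for illustration, while `--from=html` and `--to=org` are standard Pandoc options.

```shell
#!/bin/sh
# Convert HTML read from stdin into Org syntax written to stdout.
# "html" and "org" are Pandoc's names for the input and output formats.
# html_to_org is a hypothetical helper name used only in this example.
html_to_org() {
  pandoc --from=html --to=org
}

# Hypothetical sample input; the real scraper supplies each talk's HTML.
if command -v pandoc >/dev/null 2>&1; then
  printf '<h1>Sample Talk</h1><p>Some <em>emphasized</em> text.</p>\n' | html_to_org
else
  echo "pandoc not found on PATH" >&2
fi
```

If Pandoc is available, the heading should come out as an Org headline (a line starting with `*`) and the emphasized word wrapped in slashes.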

Running

Right now this can only be run from the REPL, like:

gc-scraper.general-conference> (get-web-gc "my/slash/ending/output/directory/")

STARTED Improvements [2/4]

  • [ ] Clean up handling of references (footnotes)
  • [X] More robust output-dir handling
  • [ ] Command-line operability
  • [X] Allow specifying different conference years or URLs

License

Copyright © 2008-2024 Tory S. Anderson. Released under the GPL v3 license. Use and modification are fine; attribution is appreciated.
