Benson accepts a list of URLs via database query or from a filename passed by command line argument and turns the web pages' content into individual audio files (currently, mp3s for portability).
I love to keep up with the latest articles and blog posts, but who has the time to relax and just catch up on reading anymore? What finally worked for me was to batch up the URLs of articles and blog posts I wanted to read and convert them to mp3 files that I listen to at 2x speed while I'm driving the car, walking the dog, or doing the dishes. Benson is the latest iteration of how I turn my reading backlog into convenient and quickly consumed audio files.
If you find Benson useful, by all means clone or fork it and customize it to your heart's content. Pull requests are also welcome if you want to help me improve and expand the project.
As you might've guessed, I picked the name Benson after the sarcastic yet affable butler portrayed by Robert Guillaume in the 1980s sitcom Benson. It's all part of the fun tradition of naming software after butlers. (Jenkins, Alfred, Belvedere, anyone?)
- Trafilatura by Adrien Barbaresi for content extraction
- pyttsx3 by Natesh Bhat for text-to-speech
- ffmpeg-python by Karl Kroening for mp3 details
Thanks, y'all!
- ffmpeg needs to be in your system path
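If you aren't sure whether ffmpeg is visible on your system path, a quick check like the one below can tell you before you run Benson. This is only a sketch and not part of Benson itself; the missing_tools and main names are hypothetical.

```python
import shutil


def missing_tools(tools):
    """Return the tools from the given list that cannot be found on the system path."""
    return [tool for tool in tools if shutil.which(tool) is None]


def main():
    # Report whether ffmpeg can be found the same way a shell would find it.
    missing = missing_tools(["ffmpeg"])
    if missing:
        print("Not found on the system path:", ", ".join(missing))
    else:
        print("ffmpeg is on the system path.")
```

Calling main() from a Python prompt prints a one-line verdict. If ffmpeg is reported as missing, install it or add its folder to your path before continuing.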
- Clone the repo.
git clone https://github.com/timoteostewart/benson.git
- Prepare and activate a venv (the Windows method is shown).
python -m venv benson_env
./benson_env/Scripts/Activate.ps1
- Ensure pip requirements are installed.
pip install -r requirements.txt
- Start turning URLs into audio files immediately. (See screenshot below.)
python benson.py --source test-urls.txt
- Listen to the four mp3s in the ./mp3_files folder to hear how it sounds. (Of course, pyttsx3 offers plenty of ways to customize the voices.)
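To give a feel for what the steps above set in motion, here is a rough sketch of the kind of pipeline the credited libraries make possible: read a list of URLs, extract each article's text with Trafilatura, and speak it to a file with pyttsx3. It is not Benson's actual code. The function names, the one-URL-per-line file format, the "#" comment convention, the speaking rate, and the audio_sketch output folder are all assumptions for illustration, and the mp3 conversion step via ffmpeg is left out.

```python
from pathlib import Path


def parse_urls(text):
    """Return the URLs in a block of text, one per line.

    Assumption: blank lines and lines starting with '#' are ignored.
    """
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def url_to_audio(url, out_path, rate=175):
    """Fetch a page, extract its main text, and speak it to a file.

    Returns False when the page cannot be downloaded or has no extractable text.
    """
    # Third-party imports stay local so parse_urls works without them installed.
    import pyttsx3
    import trafilatura

    downloaded = trafilatura.fetch_url(url)
    text = trafilatura.extract(downloaded) if downloaded else None
    if not text:
        return False

    engine = pyttsx3.init()
    engine.setProperty("rate", rate)  # speaking rate in words per minute
    # The audio format actually written depends on the platform's speech driver.
    engine.save_to_file(text, str(out_path))
    engine.runAndWait()
    return True


def main(source="test-urls.txt", out_dir="audio_sketch"):
    # Hypothetical driver: one audio file per URL listed in the source file.
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    urls = parse_urls(Path(source).read_text(encoding="utf-8"))
    for i, url in enumerate(urls, start=1):
        ok = url_to_audio(url, out / f"article_{i:03d}.wav")
        print(f"{'done' if ok else 'skipped'}: {url}")
```

Keeping the parsing separate from the network and speech steps makes the cheap part easy to test on its own, and the rate setting is one of the pyttsx3 knobs you can turn to customize how the result sounds.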
- If a URL is not currently available or scrapable, check for snapshots on archive.is, the Wayback Machine, and similar archives.
- Implement progress indicator with estimated time of completion (useful for very large lists of URLs)
- Populate the ID3 fields in the mp3 to the extent possible
- For domain pronunciations not in domains_pronunciations.txt, scrape the article's page and look for a human-readable string that resembles the components of the domain. For example, it would be ideal for Benson to visit "https://avanwyk.com/" and determine that "avanwyk.com" could be spoken aloud as "Andrich van Wyk dot com". Obligatory xkcd
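The last idea can be sketched with a simple matching heuristic. Nothing below exists in Benson yet; it is one possible approach under stated assumptions. The function names are hypothetical, the domain is assumed to be a bare name with no subdomain, and the candidate strings (page title, author byline, and so on) are assumed to have been scraped already. The heuristic squeezes each candidate phrase into a few compact spellings and checks whether any of them equals the first label of the domain.

```python
import re


def candidate_labels(phrase):
    """Return compact spellings a phrase might take when squeezed into a domain label."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    if not words:
        return set()
    forms = {"".join(words), "-".join(words)}  # "andrichvanwyk", "andrich-van-wyk"
    if len(words) > 1:
        forms.add(words[0][0] + "".join(words[1:]))  # first initial + rest: "avanwyk"
        forms.add("".join(word[0] for word in words))  # initials only: "avw"
    return forms


def guess_pronunciation(domain, page_strings):
    """Guess how to say a domain aloud, using strings found on the page as hints.

    Falls back to reading the domain label as-is when no string matches.
    """
    label, _, suffix = domain.lower().partition(".")
    spoken_name = label
    for phrase in page_strings:
        if label in candidate_labels(phrase):
            spoken_name = phrase.strip()
            break
    parts = [spoken_name] + [part for part in suffix.split(".") if part]
    return " dot ".join(parts)
```

With the example from the list, guess_pronunciation("avanwyk.com", ["Home", "Andrich van Wyk"]) matches on the first-initial spelling and gives "Andrich van Wyk dot com". A real implementation would need more spellings and some guard against false matches, so treat this as a starting point.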
Contributions are welcome! Even small changes and refactors. Fork the repo and create a pull request. You may also open an issue with the tag "enhancement". Feel free to ⭐ the project too. Thanks!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request so I can check it out
Distributed under the MIT License. See the LICENSE file for more information.
Tim Stewart - tim@texastim.dev
Benson project link: https://github.com/timoteostewart/benson