This package provides functionality to automate tasks on the USOSweb interface. It uses Selenium for navigating the interface and BeautifulSoup4 for parsing pages in the ScrapingTemplates.

- A good place to start is to clone the repository:

```bash
git clone https://github.com/mkochanowski/USOSweb-automated.git
```
- Inside the project's root directory, create a new virtual environment, then activate it:

```bash
python3 -m venv venv

# to activate on Linux:
source venv/bin/activate

# to activate on Windows:
.\venv\Scripts\activate
```

- Now you can safely install the required packages:

```bash
pip install -r requirements.txt
```
- For automating the browser, install ChromeDriver. You can skip this step if you already use a different driver, such as Ghost Driver or Edge Driver (see the quick smoke test after this list). Learn more about configuring web drivers in the documentation.
- Done! Time for some configuration.
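As promised above, here's a quick smoke test for your driver installation. This is a minimal sketch, assuming the `chromedriver` binary is on your `PATH` and using the Selenium 3 `webdriver.Chrome` API (the same API family the examples later in this document rely on):

```python
from selenium import webdriver

# Assumes chromedriver is on your PATH; otherwise pass its location
# via the executable_path argument (Selenium 3 API).
driver = webdriver.Chrome()
driver.get("https://usosweb.uni.wroc.pl/")
print(driver.title)  # should print the page title if everything works
driver.quit()
```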
Your app will not execute without a properly configured `.env` file. This project comes with a `.env.sample` to help you get started; you only need to make minor changes. The file's contents are:
```
USOS_SETTINGS_USERNAME=""
USOS_SETTINGS_PASSWORD=""
USOS_SCRAPER_ROOT_URL="https://usosweb.uni.wroc.pl/kontroler.php?_action="
USOS_SCRAPER_DESTINATIONS="dla_stud/studia/oceny/index dla_stud/studia/sprawdziany/index"
USOS_SCRAPER_MINIMUM_DELAY=4
USOS_SCRAPER_WEBDRIVER_HEADLESS=False
USOS_SCRAPER_DEBUG_MODE=True
USOS_NOTIFICATIONS_ENABLE=True
USOS_NOTIFICATIONS_STREAMS="Email WebPush SMS"
USOS_NOTIFICATIONS_CONFIG_FILE="notifications_config.json"
```
| Name of the setting | Description | Default value |
| --- | --- | --- |
| `USOS_SETTINGS_USERNAME`, `USOS_SETTINGS_PASSWORD` | Credentials needed for authentication on the USOSweb interface. | Empty strings |
| `USOS_SCRAPER_ROOT_URL` | The root URL of the USOSweb application. The default root URL includes a GET parameter `action`, because it is used throughout the interface. You might think of it as representing a structure similar to `http://usosweb.app/action/`. | The root URL for the University of Wroclaw |
| `USOS_SCRAPER_DESTINATIONS` | Predefined actions (destinations) that the scraper will visit after calling the `run()` method. | Final grades and course results |
| `USOS_SCRAPER_MINIMUM_DELAY` | Minimum delay between individual executions of the `app.py` main script. Do not abuse the services you're using, or you might get in trouble! | 4 minutes (don't go any lower) |
| `USOS_SCRAPER_WEBDRIVER_HEADLESS` | Whether to run the web driver in headless mode (in other words: silently, without the browser window appearing). You might want to disable it for debugging or developing new interactions. | `False` |
| `USOS_SCRAPER_DEBUG_MODE` | Whether to run the application in debug mode, which produces additional logging statements. Enable it only in your local development environment to avoid collecting unnecessary data. | `True` |
| `USOS_NOTIFICATIONS_ENABLE` | Whether to allow the dispatcher to send any notifications via the configured channels. | `True` |
| `USOS_NOTIFICATIONS_STREAMS` | Streams (channels) are user-configurable media for delivering notifications, such as Email, text messages, or WebPush notifications sent directly to your browser. | Email and other examples |
| `USOS_NOTIFICATIONS_CONFIG_FILE` | Path to the configuration file that provides variables such as API keys or special parameters to individual channels. Keeping config data in a separate source lets you design much more flexible streams. | A file provided with the project |
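For orientation, here is a sketch of how these settings could be read in Python. Whether this project actually uses python-dotenv is an assumption on my part, so treat the snippet as illustrative only:

```python
import os

from dotenv import load_dotenv  # assumption: python-dotenv is installed

load_dotenv()  # reads key=value pairs from .env into the environment

# USOS_SCRAPER_DESTINATIONS is a single space-separated string,
# so it has to be split into individual destinations.
destinations = os.getenv("USOS_SCRAPER_DESTINATIONS", "").split()
print(destinations)
# ['dla_stud/studia/oceny/index', 'dla_stud/studia/sprawdziany/index']
```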
Enter the credentials and the root URL of the USOSweb app you want to access, and you're good to go!

To execute the app, run:

```bash
python3 app.py
```
This script supports dispatching notifications via multiple channels, but Email is the one implemented by default. Initially, it comes with yagmail preinstalled, but you're free to replace it with a different library if needed.

To use yagmail you will need to configure OAuth2: see Configuring yagmail. You can place the `oauth2_creds.json` file in the root directory of your project, then update `notifications_config.json` with the recipient and sender email addresses.

When running on a server, remember to set `USOS_SCRAPER_DEBUG_MODE=False` and `USOS_SCRAPER_WEBDRIVER_HEADLESS=True` in the `.env` file.
Now that you've made sure the app is configured and fully working, let's deploy it to a server. There are different ways of doing that; the most basic one is to replicate the steps in the Getting started guide and copy the configuration files from your local machine.
Let's set up a script that will execute the app inside the virtual environment. It may look like this:

```bash
#!/bin/bash
cd /home/username/USOSweb-automated
source venv/bin/activate
python3 app.py
```

Replace the path with the directory you installed the script in and save the file as `cron.sh`. The last step is to add the script to the crontab.
Open the crontab by running:

```bash
crontab -e
```

And add the script:

```
*/10 * * * * /home/username/USOSweb-automated/cron.sh
```

This means the `cron.sh` script will be executed every 10 minutes. Congratulations! Your project is fully set up.
A `ScrapingTemplate` is a set of rules predefined for a specific page. Consider the following URL:

```
https://usosweb.uni.wroc.pl/kontroler.php?_action=dla_stud/studia/sprawdziany/pokaz&wez_id=33693
```

In this example, the `ROOT_URL` is `https://usosweb.uni.wroc.pl/kontroler.php?_action=` and the destination is `dla_stud/studia/sprawdziany/pokaz`.

The path of the template is going to be `templates/scraping/dla_stud-studia-sprawdziany-pokaz.py` (just replace the slashes with dashes).
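A quick illustration of that destination-to-filename mapping (plain Python, not taken from the project's source):

```python
destination = "dla_stud/studia/sprawdziany/pokaz"
template_path = "templates/scraping/" + destination.replace("/", "-") + ".py"
print(template_path)
# templates/scraping/dla_stud-studia-sprawdziany-pokaz.py
```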
This is what a minimal template looks like:
```python
import logging

from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)


class ScrapingTemplate:
    """Scrapes the specific type of page by using a predefined
    set of actions."""

    def __init__(self, web_driver: object) -> None:
        self.driver = web_driver
        self.results = None

    def get_data(self) -> object:
        """Returns the scraped and parsed data."""
        self._parse(soup=self._soup())
        logger.debug(self.results)
        return self.results

    def _soup(self) -> object:
        """Generates a soup object out of a specific element
        provided by the web driver."""
        driver_html = self.driver.find_element_by_id("container")
        soup = BeautifulSoup(
            driver_html.get_attribute("innerHTML"),
            "html.parser")
        return soup

    def _parse(self, soup: object) -> None:
        """Initializes parsing of the innerHTML."""
        parser = Parser(soup=soup, web_driver=self.driver)
        self.results = {
            "module": __name__,
            "parsed_results": parser.get_parsed_results()
        }


class Parser:
    """Parses the provided HTML with BeautifulSoup."""

    def __init__(self, web_driver: object, soup: object) -> None:
        self.soup = soup
        self.driver = web_driver
        self.results = []

    def get_parsed_results(self) -> list:
        """Returns the results back to the ScrapingTemplate."""
        ...  # does parsing magic
        return self.results
```
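The body of `get_parsed_results()` is where the actual "parsing magic" lives and differs per template. Purely as an illustration (the CSS selector below is invented and will not match the real USOSweb markup), it could look like this:

```python
def get_parsed_results(self) -> list:
    """Returns the results back to the ScrapingTemplate."""
    # Hypothetical selector -- adjust it to the real page structure.
    for row in self.soup.select("table.grades tr"):
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if cells:
            self.results.append(cells)
    return self.results
```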
The only requirement for the `ScrapingTemplate` is to implement the `get_data()` method so that it returns a dictionary with a `module` key, such as:

```python
{
    "module": __name__,
    "new_destinations": [ ... ],
    "parsed_results": [ ... ]
}
```
Available keys:

- `new_destinations` - URLs to pass back to the scraper for building up the crawling queue.
- `parsed_results` - data saved in the form of a list of entities.

By default, the `Scraper` class uses `ChromeDriver` to automate the browser. You can add more drivers in `usos/web_driver.py`. Here is an example of a custom driver:
```python
def _driver_phantomjs(self) -> None:
    """Adds PhantomJS WebDriver support."""
    logging.info("Creating new PhantomJS Driver")
    dir_path = os.path.dirname(os.path.realpath(__file__))
    driver_path = dir_path + '/phantomjs'
    driver = webdriver.PhantomJS(executable_path=driver_path)
    driver.set_window_size(1120, 550)
    self._driver = driver
```
The only requirement is that the method sets the `self._driver` attribute to point to the instance of the driver. Now, suppose we want the `PhantomJS` driver to launch only in debug mode, and `ChromeDriver` on our production server:

```python
def get_instance(self) -> object:
    """Returns an instance of the selected web driver."""
    self.reset()
    if self.config["MY_DEBUG_MODE"]:
        self._driver_phantomjs()
    else:
        self._driver_chrome()
    return self._driver
```
The current implementation of an Entity will be replaced in the future by an independent data structure. Honestly, operating on dictionaries instead of a dedicated class feels a little weird for such an important element.
An `Entity` is a dictionary structure that contains two keys: `entity` and `items`. For example:
```json
{
    "entity": "course-results-tree",
    "items": [
        {
            "group": "28-INF-S-DOLI",
            "subgroup": "Logic for Computer Science",
            "hierarchy": "Exams",
            "item": "Final Exam",
            "values": ["85.0 pts", "Editor: John Doe"]
        },
        {
            "group": "28-INF-S-DOLI",
            "subgroup": "Logic for Computer Science",
            "hierarchy": "Class/Tests",
            "item": "Test no. 3",
            "values": ["15.0 pts", "Editor: Jane Doe"]
        }
    ]
}
```
The `course-results-tree` entity defines not only what it stores in the `items` key, but also how to process the data: the defined behaviour is to compare the supplied items with existing data to search for changes.
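To make that behaviour concrete, here is a minimal, hypothetical sketch of such a comparison; the actual logic lives in `usos.data.DataController` and is not reproduced here:

```python
def find_new_items(previous: list, current: list) -> list:
    """Returns items present in the current scrape but absent
    from the previously saved data (hypothetical helper)."""
    return [item for item in current if item not in previous]

old_items = [{"item": "Test no. 3", "values": ["15.0 pts"]}]
new_items = old_items + [{"item": "Final Exam", "values": ["85.0 pts"]}]
print(find_new_items(old_items, new_items))
# [{'item': 'Final Exam', 'values': ['85.0 pts']}]
```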
- If you want to introduce a new entity, start with a ScrapingTemplate. This is the very first step of an entity's lifecycle.
- Add custom behaviour for the specific entity you're implementing. Check and, if needed, expand the `_get_filename()` and `analyze()` methods of the `usos.data.DataController` class (see the hypothetical sketch after this list).
- Update your rendering templates to support this type of entity.
- Great! You now have a new type of entity that supports custom behaviour.
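As a purely hypothetical illustration of the second step (this is not the actual `usos.data.DataController` code), a per-entity filename lookup might be as simple as:

```python
def _get_filename(self, entity: str) -> str:
    """Maps an entity type to the file its data is stored in.
    Hypothetical sketch -- check the real usos.data.DataController."""
    filenames = {
        "course-results-tree": "course_results.json",
    }
    return filenames.get(entity, f"{entity}.json")
```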
This package comes with Jinja2 as the default templating engine. Notification templates live in the `templates/notifications/` directory. To learn more about writing templates in Jinja2, check out the documentation.
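If you haven't used Jinja2 before, here is a tiny self-contained example of rendering a template from a string; the real templates for this project live as files in `templates/notifications/` and are loaded as shown in the `_render()` example below:

```python
from jinja2 import Template

# An inline template, purely illustrative.
template = Template(
    "New results: {% for entity in data %}{{ entity['entity'] }} {% endfor %}")
print(template.render(data=[{"entity": "course-results-tree"}]))
# New results: course-results-tree
```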
Streams are defined in `usos/notifications.py`. To add your own channel, just subclass `Notification` and implement two private methods: `_render()` and `_send()`.

The `Dispatcher` class automatically sets the `self.data` and `self.config` attributes, which supply the results from the DataController as well as channel-specific variables from the `notifications_config.json` file.

The final template should be saved in the `self._rendered_template` attribute:
```python
from jinja2 import Environment, FileSystemLoader

def _render(self) -> None:
    env = Environment(loader=FileSystemLoader('templates/notifications'))
    template = env.get_template('WebRequest.html')
    self._rendered_template = template.render(data=self.data)
```
Your `_send()` method should return a boolean indicating whether the notification has been sent successfully:
```python
import requests

def _send(self) -> bool:
    data = {
        'API_KEY': self.config["API_KEY"],
        'MESSAGE': self._rendered_template
    }
    # API_URL is the endpoint of whatever service this channel posts to.
    request = requests.post(API_URL, data=data)
    return request.status_code == 200
```
Here's another example of a custom stream: `PaperMail`.
```python
class PaperMail(Notification):
    def _render(self) -> None:
        # Implicit string concatenation inside parentheses joins the parts.
        letter: str = (
            "Hey, {name}! "
            "{message} "
            "Take care, {author}.")
        letter = letter.format(
            name=self.data["recipient"],
            message=self.data["message"],
            author=self.data["sender"])
        self._rendered_template = letter

    def _send(self) -> bool:
        put_in_a_mailbox(self._rendered_template)
        return True
```
Now it can be used as a channel of its own:
```python
dispatcher = Dispatcher(
    channels="PaperMail",
    enable=True,
    config_file="mailbox_coordinates.json")

my_message = {
    "recipient": "Kate",
    "message": "I'm getting a divorce.",
    "sender": "Anthony"
}

dispatcher.send(my_message)
```
Visit https://docs.kochanow.ski/usos/api.html for more information.