DataManagementSystem

Ricky Concepcion edited this page Apr 20, 2020 · 2 revisions

Data Management System (DMS)

A data management system (DMS) is an instance of a class derived from the DataManagementSystem base class. These classes cache NumPy arrays (data) in memory so that data files are not loaded, parsed, and processed more often than necessary. A DMS acts like a librarian: the program goes through it to access the data needed to create an Optimizer instance. It works by associating NumPy arrays with lookup keys.

Features

The DMS reserves a specified amount of memory for keeping processed NumPy arrays available for retrieval. Loading and parsing large CSV/Excel files is slow, while the desired data is often only a fraction of a percent of their size, so the DMS keeps the desired data in memory in case it is required again. For example, in a sensitivity study of how the maximum state of charge of the energy storage device affects the net revenue generated in a single month, every optimization run uses exactly the same data with different model parameters. Preserving the processed data avoids the overhead of repeatedly loading it from CSV files.

The DMS stores data in an OrderedDict, a dictionary type from Python's collections module that remembers the order in which keys were inserted.

Memory management

When initialized, the DMS has a max_memory attribute. Each time a data entry is added, the DMS estimates the total memory it occupies from the nbytes attribute of the stored NumPy arrays. max_memory can be reassigned later if desired.

NOTE: The dictionary overhead is not accounted for when computing memory consumption.

If adding a new entry pushes the estimated total above max_memory, the DMS purges data by "popping" it from the OrderedDict.
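As a rough illustration of the estimate described above, the sketch below sums nbytes over a two-level OrderedDict and compares the result with a limit. The helper name estimate_memory, the array sizes, and the limit are hypothetical and are not taken from the DMS source. Like the DMS, the sketch ignores dictionary overhead.

```python
from collections import OrderedDict

import numpy as np


def estimate_memory(data):
    """Sum the nbytes of every NumPy array in a (possibly nested) dict.

    Hypothetical helper for illustration; dictionary overhead is ignored.
    """
    total = 0
    for value in data.values():
        if isinstance(value, dict):
            total += estimate_memory(value)  # recurse into the nested level
        else:
            total += value.nbytes
    return total


data = OrderedDict()
data['08 2016.xlsx'] = OrderedDict(hazle=np.zeros(1000, dtype=np.float64))
data['09 2016.xlsx'] = OrderedDict(hazle=np.zeros(500, dtype=np.float64))

max_memory = 10000  # bytes; an arbitrary limit chosen for this example
used = estimate_memory(data)  # 1000 * 8 + 500 * 8 bytes
over_limit = used > max_memory  # a purge would be needed in this case
```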

Temporal locality

Like a cache, the DMS is managed using the principle of temporal locality. Whenever data is added or retrieved, it is sent to the bottom of the pile. The data that was added or accessed least recently is therefore at the top of the pile and is the next to be popped when a purge occurs.
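The ordering described above maps onto two OrderedDict methods from the standard library, move_to_end and popitem. The snippet below is a minimal sketch with placeholder keys and values. It shows the general mechanism, not the DMS code itself.

```python
from collections import OrderedDict

cache = OrderedDict()
cache['a'] = 'first'
cache['b'] = 'second'
cache['c'] = 'third'

# Retrieving 'a' counts as a use, so it moves to the bottom of the pile.
_ = cache['a']
cache.move_to_end('a')

order_after_access = list(cache)  # 'a' now comes after 'b' and 'c'

# A purge pops from the top: the entry unused for the longest time.
purged_key, purged_value = cache.popitem(last=False)
```

Here 'b' is purged rather than 'a', even though 'a' was inserted first, because 'a' was accessed more recently.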

Accessing data

Data is accessed through the get_data method, which takes keys as positional arguments. Data is stored in nested OrderedDict objects. For example, if dms is an instance of a DataManagementSystem class,

dms.get_data('08 2016.xlsx', 'hazle')

is functionally equivalent to:

dms.data['08 2016.xlsx']['hazle']

Using get_data, rather than directly accessing the OrderedDict, ensures that the management functionality of the class is utilized.
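To make the difference concrete, here is a minimal sketch of a stand-in class whose get_data performs the bookkeeping that direct dictionary access skips. TinyDMS is hypothetical and is not the QuESt implementation: it handles only one or two keys, omits memory accounting, and uses plain lists in place of NumPy arrays.

```python
from collections import OrderedDict


class TinyDMS:
    """Hypothetical stand-in for a DMS; not the QuESt implementation."""

    def __init__(self):
        self.data = OrderedDict()

    def add_data(self, value, *keys):
        """Store value under one or two keys and mark it most recently used."""
        if len(keys) == 1:
            self.data[keys[0]] = value
        else:
            outer, inner = keys
            self.data.setdefault(outer, OrderedDict())[inner] = value
        self.data.move_to_end(keys[0])

    def get_data(self, *keys):
        """Look up a value by one or two keys; raises KeyError when missing."""
        value = self.data[keys[0]]
        if len(keys) == 2:
            value = value[keys[1]]
        self.data.move_to_end(keys[0])  # bookkeeping that direct access skips
        return value


dms = TinyDMS()
dms.add_data([1, 2, 3], '08 2016.xlsx', 'hazle')
dms.add_data([4, 5, 6], '09 2016.xlsx', 'hazle')

# Both access paths return the same object...
same = dms.get_data('08 2016.xlsx', 'hazle') is dms.data['08 2016.xlsx']['hazle']
# ...but only get_data moved '08 2016.xlsx' to the end of the ordering.
order = list(dms.data)
```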

Limitations

A limitation of the current DMS design is that it supports nested dictionaries only up to a depth of 2. In other words, at most two keys may be provided. Providing more keys does not raise an exception, but it does not produce the desired behavior either.

The suggested manner for encoding more complex keys is to encode multiple identifiers into a single key by combining them with a delimiter. A delimiter is to these identifiers as a space is to a sequence of words.
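A minimal sketch of this encoding, using only built-in string methods, is shown below. The path and month values are hypothetical. The approach assumes the delimiter never occurs inside an identifier, since otherwise the key could not be split back unambiguously.

```python
DELIMITER = ' @ '  # assumed not to occur inside any identifier

identifiers = ['profiles/office.csv', '8']  # hypothetical path and month
key = DELIMITER.join(identifiers)

# The same delimiter recovers the original identifiers from the key.
recovered = key.split(DELIMITER)
```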

A delimiter can be stored as an attribute of the DMS instance:

def __init__(self, home_path, **kwargs):
    DataManagementSystem.__init__(self, **kwargs)

    self.home_path = home_path
    self.delimiter = ' @ '

This delimiter may be leveraged to create dictionary keys that encode more information. As an example:

def get_load_profile_data(self, path, month):
    """Retrieves commercial or residential load profile data."""
    logging.info('DMS: Loading load profile data')

    month = str(month)
    load_profile_key = self.delimiter.join([path, month])

    try:
        # Cache hit: the data was loaded earlier and is still in memory.
        load_profile = self.get_data(load_profile_key)
    except KeyError:
        # Cache miss: read from file, then store it for later requests.
        load_profile = read_load_profile(path, month)
        self.add_data(load_profile, load_profile_key)

    # Returning here, not in a finally clause, lets read errors propagate.
    return load_profile

This example comes from the BtmDMS for QuESt BTM. The method obtains load profile data from the DMS instance. The load_profile_key is created by combining the path of the load profile object and the requested month of data. The two pieces of information are joined by calling the string method join on the delimiter string. The result is equivalent to:

"<path> @ <month>"

since in this example the delimiter is " @ ". The rest of the method implements the most common access pattern for DMS methods:

  1. Try to find the data associated with the key using the DMS get_data() method.
  2. If no such data exists, read it from the file specified by the key.
  3. Add the data to the DMS instance using the key.
  4. Finally, return the requested data.
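The four steps above can be sketched as a generic helper. The names get_or_load and fake_loader are hypothetical, and a plain dict stands in for the DMS so the snippet runs on its own. A real DMS method would call get_data and add_data instead.

```python
def get_or_load(cache, key, loader):
    """Return cache[key], calling loader(key) and storing the result on a miss."""
    try:
        return cache[key]      # step 1: try to find the data by key
    except KeyError:
        value = loader(key)    # step 2: read it from the source
        cache[key] = value     # step 3: store it under the same key
        return value           # step 4: return the requested data


calls = []


def fake_loader(key):
    """Hypothetical loader that records each time it is invoked."""
    calls.append(key)
    return key.upper()


cache = {}
first = get_or_load(cache, 'load @ 1', fake_loader)
second = get_or_load(cache, 'load @ 1', fake_loader)
```

The loader runs only on the first request. The second request is served from the cache, which is the saving the DMS is designed to provide.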