DataManagementSystem
Data management systems (DMS) are a family of classes derived from the DataManagementSystem class. These classes cache NumPy arrays (data) in memory to avoid repeatedly loading, parsing, and processing data files. Essentially, a DMS acts like a librarian that the program goes through to access the data it needs to create an Optimizer instance. It works by associating NumPy arrays with lookup keys.
The DMS is designed around reserving a specified amount of memory for keeping processed NumPy arrays available for retrieval. Loading and parsing large csv/Excel files takes a long time, while the desired data is often only a fraction of a percent of the file's size, so the DMS keeps the desired data in memory in case it is required again. For example, if you want to perform a sensitivity study on how the maximum state of charge of the energy storage device affects the net revenue generated in a single month, each optimization run uses exactly the same data with different model parameters. Preserving the processed data therefore avoids the overhead of repeatedly loading from csv files.
The DMS stores data in an OrderedDict, a Python data type that is a dictionary in which the order of insertion is known.
When initialized, the DMS has a max_memory attribute. Any time a data entry is added to the DMS, the total amount of memory that the DMS occupies is estimated using the nbytes attribute of NumPy arrays. It is possible to reassign a value to max_memory if desired.
NOTE: The dictionary overhead is not accounted for when computing memory consumption.
If adding a new entry pushes the estimated total above max_memory, the DMS purges data by "popping" entries from the OrderedDict.
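The memory check described above can be sketched as follows. This is an illustration only, not the QuESt implementation: the helper name estimate_nbytes is hypothetical, the array shapes are made up, and treating max_memory as a byte count is an assumption.

```python
from collections import OrderedDict

import numpy as np


def estimate_nbytes(store):
    """Sum the nbytes of every array in a two-level nested mapping.

    Hypothetical helper; dictionary overhead is ignored, as in the note above.
    """
    return sum(
        array.nbytes
        for inner in store.values()
        for array in inner.values()
    )


store = OrderedDict()
store['08 2016.xlsx'] = OrderedDict(hazle=np.zeros(744, dtype=np.float64))
store['09 2016.xlsx'] = OrderedDict(hazle=np.zeros(720, dtype=np.float64))

total = estimate_nbytes(store)    # (744 + 720) elements * 8 bytes each
max_memory = 10_000               # assumed budget, taken to be in bytes
needs_purge = total > max_memory  # True means the oldest entries should be popped
```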
Like a cache, the DMS is managed using the principle of temporal locality. Whenever data is added or retrieved, it is sent to the bottom of the pile. This means the data that has not been created or accessed in the longest time is at the top of the pile, next to be popped when a purge occurs.
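The "pile" behavior maps onto two standard OrderedDict methods, move_to_end and popitem. The snippet below is a minimal sketch of that ordering with placeholder keys and values; it is not taken from the DMS source.

```python
from collections import OrderedDict

cache = OrderedDict()
cache['a'] = 1
cache['b'] = 2
cache['c'] = 3

# Accessing 'a' sends it to the bottom of the pile (most recently used).
value = cache['a']
cache.move_to_end('a')

# A purge pops from the top of the pile (least recently used).
evicted_key, evicted_value = cache.popitem(last=False)

remaining = list(cache)  # keys that survive the purge, oldest first
```

Here 'b' is evicted rather than 'a', because the access to 'a' refreshed its position.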
Data is accessed through the get_data method, which takes keys as positional arguments. Data is stored in nested OrderedDict objects. For example, if dms is an instance of the DataManagementSystem class,
    dms.get_data('08 2016.xlsx', 'hazle')

is functionally equivalent to:

    dms.data['08 2016.xlsx']['hazle']
Using get_data, rather than directly accessing the OrderedDict, ensures that the management functionality of the class, such as moving retrieved data to the bottom of the pile, is applied.
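To make the difference concrete, here is a minimal sketch of a two-level store whose get_data refreshes the access order. MiniDMS is a hypothetical stand-in written for this page, not the actual DataManagementSystem code, and plain lists stand in for NumPy arrays.

```python
from collections import OrderedDict


class MiniDMS:
    """Hypothetical stand-in for DataManagementSystem; not the QuESt code."""

    def __init__(self):
        self.data = OrderedDict()

    def add_data(self, value, *keys):
        """Store value under one key, or under two nested keys."""
        if len(keys) == 1:
            self.data[keys[0]] = value
        else:
            outer, inner = keys
            self.data.setdefault(outer, OrderedDict())[inner] = value
        self.data.move_to_end(keys[0])

    def get_data(self, *keys):
        """Look up a value and mark its top-level entry as recently used."""
        value = self.data[keys[0]]      # raises KeyError if missing
        if len(keys) == 2:
            value = value[keys[1]]      # raises KeyError if missing
        self.data.move_to_end(keys[0])
        return value


dms = MiniDMS()
dms.add_data([1.0, 2.0], '08 2016.xlsx', 'hazle')
dms.add_data([3.0, 4.0], '09 2016.xlsx', 'hazle')

via_method = dms.get_data('08 2016.xlsx', 'hazle')
via_dict = dms.data['08 2016.xlsx']['hazle']
order_after_get = list(dms.data)
```

Both lookups return the same object, but only the get_data call moves '08 2016.xlsx' behind '09 2016.xlsx' in the purge order.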
A limitation of the design of the DMS is that it currently only supports a nested dictionary of depth=2. In other words, only two keys may be provided. Providing more keys does not raise any exceptions, but will not have the desired behavior.
The suggested manner for encoding more complex keys is to encode multiple identifiers into a single key by combining them with a delimiter. A delimiter is to these identifiers as a space is to a sequence of words.
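As a small sketch of this encoding, the snippet below joins two identifiers into one key and splits them apart again. The delimiter value and the path are illustrative placeholders.

```python
delimiter = ' @ '  # illustrative delimiter value

identifiers = ['profiles/office.csv', '8']  # hypothetical path and month
key = delimiter.join(identifiers)

# The identifiers can be recovered as long as none of them contains the delimiter.
recovered = key.split(delimiter)
```

The delimiter should be a string that is unlikely to appear inside any identifier, otherwise the key can no longer be split unambiguously.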
A delimiter can be stored as an attribute of the DMS instance:
    def __init__(self, home_path, **kwargs):
        DataManagementSystem.__init__(self, **kwargs)

        self.home_path = home_path
        self.delimiter = ' @ '
This delimiter may be leveraged to create dictionary keys that encode more information. As an example:
    def get_load_profile_data(self, path, month):
        """Retrieves commercial or residential load profile data."""
        logging.info('DMS: Loading load profile data')

        month = str(month)
        load_profile_key = self.delimiter.join([path, month])

        try:
            load_profile = self.get_data(load_profile_key)
        except KeyError:
            load_profile = read_load_profile(path, month)
            self.add_data(load_profile, load_profile_key)
        finally:
            return load_profile
This example comes from the BtmDMS for QuESt BTM. This method is used to obtain load profile data from the DMS instance. The load_profile_key is created by combining the path of the load profile object and the requested month of data. These two pieces of information are joined using the string method join, called on the delimiter string. This is equivalent to:
"<path> @ <month>"
since in this example, the delimiter is " @ ". The rest of this method implements the most common access pattern for DMS methods:
- Try to find the data associated with the key using the DMS get_data() method.
- If no such data exists, read it from the file specified by the key.
- Add the data to the DMS instance using the key.
- Finally, return the requested data.
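The four steps above can be factored into a reusable helper. The sketch below is one possible arrangement and is not part of QuESt: TinyStore, get_or_load, and slow_loader are hypothetical names, and the store only mimics the get_data/add_data call shapes used in the example.

```python
class TinyStore:
    """Hypothetical stand-in exposing get_data/add_data like the DMS."""

    def __init__(self):
        self.data = {}

    def get_data(self, key):
        return self.data[key]       # raises KeyError on a miss

    def add_data(self, value, key):
        self.data[key] = value


def get_or_load(store, key, loader):
    """Return cached data for key, calling loader() only on a miss."""
    try:
        return store.get_data(key)
    except KeyError:
        value = loader()
        store.add_data(value, key)
        return value


calls = []


def slow_loader():
    calls.append(1)                 # record each simulated file read
    return [0.5, 0.7, 0.9]


store = TinyStore()
first = get_or_load(store, 'profile.csv @ 8', slow_loader)
second = get_or_load(store, 'profile.csv @ 8', slow_loader)
```

The loader runs once; the second call is served from the store. Returning from the try and except branches, rather than from a finally clause, lets an exception raised by the loader propagate to the caller.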