Dear MORL-Baselines Maintainers,

As part of my bachelor thesis, I am exploring the application of multi-objective reinforcement learning (MORL). To avoid the tedious work of implementing an algorithm from scratch, I searched for libraries similar to Stable-Baselines3 and came across MORL-Baselines. For no particular reason, I decided to start with PGMORL to familiarize myself with MORL. However, I have encountered a few issues:
Environment Resource Limitation:
Due to restricted licensing, I can only use one instance of the environment. Unfortunately, the constructor of the PGMORL agent creates a tmp_env object to extract environment information. This causes an issue when an existing env object is passed to the constructor, because the tmp_env creation then raises a simulation-related error (I am using a simulation program as the environment). Passing env=None instead resolves the issue.
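For reference, the two instantiation paths I tried look roughly like this (a minimal sketch; the environment id is a placeholder for my licensed simulator, the import paths may differ between MORL-Baselines versions, and I leave all other PGMORL hyperparameters at their defaults):

```python
import mo_gymnasium as mo_gym
from morl_baselines.multi_policy.pgmorl.pgmorl import PGMORL

# Variant 1: pass my single licensed environment instance directly.
# PGMORL still builds a tmp_env internally from env_id, which starts a second
# simulator instance and triggers the licensing error described above.
env = mo_gym.make("my-sim-env-v0")  # placeholder id for my custom simulation env
agent = PGMORL(env_id="my-sim-env-v0", env=env)

# Variant 2: pass env=None and let PGMORL construct the environment itself
# from env_id. Only one simulator instance is created, so this works for me.
agent = PGMORL(env_id="my-sim-env-v0", env=None)
```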
Incompatibility Between Training and Evaluation Environments:
The issue mentioned above also leads to another problem: the training and evaluation environments cannot share the same wrapper stack. Specifically, the training environment is wrapped with MOSyncVectorEnv, which causes the step function to fail during policy evaluation in the eval_mo function. I worked around this by adding the line env = env.envs[0], which, as I understand it, extracts the underlying environment still wrapped in MORecordEpisodeStatistics. It would be more practical if the same environment instance could be used seamlessly for both training and evaluation.
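Concretely, the workaround looks like this (a rough sketch; exact import paths and the eval_mo signature may differ between MORL-Baselines versions, and vec_env, policy, and weights stand for the training vector env, one policy from the population, and a weight vector):

```python
from morl_baselines.common.evaluation import eval_mo

# The training env is a MOSyncVectorEnv; eval_mo expects a plain (non-vectorized)
# environment, so I pull out the first sub-environment, which still carries the
# MORecordEpisodeStatistics wrapper.
eval_env = vec_env.envs[0]

# Evaluate one policy from the population for a given weight vector.
scalarized_return, scalarized_disc_return, vec_return, disc_vec_return = eval_mo(
    agent=policy, env=eval_env, w=weights
)
```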
Model Saving and Testing:
I am unable to test the different models at the end of training because I do not know how to save them. My understanding is that the training process produces multiple agents, each optimized for a different objective weighting. To test these agents, I need to be able to save them. However, I could not find a save_model method similar to the one in Stable-Baselines3. This might be a misunderstanding on my part, but I would greatly appreciate clarification.
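What I was hoping for is something like model.save(path) in Stable-Baselines3. The closest manual fallback I could think of is sketched below, but I do not know whether the PGMORL agent actually exposes its policies this way; the archive/individuals attribute names are pure guesses on my part, not the actual MORL-Baselines API:

```python
import torch

# Purely illustrative: assumes the trained agent exposes its Pareto archive as an
# iterable of torch.nn.Module policies ("archive" and "individuals" are guessed
# attribute names, not the real API).
for i, policy in enumerate(agent.archive.individuals):
    torch.save(policy.state_dict(), f"pgmorl_policy_{i}.pt")
```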
Thank you in advance for your response and support!
Best Regards,
hamod-kh
First of all, welcome to the community! Now to answer your points:
Ah, I see. The env and env_id kinda duplicate the information indeed. This is to stay compatible with algorithms that do not use vectorized envs, i.e., all other algos. Anyways, if you found a way it's all good. :)
There is currently no save model method for PGMORL. Implementing this can be a bit tricky as we would effectively need to store the Pareto archive of models (not just one model).
That being said, if you can only instantiate one environment, I would use a more sample-efficient algorithm than PGMORL. I assume it is a continuous problem, so I'd tend to advise GPI-LS, which has the save-model feature implemented :).