Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

np.int32 vs np.float32 observations #6

Open
ronTohar1 opened this issue Apr 29, 2023 · 3 comments
Open

np.int32 vs np.float32 observations #6

ronTohar1 opened this issue Apr 29, 2023 · 3 comments

Comments

@ronTohar1
Copy link

hey, I noticed there is a problem in battleships.py where the observation space is of type int and the returned observation in reset or step are of type np.float32.
This causes a problem with env_checker of gym of course and therefore also with the use of RL libraries such as SB3 as they use this information and it creates conflicts.

@thomashirtz
Copy link
Owner

I just committed some changes, please let me know if it has resolved the issue, if not, please give me a minimal minimal reproducible example of the issue :)

@ronTohar1
Copy link
Author

It did fix my problem, but I did have to do some changes to the step function (regardless of the current new commit) to make the code work.

I changed the step() function
from

def step(self, raw_action: Union[int, tuple]) -> Tuple[np.ndarray, int, bool, dict]:
        if isinstance(raw_action, int):

To

def step(self, raw_action: Union[int, tuple]) -> Tuple[np.ndarray, int, bool, dict]:
        if isinstance(raw_action, int) or isinstance(raw_action, np.int32) or isinstance(raw_action, np.int64):

This change had been made way before the aforementioned problem arose.

For example, the gym environment checker didn't pass before the change to the step() function, as well as the following piece of code:

import gymnasium
import gym_battleship

env = gymnasium.make("GymV21Environment-v0", env_id="Battleship-v0")
obs,_ = env.reset()
done = False
while not done:
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    env.render() 

This happened because for some reason, sample() of the action apce gave an numpy.int64 value. (This error is also the reason the env_checker didn't 'approve' the environment)

P.S I just can't understand why you'd put an np.float32 in the dtype as it is only integers, but I guess it doesn't make any difference.

@thomashirtz
Copy link
Owner

Ok, I'll take a closer look for fixing the code when I have time

Please let me know if there are any other thing not so right that you notice in the code :)

For the observation space, I remember that there were a clear reason why I did that and I was quite upset to have to switch to float instead of int. However I can't remember now. I may switch back to int and see if someone experience an issue.

Thank you for your message :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants