Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow seed fix in Sampler #669

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
Open

Allow seed fix in Sampler #669

wants to merge 11 commits into from

Conversation

cdtn
Copy link
Contributor

@cdtn cdtn commented Aug 26, 2022

No description provided.

@@ -638,3 +653,32 @@ def cart_prod(*arrs):
"""
grids = np.meshgrid(*arrs, indexing='ij')
return np.stack(grids, axis=-1).reshape(-1, len(arrs))

def mix_samplers_seeds(left, right):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be a staticmethod of base Sampler class

name = _get_method_by_alias(name, 'ss')
self.name = name
self.state = make_rng(seed)
self.distr = getattr(ss, self.name)(**kwargs)

def sample(self, size):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rng should be allowed to be passed as optional argument

@@ -254,7 +254,7 @@ def __init__(self, domain=None, **kwargs):
self.n_updates = 0
self.additional = True
self.create_id_prefix = False
self.random_state = None
self.rng = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a chain of random_state variables and their SeedSequences in batchflow:

---Pipeline
---Dataset
------Batch
---------inbatch_parallel workers (threads / processes / for-items)

And Research is even one level above that.

Have you read this entire chain of properties before changing it?

@@ -638,3 +653,32 @@ def cart_prod(*arrs):
"""
grids = np.meshgrid(*arrs, indexing='ij')
return np.stack(grids, axis=-1).reshape(-1, len(arrs))

def mix_samplers_seeds(left, right):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are mixing seeds, which is wrong. If I would mix two samplers in the proposed way and generate one random number, it would be the same, irrespectable of how many times each of those two samplers was called before creating the mixture.

You need to mix entropies, and there is a well-established (and used in other places in batchflow) way to do so: np.random.SeedSequence

rng1, rng2
state1 = rng1.bit_generator.state['state']['state']
state2 = rng2.bit_generator.state['state']['state']
seed = np.random.SeedSequence([state1, state2])
rng = np.random.default_rng(seed)

While the difference between these two approaches is hard to come by in any realistic example, the latter is the official way to do so.

In either case, the current proposed way to seed the RNG in sampler would not work with batchflow+seismiQB ways to fix the randomization, and the only thing you need to actually fix the seed for make_locations(sampler) in seismiQB is the ability to pass custom rng into Sampler.sample call

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants