Allow seed fix in `Sampler` #669

cdtn · 2022-08-26T15:07:39Z

No description provided.


SergeyTsimfer · 2022-10-05T14:22:38Z

batchflow/sampler.py

@@ -638,3 +653,32 @@ def cart_prod(*arrs):
    """
    grids = np.meshgrid(*arrs, indexing='ij')
    return np.stack(grids, axis=-1).reshape(-1, len(arrs))
+
+def mix_samplers_seeds(left, right):


This should be a staticmethod of the base `Sampler` class.
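Here is a minimal sketch of that layout. It assumes the base class keeps a NumPy `Generator` in `self.rng`, as the later commit "Rename state to rng and move it to base class" suggests. The names `mix_rngs` and `AndSampler` are hypothetical stand-ins for the PR's helper and its combined samplers, not batchflow's actual code. The mixing itself combines the two generators' current states through `np.random.SeedSequence`.

```python
import numpy as np


class Sampler:
    """Stripped-down stand-in for the base sampler: only the RNG plumbing is shown."""

    def __init__(self, seed=None):
        # Every sampler owns a NumPy Generator (PCG64 by default).
        self.rng = np.random.default_rng(seed)

    @staticmethod
    def mix_rngs(left, right):
        """Hypothetical helper: build a new Generator from the current states of two samplers.

        Being a staticmethod, it needs no instance, so child classes can call it
        while they are still being constructed.
        """
        state_left = left.rng.bit_generator.state['state']['state']
        state_right = right.rng.bit_generator.state['state']['state']
        return np.random.default_rng(np.random.SeedSequence([state_left, state_right]))

    def sample(self, size):
        return self.rng.random(size)


class AndSampler(Sampler):
    """Hypothetical combined sampler that derives its RNG from both of its bases."""

    def __init__(self, left, right):
        super().__init__()
        self.bases = (left, right)
        self.rng = Sampler.mix_rngs(left, right)


combined = AndSampler(Sampler(seed=1), Sampler(seed=2))
values = combined.sample(3)
```

Keeping the helper on the base class means every child sampler combines its bases in the same way, instead of each one re-implementing the mixing.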

SergeyTsimfer · 2022-10-05T14:25:27Z

batchflow/sampler.py

        name = _get_method_by_alias(name, 'ss')
        self.name = name
-        self.state = make_rng(seed)
        self.distr = getattr(ss, self.name)(**kwargs)

    def sample(self, size):


`rng` should be allowed to be passed as an optional argument.
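One way to support this is sketched below. It assumes the sampler keeps its own generator in `self.rng`, and `sample` accepts an optional `rng` that falls back to the instance's generator when none is given. `UniformSampler` is a simplified, hypothetical stand-in, not batchflow's class.

```python
import numpy as np


class UniformSampler:
    """Simplified, hypothetical sampler used only to illustrate the optional-rng pattern."""

    def __init__(self, low=0.0, high=1.0, seed=None):
        self.low, self.high = low, high
        self.rng = np.random.default_rng(seed)

    def sample(self, size, rng=None):
        # An externally supplied generator takes precedence over the sampler's own one,
        # so the caller controls reproducibility without re-creating the sampler.
        rng = self.rng if rng is None else rng
        return rng.uniform(self.low, self.high, size=size)


sampler = UniformSampler()
# Two calls with identically seeded external generators give identical samples,
# while a plain sampler.sample(4) keeps drawing from the sampler's own generator.
first = sampler.sample(4, rng=np.random.default_rng(42))
second = sampler.sample(4, rng=np.random.default_rng(42))
```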

SergeyTsimfer · 2022-10-05T14:27:59Z

batchflow/research/domain.py

@@ -254,7 +254,7 @@ def __init__(self, domain=None, **kwargs):
        self.n_updates = 0
        self.additional = True
        self.create_id_prefix = False
-        self.random_state = None
+        self.rng = None


There is a chain of `random_state` variables and their `SeedSequence`s in batchflow:

---Pipeline
---Dataset
------Batch
---------`inbatch_parallel` workers (threads / processes / for-items)

And `Research` sits one level above even that.

Have you read this entire chain of properties before changing it?
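The usual NumPy pattern for such a chain is to derive child seeds from a parent `SeedSequence` with `spawn`, so each level gets an independent but reproducible stream. The sketch below mimics a root → batch → worker hierarchy with plain `SeedSequence` objects. It is illustrative only. The function `worker_generators` is hypothetical and does not reproduce batchflow's attribute names or code.

```python
import numpy as np


def worker_generators(root_seed, n_batches, n_workers):
    """Illustrative seed chain: one root -> per-batch sequences -> per-worker generators."""
    root = np.random.SeedSequence(root_seed)      # e.g. the top-level (Research / Pipeline) seed
    batch_seqs = root.spawn(n_batches)            # one child sequence per batch
    return [
        [np.random.default_rng(seq) for seq in batch_seq.spawn(n_workers)]  # one generator per worker
        for batch_seq in batch_seqs
    ]


run1 = worker_generators(root_seed=123, n_batches=2, n_workers=3)
run2 = worker_generators(root_seed=123, n_batches=2, n_workers=3)
draws1 = [[rng.integers(0, 10**9) for rng in batch] for batch in run1]
draws2 = [[rng.integers(0, 10**9) for rng in batch] for batch in run2]
```

Because every level is derived from its parent, changing how one link is created changes the streams of every level below it. This is why a local rename of `random_state` has to be checked against the whole chain.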

SergeyTsimfer · 2022-10-05T14:38:30Z

batchflow/sampler.py

@@ -638,3 +653,32 @@ def cart_prod(*arrs):
    """
    grids = np.meshgrid(*arrs, indexing='ij')
    return np.stack(grids, axis=-1).reshape(-1, len(arrs))
+
+def mix_samplers_seeds(left, right):


You are mixing seeds, which is wrong. If I were to mix two samplers in the proposed way and generate one random number, it would always be the same, regardless of how many times each of the two samplers had been called before the mixture was created.
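The objection can be made concrete. If the combined generator is built only from the two samplers' original seeds, its output cannot depend on anything the samplers drew before. The helper `mix_by_seeds` is a hypothetical stand-in for any seed-based scheme (sum, xor, or a `SeedSequence` over the seeds alike). It is not the PR's actual code.

```python
import numpy as np


def mix_by_seeds(seed_left, seed_right):
    """Hypothetical seed-based mixing: only the initial seeds are used."""
    return np.random.default_rng(np.random.SeedSequence([seed_left, seed_right]))


seed_left, seed_right = 10, 20
left = np.random.default_rng(seed_left)
right = np.random.default_rng(seed_right)

before = mix_by_seeds(seed_left, seed_right).random()
left.random(size=100)   # advance both samplers as if they had already been used
right.random(size=5)
after = mix_by_seeds(seed_left, seed_right).random()
# before == after: the history of `left` and `right` is invisible to the mixture.
```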

You need to mix entropies instead. There is a well-established way to do so, already used elsewhere in batchflow: `np.random.SeedSequence`:

    import numpy as np

    # rng1, rng2: the two generators to mix
    state1 = rng1.bit_generator.state['state']['state']
    state2 = rng2.bit_generator.state['state']['state']
    seed = np.random.SeedSequence([state1, state2])
    rng = np.random.default_rng(seed)

While the difference between these two approaches is hard to observe in any realistic example, the latter is the officially recommended way to do it.
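Below, the state-mixing snippet is wrapped into a runnable helper. Because it mixes the generators' current internal states, the mixture changes once either sampler has been used. `mix_generators` is a hypothetical name. Reading `bit_generator.state['state']['state']` assumes the default `PCG64` bit generator, whose state dict has that layout.

```python
import numpy as np


def mix_generators(rng1, rng2):
    """Build a new Generator from the current PCG64 states of two generators."""
    state1 = rng1.bit_generator.state['state']['state']
    state2 = rng2.bit_generator.state['state']['state']
    # SeedSequence accepts arbitrarily large non-negative ints, so the 128-bit states fit as-is.
    return np.random.default_rng(np.random.SeedSequence([state1, state2]))


rng1, rng2 = np.random.default_rng(10), np.random.default_rng(20)
before = mix_generators(rng1, rng2).random()
rng1.random(size=100)   # using one sampler changes its state ...
after = mix_generators(rng1, rng2).random()
# ... and therefore the mixture: `before` and `after` now differ.
```

Reading the state does not advance either generator. As a result, two mixtures created back to back, with no draws in between, are identical.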

In either case, the currently proposed way to seed the RNG in a sampler would not work with the way batchflow and seismiQB fix randomization. To fix the seed for `make_locations(sampler)` in seismiQB, all you actually need is the ability to pass a custom `rng` into the `Sampler.sample` call.

cdtn added 2 commits August 25, 2022 16:53

Rename state to rng and move it to base class

9e3f4ed

Move random numbers generator from NumpySampler to Sampler

0323e71

cdtn requested review from akoryagin and SergeyTsimfer August 26, 2022 15:08

cdtn added 9 commits August 29, 2022 20:43

Combine seeds with xor instead of add

c150739

Remove default seed from ConstantSampler, rename random_state to rng in Domain

b7503b7

Move seed mixing to base class completely, fix pylint

aeb5dc6

Fix bug

9aad366

Merge branch 'master' into sampler

222db30

Provide bases to parent sampler init explicitly

39f6316

Explicitly combine bases seeds in child samplers

81d70bd

Change two seeds mixing procedure in samplers

1821ce4

Previous approach via xor led to collisions for pairs of equal seeds
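The collision mentioned in that commit follows from `a ^ a == 0`. Every pair of equal seeds combines to the same value, so seeds `(1, 1)` and `(7, 7)`, for example, would give identically seeded combined samplers. The check below uses plain Python. `xor_mix` is a hypothetical re-implementation of xor combining, and the seeds are arbitrary examples.

```python
def xor_mix(seed_left, seed_right):
    """Hypothetical re-implementation of combining two seeds with xor."""
    return seed_left ^ seed_right


pairs = [(1, 1), (7, 7), (12345, 12345)]
combined = {xor_mix(a, b) for a, b in pairs}
# Every equal pair collapses to 0, so all three combinations collide.
assert combined == {0}
```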

Fix pylint

ff2b716

SergeyTsimfer requested changes Oct 5, 2022
