
Error in HamiltonianMonteCarlo when using gather in log_prob function #1837

Open

martin-wiebusch-thg opened this issue Aug 23, 2024 · 2 comments

I am trying to run Hamiltonian Monte Carlo on a target distribution whose implementation involves a call to tf.gather. The following code:

import tensorflow as tf
import tensorflow_probability as tfp


def sample_hmc(
    target_log_prob_fn,
    current_state,
    num_results=1000,
    num_burnin_steps=500,
    adaptation_frac=0.8,
    num_leapfrog_steps=3,
    step_size=1.,
):    
    hmc = tfp.mcmc.SimpleStepSizeAdaptation(
        tfp.mcmc.HamiltonianMonteCarlo(
            target_log_prob_fn=target_log_prob_fn,
            num_leapfrog_steps=num_leapfrog_steps,
            step_size=step_size,
        ),
        num_adaptation_steps=int(num_burnin_steps * adaptation_frac))

    @tf.function
    def run_chain():
        samples, is_accepted = tfp.mcmc.sample_chain(
            num_results=num_results,
            num_burnin_steps=num_burnin_steps,
            current_state=current_state,
            kernel=hmc,
            trace_fn=lambda _, pkr: pkr.inner_results.is_accepted,
        )
        return samples, is_accepted
        
    return run_chain()


def logprob(alpha):
    indices = tf.constant([2, 0, 1], dtype=tf.int32)
    return -tf.math.reduce_sum(tf.gather(alpha**2, indices))
    # return -tf.math.reduce_sum(alpha**2)


alpha = tf.constant([1.0, 1.0, 1.0])
sample_hmc(
    logprob,
    current_state=alpha,
    num_results=10,
    num_burnin_steps=5,
)

raises "ValueError: The two structures don't have the same nested structure.", followed by a very long and (to me) cryptic message. Replacing the return statement in logprob with the commented-out line gets rid of the error. The error seems to appear whenever the result of logprob contains a tf.gather subexpression.

The error also disappears when I remove the @tf.function decorator from the definition of run_chain. However, this comes at a huge performance cost.

How can I efficiently sample from a distribution whose log-probability involves a tf.gather expression?

martin-wiebusch-thg (Author) commented Aug 27, 2024

The error occurs here:

File ~/.local/opt/miniconda/envs/adhoc/lib/python3.11/site-packages/tensorflow_probability/python/mcmc/internal/leapfrog_integrator.py:291, in SimpleLeapfrogIntegrator.__call__(self, momentum_parts, state_parts, target, target_grad_parts, kinetic_energy_fn, name)

and these seem to be the relevant parts of the message:

ValueError: in user code:
...
File "/root/.local/opt/miniconda/envs/adhoc/lib/python3.11/site-packages/tensorflow_probability/python/mcmc/internal/leapfrog_integrator.py", line 291, in call
] = tf.while_loop(

ValueError: The two structures don't have the same nested structure.

...
More specifically: Substructure "type=IndexedSlices str=IndexedSlices(indices=Tensor("mcmc_sample_chain/trace_scan/while/smart_for_loop/while/simple_step_size_adaptation___init__/one_step/mh_one_step/hmc_kernel_one_step/leapfrog_integrate/while/leapfrog_integrate_one_step/maybe_call_fn_and_grads/value_and_gradients/value_and_gradient/gradients/mcmc_sample_chain/trace_scan/while/smart_for_loop/while/simple_step_size_adaptation___init_/_one_step/mh_one_step/hmc_kernel_one_step/leapfrog_integrate/while/leapfrog_integrate_one_step/maybe_call_fn_and_grads/value_and_gradients/value_and_gradient/GatherV2_grad/Reshape_1:0", shape=(3,), dtype=int32), values=Tensor("mcmc_sample_chain/trace_scan/while/smart_for_loop/while/simple_step_size_adaptation___init__/one_step/mh_one_step/hmc_kernel_one_step/leapfrog_integrate/while/leapfrog_integrate_one_step/maybe_call_fn_and_grads/value_and_gradients/value_and_gradient/gradients/mcmc_sample_chain/trace_scan/while/smart_for_loop/while/simple_step_size_adaptation___init_/_one_step/mh_one_step/hmc_kernel_one_step/leapfrog_integrate/while/leapfrog_integrate_one_step/maybe_call_fn_and_grads/value_and_gradients/value_and_gradient/GatherV2_grad/Reshape:0", shape=(3,), dtype=float32), dense_shape=Tensor("mcmc_sample_chain/trace_scan/while/smart_for_loop/while/simple_step_size_adaptation___init__/one_step/mh_one_step/hmc_kernel_one_step/leapfrog_integrate/while/leapfrog_integrate_one_step/maybe_call_fn_and_grads/value_and_gradients/value_and_gradient/gradients/mcmc_sample_chain/trace_scan/while/smart_for_loop/while/simple_step_size_adaptation___init_/_one_step/mh_one_step/hmc_kernel_one_step/leapfrog_integrate/while/leapfrog_integrate_one_step/maybe_call_fn_and_grads/value_and_gradients/value_and_gradient/GatherV2_grad/Cast:0", shape=(1,), dtype=int32))" is a sequence, while substructure "type=SymbolicTensor str=Tensor("mcmc_sample_chain/trace_scan/while/smart_for_loop/while/simple_step_size_adaptation___init__/_one_step/mh_one_step/hmc_kernel_one_step/maybe_call_fn_and_grads/value_and_gradients/fn_grad:0", shape=(None,), dtype=float32)" is not
Entire first structure:
[., [.], [.], ., [.]]
Entire second structure:
[., [.], [.], ., [.]]

jeffpollock9 (Contributor) commented

@martin-wiebusch-thg I think this is due to gradients of tf.gather returning an IndexedSlices rather than the expected dense Tensor, see e.g. https://www.tensorflow.org/api_docs/python/tf/IndexedSlices
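
A quick way to confirm this behavior (an illustration added here, not part of the original comment):

import tensorflow as tf

x = tf.Variable([1.0, 2.0])
with tf.GradientTape() as tape:
    # Backprop through tf.gather produces a sparse IndexedSlices gradient.
    y = tf.reduce_sum(tf.gather(x, [0, 0, 1]))

print(type(tape.gradient(y, x)))
# <class 'tensorflow.python.framework.indexed_slices.IndexedSlices'>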

It's possible to convert to a dense tensor using tf.convert_to_tensor; however, you would need to edit the tfp.math.value_and_gradient function in the TFP source (I think). Alternatives would be using JAX or avoiding tf.gather. See the examples below:

Converting the IndexedSlices gradient to a dense tensor by hand:

import tensorflow as tf
import tensorflow_probability as tfp

i = tf.constant([0, 0, 1])

def value(x):
    return tf.reduce_sum(tf.gather(x, i))

def value_and_gradient(x):
    return tfp.math.value_and_gradient(value, x)


y = tf.constant([1.0, 2.0])

value_and_gradient(y)
# (<tf.Tensor: shape=(), dtype=float32, numpy=4.0>,
#  <tensorflow.python.framework.indexed_slices.IndexedSlices at 0x7bf0d6a519c0>)

tf.convert_to_tensor(value_and_gradient(y)[1])
# <tf.Tensor: shape=(2,), dtype=float32, numpy=array([2., 1.], dtype=float32)>

Avoiding tf.gather by selecting with a one-hot matrix (tf.one_hot + tf.linalg.matvec), which keeps the gradient dense:

import tensorflow as tf
import tensorflow_probability as tfp

i = tf.constant([0, 0, 1])

def value(x):
    return tf.reduce_sum(tf.linalg.matvec(tf.one_hot(i, 2), x))

def value_and_gradient(x):
    return tfp.math.value_and_gradient(value, x)


y = tf.constant([1.0, 2.0])

value_and_gradient(y)
# (<tf.Tensor: shape=(), dtype=float32, numpy=4.0>,
#  <tf.Tensor: shape=(2,), dtype=float32, numpy=array([2., 1.], dtype=float32)>)

Using the JAX substrate, where indexing gradients are dense arrays to begin with:

import jax.numpy as jnp
import tensorflow_probability.substrates.jax as tfp

i = jnp.array([0, 0, 1])

def value(x):
    return jnp.sum(x[i])

def value_and_gradient(x):
    return tfp.math.value_and_gradient(value, x)


y = jnp.array([1.0, 2.0])

value_and_gradient(y)
# (Array(4., dtype=float32), Array([2., 1.], dtype=float32))
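
Another possible workaround (a sketch added here, not from the original thread, and untested against the repro above): wrap the target log-prob in tf.custom_gradient and densify its gradient with tf.convert_to_tensor, so the HMC machinery only ever sees dense tensors. The helper name with_dense_gradient is hypothetical.

import tensorflow as tf

def with_dense_gradient(fn):
    # Hypothetical helper: forwards fn's value unchanged, but converts
    # its gradient (possibly an IndexedSlices produced by tf.gather)
    # to a dense tensor before TFP's leapfrog integrator sees it.
    @tf.custom_gradient
    def wrapped(x):
        with tf.GradientTape() as tape:
            tape.watch(x)
            y = fn(x)

        def grad(dy):
            g = tape.gradient(y, x)
            # tf.convert_to_tensor densifies IndexedSlices and is a
            # no-op on a Tensor that is already dense.
            return dy * tf.convert_to_tensor(g)

        return y, grad

    return wrapped

# Usage with the repro from the issue (sketch):
# sample_hmc(with_dense_gradient(logprob), current_state=alpha,
#            num_results=10, num_burnin_steps=5)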
