
Markov chain Monte Carlo (MCMC)


The calculation of a marginal posterior distribution can be done analytically or numerically. However, in many cases it has no analytic solution. In such cases, numerical iterative methods are widely used to sample the parameter space in order to generate marginal posterior distributions from a collection of data points.

An efficient method to generate a set of data points in a parameter space is by using a Markov chain. Following the definition of Sharma (2017), a Markov chain is a sequence of random variables X1, ..., Xn such that, given the present state, the future and past states are independent. If random numbers are used to generate the Markov chains, the method is called Markov chain Monte Carlo (MCMC). These random variables can be the points φ in the parameter space that we want to sample. For instance, if we start at a point φ1 in the parameter space, we can generate a set of different models φi via Markov chains. In this way, we can create a set of N models from an initial φ1.
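As a minimal sketch of this idea (a hypothetical random-walk chain used only for illustration, not any specific sampler), each new state φ(i+1) is generated using only the current state φi:

```python
import numpy as np

# Minimal sketch (illustrative only): evolve a one-dimensional Markov
# chain where each new state depends only on the present state.
np.random.seed(42)

N = 1000                     # number of states to generate
phi = np.zeros(N)
phi[0] = 0.5                 # initial state phi_1

for i in range(N - 1):
    # The next state is drawn using only the current state, so the
    # past states play no role: this is the Markov property.
    phi[i + 1] = phi[i] + np.random.normal(0.0, 0.1)
```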

There is a large variety of MCMC sampling methods, which ensure that the Markov chains converge to a stationary distribution corresponding to the posterior. For a basic MCMC algorithm, we refer the reader to the Metropolis-Hastings algorithm (Metropolis et al. 1953; Hastings 1970).

Metropolis-Hastings algorithm

We follow the description of the Metropolis-Hastings algorithm given by Sharma (2017). The simplest way to generate a set of states is by evolving a Markov chain from an initial state φ1 in the parameter space. After a number N of iterations, we can select the states with the highest probabilities to create the posterior distribution. In order to sample a set of parameters φt in a parameter space, we can propose a new state Φ based on a proposal distribution Q(Φ|φ). The new state is accepted with probability

α = min(1, r),

where

r = [P(Φ|D) Q(φ|Φ)] / [P(φ|D) Q(Φ|φ)].

We note that if the proposed new state is not accepted, the new state φ(t+1) is the current state φt. There are two important points missing at this stage. The first one is how to define the proposal distribution Q. The proposal distribution has to be built in such a way that it satisfies detailed balance, which ensures convergence to a stationary solution (more details in Sharma 2017). The simplest proposal distribution is the symmetric distribution proposed by Metropolis et al. (1953), such that

Q(Φ|φ) = Q(φ|Φ),

leading to an acceptance probability that involves only the posterior distributions of each state, i.e., min(1, P(Φ|D)/P(φ|D)). For other proposal distributions that satisfy detailed balance, see Sharma (2017).
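For illustration, here is a minimal sketch of this symmetric case for a toy one-dimensional problem; the Gaussian posterior and all numerical choices (step size, starting point) are assumptions made for the example, not part of any particular code:

```python
import numpy as np

def log_posterior(phi):
    # Toy unnormalized posterior P(phi|D): a Gaussian with mean 2.0 and sigma 0.5.
    return -0.5 * ((phi - 2.0) / 0.5) ** 2

np.random.seed(42)
n_iter = 5000
chain = np.zeros(n_iter)
chain[0] = 10.0              # deliberately distant starting state phi_1

for t in range(n_iter - 1):
    # Symmetric Gaussian proposal, so Q(Phi|phi) = Q(phi|Phi) cancels out.
    phi_new = chain[t] + np.random.normal(0.0, 0.5)
    # Accept with probability min(1, P(Phi|D)/P(phi|D)), evaluated in
    # log space to avoid numerical underflow.
    if np.log(np.random.uniform()) < log_posterior(phi_new) - log_posterior(chain[t]):
        chain[t + 1] = phi_new   # accept the proposed state
    else:
        chain[t + 1] = chain[t]  # reject: the new state is the current state
```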

The second important point is how to select the chain samples that represent the marginal posterior distribution for each parameter. One of the advantages of the Metropolis-Hastings algorithm is that it ensures that the chains converge to a stationary solution after a finite number of iterations. The figure below shows an example of a chain evolved with the Metropolis-Hastings algorithm for 5000 iterations. We see that the chain has a random behavior at the beginning of its evolution. After a given number of iterations, the chain reaches a stationary solution around a region in the parameter space. The first steps of the Metropolis-Hastings algorithm are therefore not useful to construct a posterior distribution from the parameter sample. For this reason, the first iterations are thrown away and only the converged part of the chain is kept; this process is called burn-in. The figure below also shows the burn-in of the chain, removing the first 1000 iterations. By analyzing the frequency of the states of the remaining chain, we can estimate the marginal posterior distribution from which we can estimate the model parameters.
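Continuing the sketch above (it reuses the `chain` array from the previous block), the burn-in is removed by discarding the first iterations before building the histogram; the cut of 1000 iterations mirrors the figure below and is a choice for this example, not a general rule:

```python
burn_in = 1000
converged = chain[burn_in:]   # keep only the stationary part of the chain

# The frequency of the remaining states approximates the marginal
# posterior distribution, from which we can estimate the parameter.
hist, edges = np.histogram(converged, bins=50, density=True)
print("posterior mean  =", converged.mean())
print("posterior sigma =", converged.std())
```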

[Figure: chains]

Example of the evolution of the Metropolis-Hastings algorithm. Upper panel: Parameter value of the Markov chain from iteration 0 to 5000, on a logarithmic scale. Lower left panel: Chain behavior for the last 4000 iterations. Lower right panel: Histogram created using the information contained in the last 4000 iterations.