This algorithm, popular in mathematics and physics, is based on the rejection of unacceptable elements. It is named after Nicholas Metropolis, who published it in 1953 for the Boltzmann distribution. This algorithm generates samples according to any given probability distribution π(x). The only requirement is that the density of the distribution can be computed at any point x. The algorithm generates a Markov chain in which every state x(t+1) depends only on the previous state x(t). The algorithm uses a transition function q(x'|x(t)), which depends only on the current state x(t), in order to generate a new proposed element x'. The proposal is accepted as the new element (x(t+1) = x') if a number u drawn from the uniform distribution U(0, 1) satisfies

u < π(x') / π(x(t))
Otherwise the current value is retained (i.e. x(t+1) = x(t)).
For example, the transition function can generate points according to the Gaussian distribution centered on the current state x(t):

q(x'|x(t)) = N(x(t), Σ)
where q(x'|x(t)) can be interpreted as the probability density for x' in the case when the previous value is equal to x(t). Therefore, the transition function generates elements located around the current state x(t) with the covariance equal to Σ, and the proposed state x' is drawn with probability q(x'|x(t)). Next, the ratio

r = π(x') / π(x(t))
of the probabilities of the proposed element and the previous one is calculated. The new state is chosen according to the rule

x(t+1) = x' if u < min(1, r), and x(t+1) = x(t) otherwise,

where u is drawn from the uniform distribution U(0, 1).
The algorithm works best if the shape of the transition function q matches the shape of the target distribution π(x), i.e.

q(x'|x(t)) ≈ π(x')
However, in the majority of cases the shape of the target distribution is unknown. If the transition probability used in the transition function has a Gaussian distribution, then its variance should be adjusted during the burn-in period. This is usually done by calculating the acceptance ratio, which should oscillate around 60%. If the steps to the proposed elements are too small, then the chain will develop slowly (i.e. it will move slowly around one area and the convergence to π(x) will also be slow). If the steps are too large, then the acceptance ratio will be very small, because the proposed elements will frequently be located in areas with very small probability density. Hence the chain will rarely move and will again converge slowly.
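The steps described above can be sketched in a few lines of Python. This is an illustrative reimplementation of the random-walk Metropolis scheme, not the library's MetropolisSampler; the function and parameter names are invented for the example, and the density is passed in logarithmic form for numerical stability.

```python
import math
import random

def metropolis(log_density, start, step, length, seed=0):
    """Random-walk Metropolis sampler (illustrative sketch).

    log_density -- log pi(x), known up to an additive constant
    step        -- standard deviation of the Gaussian proposal
    Returns the chain and the acceptance ratio."""
    rng = random.Random(seed)
    x = start
    chain = [x]
    accepted = 0
    for _ in range(length - 1):
        x_new = rng.gauss(x, step)  # symmetric Gaussian proposal q(x'|x)
        # accept with probability min(1, pi(x')/pi(x)), computed in logs
        if math.log(rng.random()) < log_density(x_new) - log_density(x):
            x = x_new
            accepted += 1
        chain.append(x)  # on rejection the current value is retained
    return chain, accepted / (length - 1)

# Sample from a standard normal: log pi(x) = -x^2/2 (constant dropped)
chain, ratio = metropolis(lambda x: -0.5 * x * x,
                          start=0.0, step=1.0, length=5000)
```

Varying `step` and watching `ratio` reproduces the tuning behaviour described above: a tiny step gives a high acceptance ratio but a slowly moving chain, a huge step gives a very low one.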
The constructor for the Metropolis algorithm has the following syntax:
MetropolisSampler(distribution, trans[, log])
Table 9.4. MetropolisSampler constructor settings
Parameter name | Parameter type | Description |
---|---|---|
distribution | a SamplingDistribution object | the distribution to be sampled |
trans | a TransitionFunction object | the transition function used to find next proposed point for the Markov chain |
log | boolean | specifies whether distribution is in logarithmic form (true) or not. The default value is false. |
Additional description of the MetropolisSampler parameters
distribution: the distribution to be sampled. It can be known up to a constant, and it need not be strictly positive on its domain if it is not in logarithmic form.
trans: the transition function used to find the candidate for the next point of the Markov chain. Its dimension has to be exactly the same as the dimension of the distribution. This function must be non-negative and symmetric with respect to its arguments.
log: the log = true setting specifies that the sampling distribution is in logarithmic form (e.g. a log-likelihood). This is useful when calculations are complicated, since replacing multiplication with summation of logarithms makes the required value faster (and numerically safer) to obtain.
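The numerical benefit of the logarithmic form can be seen directly. The snippet below (a standalone Python illustration, unrelated to the library's API) multiplies many small density values, as a likelihood computation would, and compares it with summing their logarithms:

```python
import math

# With many independent observations the likelihood is a product of
# small terms and underflows in double precision, while the
# log-likelihood (a sum of logarithms) stays perfectly representable.
obs = [0.1] * 500            # 500 hypothetical density values of 0.1

product = 1.0
for p in obs:
    product *= p             # underflows to exactly 0.0

log_sum = sum(math.log(p) for p in obs)   # about -1151.3, usable
```

With the product underflowed to zero, the acceptance ratio of the sampler would be meaningless; the log form avoids this entirely, since the acceptance test only needs the difference of two log densities.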
Table 9.5. The MetropolisSampler methods
Method name and syntax | Description of parameters | Output | Output type |
---|---|---|---|
generate(length) | length - the length of the chain to be generated | the generated Markov chain | MarkovChain |
generate(length, nrOfChains, maxLength, wholeChains) | length - the length of the converging parts of the chains (integer); nrOfChains - the number of chains to generate, must be at least 2 (integer); maxLength - the maximal length of the generated chains (integer); wholeChains - if false only the converging parts of the chains will be returned, if true whole chains will be returned | an array of generated Markov chains with the specified length if wholeChains is set to false, or the whole generated chains until convergence is observed if wholeChains is set to true | MarkovChain[] |
getAcceptanceRatio() | | the acceptance ratio (i.e. the number of accepted points divided by the number of proposed points) obtained during the last sampling | double |
getTransitionFunction() | | the TransitionFunction object used in constructing the MetropolisSampler object | TransitionFunction |
getGeneratedLength() | | the length of the last generated chain | int |
getRTolerance() | | the tolerance for the R parameter | double |
setRTolerance(tolerance) | tolerance - the acceptable value of the R parameter, below which chain generation will be stopped (double) | | void |
setStartingPoint(point) | point - the starting point for the algorithm (double[]) | | void |
getSeed() | | the random number seed used for generating the chain | int |
setSeed(seed) | seed - the value to set as the random number seed for generating chains; if 0 is used, the seed itself will be random | | void |
terminate() | | | void |
Detailed description of selected MetropolisSampler methods
generate(length): The basic method for generating chains; it generates a sample from the specified distribution. All points are grouped in one MarkovChain object.
generate(length, nrOfChains, maxLength, wholeChains): An alternative chain generation method, with the capability to generate samples with automated convergence examination. The chains are generated until the last part of each chain of the specified length has a value of the R parameter below the R tolerance level, or until the maximal chain length is exceeded.
If the starting point is not set, a zero vector is used as the starting point.
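The manual does not define the R parameter; the multi-chain generation it describes matches the Gelman-Rubin potential scale reduction statistic, so the sketch below computes that quantity under this assumption. It is a plain-Python illustration for one-dimensional chains, independent of the library:

```python
def gelman_rubin_r(chains):
    """Gelman-Rubin statistic for several 1-D chains of equal length.

    An assumption: the manual's R parameter is taken to be this usual
    convergence diagnostic; values near 1 indicate convergence."""
    m = len(chains)                       # number of chains (at least 2)
    n = len(chains[0])                    # common chain length
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # between-chain variance B and mean within-chain variance W
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * w + b / n    # pooled variance estimate
    return (var_plus / w) ** 0.5

# Two overlapping chains converge; two far-apart chains do not.
same = [[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]]
apart = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]
```

Generation stopping "when R falls below the tolerance" then simply means iterating until `gelman_rubin_r` of the chain tails drops under the value set with setRTolerance.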
In 1970 W. K. Hastings generalized the Metropolis algorithm in the following way: the new element x' is accepted if

u < r · q(x(t)|x') / q(x'|x(t))
where q(x(t)|x') / q(x'|x(t)) is the ratio of the transition function probabilities in the two directions (from x(t) to x' and vice versa). Due to this, the transition function need not be symmetric. Moreover, q(x'|x(t)) may not even depend on x(t). In this case the algorithm is called "Independence Chain Metropolis-Hastings" (as compared to "Random Walk Metropolis-Hastings"). If the transition function is well fitted, the algorithm can achieve better accuracy than the random walk variant, but it also requires some a-priori knowledge about the distribution.
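The Hastings correction can be sketched as follows. This is an illustrative Python implementation (names invented for the example, not the library's API) of one Metropolis-Hastings step with an asymmetric proposal, followed by the independence-chain variant in which the proposal ignores the current state:

```python
import math
import random

def mh_step(rng, log_pi, log_q, propose, x):
    """One Metropolis-Hastings step (illustrative sketch).

    propose(rng, x)      -- draws x' from q(.|x)
    log_q(x_from, x_to)  -- log transition density of x_to given x_from"""
    x_new = propose(rng, x)
    # log of pi(x') q(x|x') / (pi(x) q(x'|x)) -- the Hastings correction
    log_r = (log_pi(x_new) - log_pi(x)
             + log_q(x_new, x) - log_q(x, x_new))
    if math.log(rng.random()) < log_r:
        return x_new
    return x            # rejection: keep the current state

# Independence-chain variant: the proposal does not depend on x.
# Here x' ~ N(0, 2^2) regardless of x, targeting a standard normal.
rng = random.Random(1)
log_pi = lambda x: -0.5 * x * x                       # up to a constant
log_q = lambda x_from, x_to: -0.5 * (x_to / 2.0) ** 2  # ignores x_from
propose = lambda rng, x: rng.gauss(0.0, 2.0)
x = 0.0
chain = []
for _ in range(2000):
    x = mh_step(rng, log_pi, log_q, propose, x)
    chain.append(x)
```

With a symmetric q the two log_q terms cancel and the step reduces exactly to the Metropolis rule above.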
The constructor for the Metropolis-Hastings algorithm has the following syntax:
MetropolisHastingsSampler(distribution, trans[, log])
Table 9.6. MetropolisHastingsSampler constructor settings
Parameter name | Parameter type | Description |
---|---|---|
distribution | a SamplingDistribution object | the distribution to be sampled |
trans | a TransitionFunction object | the transition function used to find next proposed point for the Markov chain |
log | boolean | specifies whether distribution is in logarithmic form (true) or not. The default value is false. |
Additional description of the MetropolisHastingsSampler parameters
distribution: the distribution to be sampled. It can be known up to a constant, and it need not be strictly positive on its domain if it is not in logarithmic form.
trans: the transition function used to find the candidate for the next point of the Markov chain. Its dimension has to be exactly the same as the dimension of the distribution. This function must be non-negative; unlike in the Metropolis algorithm, it need not be symmetric with respect to its arguments.
log: the log = true setting specifies that the sampling distribution is in logarithmic form (e.g. a log-likelihood). This is useful when calculations are complicated, since replacing multiplication with summation of logarithms makes the required value faster (and numerically safer) to obtain.
The methods for MetropolisHastingsSampler are exactly the same as for MetropolisSampler.
Bayesian inference is a statistical inference method in which events or observations are used to update the probability that a given hypothesis is true. The name comes from Bayes' theorem, derived from the works of Thomas Bayes, which is frequently used in the inference process.
There are two ingredients to Bayesian inference: the sampling distribution f(x|θ) and the distribution π(θ). Treating the first ingredient as a function of θ, we get the likelihood function L(θ|x). The second ingredient is called the prior density, as it contains the probability distribution of θ prior to observing the value of x. The inference is therefore based on the probability distribution of θ after observing the value of x, which becomes a part of the set of available information. This distribution is called the posterior distribution and can be obtained using Bayes' theorem

π(θ|x) = f(x|θ) π(θ) / f(x)
where

f(x) = ∫ f(x|θ) π(θ) dθ
If one denotes L(θ|x) = f(x|θ) and observes that f(x) is constant with respect to θ, then it follows that the equation for the density of the posterior distribution can be rewritten in a more compact form

π(θ|x) ∝ L(θ|x) π(θ)
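Since the normalising constant f(x) drops out of the Metropolis acceptance ratio, sampling the posterior only requires evaluating the log-likelihood plus the log-prior. The following standalone Python sketch (hypothetical data and names, not the library's PosteriorSampler) puts the pieces together for a Bernoulli likelihood with a normal prior on the log-odds θ:

```python
import math
import random

# Hypothetical observations: eight coin flips, six successes.
data = [1, 1, 0, 1, 1, 0, 1, 1]

def log_likelihood(theta):
    """log L(theta|x) for Bernoulli data with success probability
    p = sigmoid(theta)."""
    p = 1.0 / (1.0 + math.exp(-theta))
    return sum(math.log(p) if x else math.log(1.0 - p) for x in data)

def log_prior(theta):
    return -0.5 * theta * theta      # N(0, 1) prior, constant dropped

def log_posterior(theta):
    # log pi(theta|x) up to a constant: the f(x) term is not needed
    return log_likelihood(theta) + log_prior(theta)

# Random-walk Metropolis on the unnormalised log posterior.
rng = random.Random(0)
theta, chain = 0.0, []
for _ in range(4000):
    prop = rng.gauss(theta, 0.8)
    if math.log(rng.random()) < log_posterior(prop) - log_posterior(theta):
        theta = prop
    chain.append(theta)
```

The resulting chain concentrates between the prior mean (0) and the value favoured by the data, exactly the compromise the proportionality π(θ|x) ∝ L(θ|x) π(θ) expresses.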
The constructor for the posterior distribution sampler has the following syntax:
PosteriorSampler(likelihood, prior[, trans])
If only two arguments (likelihood and prior) are provided, the constructor uses the Metropolis algorithm, sampling the data from the prior distribution (i.e. the proposal points are drawn directly from the prior distribution without using a transition function, so the next proposal point is independent of the previous one). If the transition function trans is also provided, the constructor uses the Metropolis-Hastings algorithm.
Table 9.7. PosteriorSampler constructor settings
Parameter name | Parameter type | Description |
---|---|---|
likelihood | a LogLikelihood object | the likelihood function to use |
prior | a MultivariateDistribution object | the distribution to be sampled |
trans | a TransitionFunction object | the transition function to use for finding the next proposed points in the Markov chain |
Additional description of the PosteriorSampler parameters
likelihood: the likelihood function in logarithmic form.
prior: the a-priori distribution. It has to have the same dimension as the likelihood function.
trans: the transition function used to find the candidate for the next point in the Markov chain based on the current point. Its dimension has to be exactly the same as the dimension of the distribution. This function must be non-negative on its whole domain.
The PosteriorSampler object has the same methods as MetropolisSampler with the addition of the methods listed in the table below.
Table 9.8. PosteriorSampler exclusive methods
Method name and syntax | Description of parameters | Output | Output type |
---|---|---|---|
getPrior() | | the prior function used | MultivariateDistribution |
getInnerSampler() | | the sampling algorithm used internally by the PosteriorSampler object | MetropolisSampler or MetropolisHastingsSampler |