Statistical Computing: Lecture 15

Adaptive rejection sampling (ARS)

Introduction

The Gibbs sampler requires one to simulate from all the full conditionals. Sometimes, however, a full conditional may be complicated, and then the Gibbs sampler cannot be applied directly.

Example: Suppose that we have data X.

Model: X | p, λ ~ Gamma(p, λ)

Prior: p ~ Expo(1) and λ ~ Expo(3) independently.

Here the joint density of (p, λ, X) is
f(p, λ, X) ∝ exp(-p) exp(-3λ) λ^p X^(p-1) exp(-λX) / Γ(p)
Collecting the terms involving λ, we get the full conditional density of λ as
f(λ | p, X) = (constant free of λ) λ^p exp(-(X+3)λ)
By inspection we recognise this as the kernel of the Gamma(p+1, X+3) density. Similarly, collecting the terms involving p, the full conditional density of p is
f(p | λ, X) = (constant free of p) exp(-p) λ^p X^(p-1) / Γ(p),
which is not a standard distribution. It is not obvious how we may simulate from this full conditional.
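As a quick check, the log of this full conditional is easy to evaluate; here is a minimal Matlab sketch, assuming a single observation X (the helper name logfullp is ours). gammaln computes log Γ(p) stably. Note also that differentiating log f(p|λ,X) twice in p gives -ψ'(p), which is negative since the trigamma function ψ' is positive; so this full conditional is log-concave, which is exactly the property that ARS below requires.

function lf = logfullp(p, lambda, X)
% LOGFULLP  Unnormalised log full conditional of p given lambda and a
% single observation X (illustrative sketch; the name is ours).
% log f(p|lambda,X) = -p + p*log(lambda) + (p-1)*log(X) - log(GAMMA(p))
  lf = -p + p*log(lambda) + (p-1)*log(X) - gammaln(p);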

In such a situation we may simulate from the difficult conditionals using a form of rejection sampling called Adaptive Rejection Sampling (ARS). For this to work we need the conditional density to be log-concave. But first let us review the concept of rejection sampling in general.

Rejection sampling

Suppose that we have a complicated density f that we want to sample from. Suppose also that we have a density g such that
f(x) < B g(x) for all x,
where B is some known constant. We shall call Bg an envelope of f. Sometimes we shall abuse notation and call g alone an envelope, without explicitly mentioning the constant B. Assume that we can simulate from this density g.

Then rejection sampling is a method to modify iid samples from g to get iid samples from f as follows.

First generate X ~ g. Compute the ratio
R = f(X)/(B g(X)).
Note that since X is generated from g, the chance that the denominator vanishes is 0. Hence R is well-defined with probability 1. Toss a coin with probability of head equal to R. If you get head, return X, else repeat the process. Here is the algorithm.
Rejection sampling
LOOP: Generate X ~ g.
Let R = f(X)/(B g(X)).
Toss a coin with P(head)=R
If head
    Output X.
Else
    Go to LOOP.
End if

Exercise: Can you prove that this algorithm really results in a rv with the correct density? [Hint: Define a 0-1 rv Y which is 1 iff X is accepted. You know that X ~ g. Find Y |X. Then compute the joint distribution of (X,Y). You are to show that X|Y=1 ~ f. You need to use nothing but elementary probability.]

Exercise: Show that this algorithm terminates with probability 1.

You'll be well advised to try out some simulation now with this scheme.
function X = rej(f,B)
% REJ  Rejection sampling from the density f on [0,1], using the
% Unif[0,1] density as the envelope g: requires f(x) <= B on [0,1].
  while(1)
    X = unifrnd(0,1);  % proposal: g is the Unif[0,1] density
    numerator = feval(f,X);
    denominator = B;   % B*g(X) = B, since g(X) = 1 on [0,1]
        
    R = numerator/denominator;
        
    u = unifrnd(0,1);
    if(u < R) 
      return;
    end
  end
Here is a sample density
function d = tent(x)
% TENT  Triangular ("tent") density on [0,1], peaking at tent(1/2) = 2.
  if x < 0.5
    d = 4*x;
  else
    d = 4 - 4*x;
  end
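To try the scheme out, note that the tent density peaks at tent(1/2) = 2, so B = 2 is a valid constant for the Unif[0,1] envelope. A minimal usage sketch:

% Draw 10000 samples from the tent density and inspect the histogram.
N = 10000;
x = zeros(N,1);
for i = 1:N
  x(i) = rej(@tent, 2);   % B = 2, since max tent = 2
end
hist(x, 30)               % should show the triangular shape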

Adaptive rejection sampling (ARS)

The main step behind any rejection sampling is to find a tight envelope. It usually requires some computation to find one. This extra computation is well worth the effort if we are going to draw a large number of samples from the distribution. But in Gibbs sampling we draw only one random sample at each step, and the distribution changes from step to step. So using rejection sampling in the usual way is wasteful in a Gibbs sampler. The following example illustrates this point.

Example: Here is the Gibbs sampler to simulate from the joint distribution of (X,Y):
Start with some X(0),Y(0)
For i = 0,1,...,N
    Generate X(i+1) ~ X | Y = Y(i)
    Generate Y(i+1) ~ Y | X = X(i+1)
End for
Clearly, the full conditionals are changing from step to step. For instance, X(1) is generated from X|Y=Y(0) while X(2) is generated from X|Y=Y(1). These two conditional distributions may not be the same.
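For the Gamma example of the introduction, the λ-step is a standard Gamma draw and the p-step is where ARS comes in. Here is a rough sketch of that sampler, assuming a hypothetical one-draw ARS routine ars of the kind developed below, and the logfullp helper sketched earlier. Note that Matlab's gamrnd takes a shape and a scale, so the rate X+3 enters as the scale 1/(X+3).

% Gibbs sampler sketch for (p, lambda); ars() is hypothetical.
X = 5;  N = 1000;                  % assumed data point and chain length
p = 1;  lambda = 1;                % arbitrary starting values
for i = 1:N
  lambda = gamrnd(p+1, 1/(X+3));   % Gamma(p+1, X+3): shape p+1, rate X+3
  p = ars(@(t) logfullp(t, lambda, X));  % one ARS draw from f(p|lambda,X)
end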

However, if the target density is log-concave (i.e., its log is a concave function), then we can use an adaptive version of rejection sampling called Adaptive Rejection Sampling or ARS. The main idea in ARS is to carry out rejection sampling according to the usual method, but each time a point is rejected, ARS uses the rejected point to tighten the envelope. This reduces the chance of rejection in the future. Here is the basic structure of the algorithm. Note that it is the same as the rejection sampling algorithm except for one extra line (updating the envelope).
Adaptive Rejection Sampling (basic structure)
LOOP: Generate X ~ g.
Let R = f(X)/(B g(X)).
Toss a coin with P(head)=R
If head
    Output X.
Else
    Update the envelope using X.
    Go to LOOP.
End if

A general class of envelopes

In order to tighten the envelope at each step, ARS uses a special class of envelopes that we discuss now. First, a theorem about the general shape of any log-concave density, which makes ARS possible.
Theorem: A log-concave density is always unimodal.
For example, the normal and exponential densities are log-concave, while the Cauchy density is not. Thus, for a typical log-concave density f, the graph of log(f) looks like the curve in the following diagram.

Take a finite set
S = {x1 < ... < xk}
of k points (k > 2) on the horizontal axis, such that the line joining (x1, log(f(x1))) and (x2, log(f(x2))) has a positive slope and the line joining the last two points has a negative slope. Given any such set S we shall construct an envelope GS for log(f) as follows. (Then gS := exp(GS) will be an envelope for f.)

Draw vertical lines through each xi to intersect the log(f)-curve at Pi.

[Figure: Graph of log(f)]

For i=1,...,k-1 join Pi and Pi+1 using straight lines.

[Figure: Join with lines]

Now consider the zigzag crown shown below.

[Figure: Zigzag crown shown with thick line]

Note that this zigzag crown is an envelope over log(f). The required envelope gS over f is the density proportional to its exponential. Thus, gS is a piecewise exponential curve.
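To make the construction concrete, here is a minimal sketch of evaluating the crown GS at a point x, assuming the derivative-free chord construction above with k ≥ 3 points: over an inner interval [xi, xi+1] the crown is the lower of the two chords extended from the neighbouring intervals, and beyond x1 or xk the outermost chord is extended.

function u = crown(x, S, logf)
% CROWN  Evaluate the zigzag crown (upper hull of log f) at x,
% given support points S = [x1,...,xk], k >= 3, and a handle logf.
  k = length(S);
  h = zeros(1,k);
  for i = 1:k
    h(i) = feval(logf, S(i));
  end
  slope = diff(h) ./ diff(S);   % slope of the chord P(i)--P(i+1)
  if x <= S(1)
    u = h(1) + slope(1)*(x - S(1));         % first chord, extended left
  elseif x >= S(k)
    u = h(k) + slope(k-1)*(x - S(k));       % last chord, extended right
  else
    i = max(find(S <= x));                  % x lies in [S(i), S(i+1)]
    if i == 1
      u = h(2) + slope(2)*(x - S(2));       % only the right chord extends here
    elseif i == k-1
      u = h(k-1) + slope(k-2)*(x - S(k-1)); % only the left chord extends here
    else
      left  = h(i)   + slope(i-1)*(x - S(i));   % chord (i-1,i) extended
      right = h(i+1) + slope(i+1)*(x - S(i+1)); % chord (i+1,i+2) extended
      u = min(left, right);
    end
  end

The sign conditions imposed on S above (first chord sloping up, last sloping down) are exactly what make exp(GS) integrable in the tails.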

Tightening the envelope

Note that a larger S produces a tighter envelope. The main idea behind ARS is to enlarge S at each rejection.

Here is the algorithm (Ref: Gilks and Wild, Applied Statistics, 41, 1992):
Adaptive Rejection Sampling (complete)
Initialise S
LOOP: Generate X ~ gS.
Let R = f(X)/(BS gS(X)).
Toss a coin with P(head)=R
If head
    Output X.
Else
    Add X to S.
    Go to LOOP.
End if
Usually at most 5 or 6 iterations are required before the algorithm outputs a number.
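One simplification is worth noting when implementing the ratio: if we take BS to be the normalising integral of exp(GS), then BS gS(x) = exp(GS(x)), so no normalising constant ever has to be computed explicitly. In terms of the crown sketch above:

% Acceptance ratio in ARS: B_S * g_S(X) = exp(G_S(X)), so
R = exp(feval(logf, X) - crown(X, S, logf));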

Initialising S

At the very beginning we start with S of size 3:
S = {x1,x2,x3},
where the line joining (x1,log(f(x1))) and (x2,log(f(x2))) has a positive slope and the line joining (x2,log(f(x2))) and (x3,log(f(x3))) has a negative slope.

During Gibbs sampling the full conditionals change only slightly from one iteration to the next. So the S from the end of one iteration provides a good starting S for the next iteration.
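One simple way to find such a starting set is to pick a guess and slide outwards until the end slopes have the right signs. Here is a minimal sketch, assuming a step width w and an interior mode (densities with the mode at a boundary of the support need separate care):

function S = initS(logf, x0, w)
% INITS  Find three points x1 < x2 < x3 with logf(x1) < logf(x2)
% and logf(x3) < logf(x2), by sliding outwards in steps of width w.
  x1 = x0 - w; x2 = x0; x3 = x0 + w;
  while feval(logf, x1) >= feval(logf, x2)
    x2 = x1; x1 = x1 - w;   % mode lies further left: slide left
  end
  while feval(logf, x3) >= feval(logf, x2)
    x2 = x3; x3 = x3 + w;   % mode lies further right: slide right
  end
  S = [x1, x2, x3];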

Exercise: Write a Matlab function or SAS module to simulate from the density proportional to exp(ax + b) over (l, r), where l may be -∞ or r may be +∞. For the density to be integrable, a must be positive when l = -∞ and negative when r = +∞.

Exercise: Given an S, write down a ready-to-implement algorithm to simulate from gS.


End of Lecture
© Arnab Chakraborty, ISI, Kolkata (2005)