ผลต่างระหว่างรุ่นของ "Nested sampling manual"

รุ่นแก้ไขปัจจุบันเมื่อ 12:32, 10 เมษายน 2550

เนื้อหา

1 Introduction
- 1.1 NOTE for Oli
2 Program parameters
- 2.1 Main parameters
- 2.2 Minor parameters
3 Default values of parameters: the details
4 References

Introduction

Generally, nested sampling is used for calculating any integrals, e.g. an evidence in Bayesian model selection problems. This program will concentrate only on the problem of selecting the number of components in a Mixture of [Spherical] Gaussians (MOGs) given observed data. In this problem, the likelihood is a product of MOGs, and we assume that the prior is uniform (or truncated-log-uniform for the deviation parameter) over the parameter space.

NOTE for Oli

To test the correctness of our implementations, I also provide a simple MOGs likelihood for us [see (1.2) below]: a mixture of three spherical gaussian with $\sigma \,=1$ , so that the integral result will be $3\cdot (2\pi )^{D/2}$ where $D\,$ is the number of dimensions in the parameter space.

Program parameters

Main parameters

Normally, Nested sampling is controlled by 4 main parameters:

$Nclus$ (no default value)

This defines the number of dimensions of the parameter space $D$ .

In our problem of learning (spherical) MOGs, $D=(2+d)\cdot Nclus$ where $d$ is the dimension of the data space; To visualize the result, I usually use $d=2$ ;
(** just for developers**) if the likelihood is a simple MOG, I define $D=Nclus$ . (see (6.2))

$Next$ (default = $150\cdot D\log D$ )

The degree of extreme value distribution (Skilling, 2006; eq.(17)). This is the number of initial points for each nested sampling iteration which we can use to solve the problem of sampling from a truncated prior (My MCMCMC paper). This number also controls stability of nested sampling (greater --> more stable).

$Nwalk$ (default= $D\log D$ )

The so-called burn-in parameter in MCMC literatures. This parameter is used to solve the problem of sampling from a truncated prior.

$Niter$ (default= $2Next\cdot Nwalk$ )

The (estimated) maximum number of nested sampling iterations.

Minor parameters

test_likelihood (default [undefined])

If define, the program will switch the likelihood to a simple mixture of spherical gaussians (explained above).

walk (default 1)

determine the type of random walk

walk = 1, use slice sampling with hyperrectangle (Neal, 2003; section 5.1)
walk = 2, use metropolis-hasting sampler with GP-proposal (our project)

$Ngp$ { if (walk = 2) } (no default)

we have to define this number. This defines the number of pseudo-walk using GP as an approximation of the real likelihood.

Default values of parameters: the details

$Niter$

Let $M\,$ be the mass of the $D\,$ -dimensional parameter space and let $m\,$ be a mass defined by a typical set with respect to a given posterior. Typically, when $D\,$ increases, $m/M\,$ converges to zero exponentially fast. Moreover, when we integrate over all parameter space with respect to the posterior, only this fraction of parameters contributes a value in the integral result.

Thus, the iteration of nested sampling, $Niter\,$ must be high enough for nested sampling to reach this region. Since for each iteration, the mass on average is reduce by the factor ${\frac {Next}{Next+1}}\,$ , we have

$\left({\frac {Next}{Next+1}}\right)^{Niter}={\frac {m}{M}}.$

Using the approximation $1+x\approx e^{x}$ , we have

$Niter\approx Next\cdot \log {\frac {M}{m}}.$

Assuming we face the worst case, $D^{-D}$ is a reasonable value for $m/M$ so that we have

$Niter\approx Next\cdot D\log D.$

$Next$

Assume first that we fix $Next\,$ to some positive integer.

Let $x_{i}\,$ be a random variable of a shrinked mass for each iteration $i=1,2,...$ . According to nested sampling procedure, each $x_{i}\,$ is an extreme value random variable with degree $Next$ . Up to $I^{th}\,$ iteration, the mass $M\,$ is shrinked by $\prod _{i=1}^{I}x_{i}$ . Taking logarithm, the log of remaining mass is $\log M+\sum _{i=1}^{I}\log x_{i}$ .

Define $I^{*}=Next\cdot \log {\frac {M}{m}}\,$ so that

$E[\sum _{i=1}^{I^{*}}\log x_{i}]=\log {\frac {m}{M}}.$

Since each $x_{i}\,$ is independent, we have

$Var[\sum _{i=1}^{I^{*}}\log x_{i}]=\sum _{i=1}^{I^{*}}Var[\log x_{i}]={\frac {I^{*}}{(Next)^{2}}}.$

By a usual estimation ( $x=\mu \pm 3\sigma$ ), we have

$\sum _{i=1}^{I}\log x_{i}\approx \log {\frac {m}{M}}\pm 3{\sqrt {\frac {\log {\frac {M}{m}}}{Next}}},$

or, with high probability,

$\prod _{i=1}^{I}x_{i}\in \left[{\frac {m}{M}}\div \exp \left(3{\sqrt {\frac {\log {\frac {M}{m}}}{Next}}}\right),{\frac {m}{M}}\times \exp \left(3{\sqrt {\frac {\log {\frac {M}{m}}}{Next}}}\right)\right]$

Define $\gamma =\exp \left(3{\sqrt {\frac {\log {\frac {M}{m}}}{Next}}}\right)$ be a stability factor specified by user (e.g. I use $\gamma =1.25$ ) and rearrange the equation we get

$Next={\frac {9\log {\frac {M}{m}}}{(\log \gamma )^{2}}}.$

Finally, pessimistically estimate $m/M$ by $D^{-D}$ and use $\gamma =1.25$ we get $Next\approx 144D\log D$ . So this is why I set $Next$ to $150D\log D$ .

$Nwalk$

The default value of $D\log D$ for $Nwalk$ is adhoc. I try to make the total time complexity of nested sampling be comparable to that of MCMC methods for computing the volume of convex body, e.g. the work of Lovasz and Vempala (2003).

References

My MCMCMC paper.
Skilling (2006). Bayesian Statistics 8.
Neal (2003). Slice sampling (with discussions). Annals of Statistics.
Ramussen (2003). Bayesian Statistics 7.
Gelman et al. (1996). Efficient Metropolis Jumping Rules. Bayesian Statistics 5.
Lovasz and Vempala (2004). Simulated Annealing in Convex Bodies and an O(n^4) Volume Algorithm. link

@@ แถว 49: / แถว 49: @@
 # walk = 2, use metropolis-hasting sampler with GP-proposal (our project)
-==== <math>Ngp</math>  { if (''walk'' = 2)  } ====
+==== <math>Ngp</math>  { if (''walk'' = 2)  } (no default) ====
 we have to define this number. This defines the number of pseudo-walk using GP as an approximation of the real likelihood.
+== Default values of parameters: the details ==
+===<math>Niter</math>===
+Let <math> M \, </math> be the mass of the <math>D\,</math>-dimensional parameter space and let <math> m \, </math> be a mass defined by a typical set with respect to a given posterior. Typically, when <math> D \, </math> increases, <math> m/M \, </math> converges to zero exponentially fast. Moreover, when we integrate over all parameter space with respect to the posterior, only this fraction of parameters contributes a value in the integral result.
+Thus, the iteration of nested sampling, <math> Niter \, </math> must be high enough for nested sampling to reach this region. Since for each iteration, the mass '''on average''' is reduce by the factor <math> \frac{Next}{Next + 1} \, </math>, we have
+<math>
+\left(\frac{Next}{Next + 1} \right)^{Niter} = \frac{m}{M}.
+</math>
+Using the approximation <math> 1 + x \approx e^x </math>, we have
+<math>
+Niter \approx  Next \cdot \log \frac {M}{m}.
+</math>
+Assuming we face the worst case, <math> D^{-D} </math> is a reasonable value for <math>m/M</math> so that we have
+<math>
+Niter \approx  Next \cdot  D \log D.
+</math>
+=== <math>Next</math> ===
+Assume first that we fix <math> Next\, </math> to some positive integer.
+Let <math> x_i \, </math> be a random variable of a ''shrinked mass'' for each iteration <math> i = 1, 2, ... </math>. According to nested sampling procedure, each <math> x_i \, </math> is an ''extreme value'' random variable with degree <math> Next</math>. Up to <math> I^{th}\, </math> iteration, the mass <math> M\, </math> is shrinked by <math> \prod_{i=1}^I x_i </math>. Taking logarithm, the log of remaining mass is <math> \log M + \sum_{i=1}^I \log x_i </math>.
+Define <math> I^* =  Next \cdot \log \frac{M}{m}\, </math> so that
+<math>
+E[\sum_{i=1}^{I^*} \log x_i] = \log \frac{m}{M}.
+</math>
+Since each <math> x_i \, </math> is independent, we have
+<math>
+Var[\sum_{i=1}^{I^*} \log x_i] = \sum_{i=1}^{I^*} Var[ \log x_i] = \frac{I^*}{(Next)^2}.
+</math>
+By a usual estimation (<math> x = \mu \pm 3\sigma </math>), we have
+<math>
+\sum_{i=1}^I \log x_i \approx \log \frac{m}{M} \pm 3\sqrt{\frac{\log \frac{M}{m}}{Next}},
+</math>
+or, with high probability,
+<math>
+\prod_{i=1}^I x_i \in \left[ \frac{m}{M} \div \exp \left(3\sqrt{\frac{\log \frac{M}{m}}{Next}} \right) , \frac{m}{M} \times \exp \left(3\sqrt{\frac{\log \frac{M}{m}}{Next}} \right) \right]
+</math>
+Define <math>\gamma = \exp \left( 3\sqrt{\frac{\log \frac{M}{m}}{Next}} \right)</math> be a stability factor specified by user (e.g. I use <math> \gamma = 1.25 </math>) and rearrange the equation we get
+<math>
+Next = \frac{9 \log \frac{M}{m}}{ (\log \gamma)^2}.
+</math>
+Finally, pessimistically estimate <math>m/M</math> by <math>D^{-D}</math> and use <math>\gamma = 1.25</math> we get
+<math> Next \approx 144 D \log D </math>. So this is why I set <math> Next </math> to <math> 150 D \log D </math>.
+=== <math>Nwalk</math> ===
+The default value of <math>D \log D</math> for <math>Nwalk</math> is adhoc. I try to make the total time complexity of nested sampling be comparable to that of ''MCMC methods for computing the volume of convex body'', e.g. the work of Lovasz and Vempala (2003).
 ==References==
@@ แถว 58: / แถว 122: @@
 * Ramussen (2003). Bayesian Statistics 7.
 * Gelman et al. (1996). Efficient Metropolis Jumping Rules. Bayesian Statistics 5.
+* Lovasz and Vempala (2004). Simulated Annealing in Convex Bodies and an O(n^4) Volume Algorithm. [http://citeseer.ist.psu.edu/673703.html link]

ผลต่างระหว่างรุ่นของ "Nested sampling manual"

รุ่นแก้ไขปัจจุบันเมื่อ 12:32, 10 เมษายน 2550

เนื้อหา