NET-MC:  Do Markov chain simulation to sample networks.

The net-mc program is the specialization of xxx-mc to the task of
sampling from the posterior distribution for a neural network model,
or from the prior distribution, if no training set is specified.  See
xxx-mc.doc for the generic features of this program.

Computaton of the network model log likelihood and its gradient may be
done on a GPU (except for survival models), if a version of net-mc
compiled for a GPU is used.

The following applications-specific sampling procedures are implemented:

   sample-hyper [ group ]

       Does Gibbs sampling for the hyperparameters controlling the 
       distributions of parameters (weights, biases, etc.).  If a
       group is specified, only the hyperparameters pertaining to
       that group of parameters are updated.  Groups are numbered
       from 1, as in the output of net-display.

   sample-noise       

       Does Gibbs sampling for the noise variances.

   sample-lower-hyper 

       Does Gibbs sampling for all lower-level hyperparameters.

   rgrid-upper-hyper [ stepsize ]

       Does random-grid Metropolis updates (one at a time) for the
       logs of all upper-level hyperparameters, in "precision" form.
       The default stepsize is 0.1.  (Does nothing for uppermost
       hyperparameters that don't control hyperparameters one level
       down.)

   sample-lower-noise

       Does Gibbs sampling for all lower-level noise variances.

   rgrid-upper-noise [ stepsize ]

       Does random-grid Metropolis updates (one at a time) for the
       logs of all upper-level hyperameters controlling noise
       variances, in "precison" form.  The default stepsize is 0.1.
       (Does nothing if the uppermost noise hyperparameter doesn't
       control noise hyperparameters one level down.)

   sample-sigmas  

       Does the equivalent of both sample-hyper and sample-noise.

   sample-lower-sigmas

       Does the equivalent of both sample-lower-hyper and 
       sample-lower-noise.

   rgrid-upper-sigmas

       Does the equivalent of both rgrid-upper-hyper and
       rgrid-upper-noise.

An "upper-level" hyperparameter is one that controls the distribution
of lower-level hyperparameters or noise variances (which may be either
explicit or implicit).  The "lower-level" hyperparameters directly
control the distributions of weights.  Looked at another way, the
lower-level hyperparameters are the ones at the bottom level of the
hierarchy, or for which all lower-level hyperparameters have
degenerate distributions concentrated on the value of the higher-level
hyperparameter.

The random grid metropolis updates done with the above commands record
information (eg, rejection rate) for later display in the same way as
generic rgrid-met-1 updates.  

When coupling is being done, upper-level hyperparameters should be
updated only with random-grid updates; Gibbs sampling for these
upper-level hyperparamters will desynchronize the random number
streams (because of the way it is implemented using ARS), preventing
coalescence.  Lower-level hyperparameters can be updated with Gibbs
sampling, and they will exactly coalesce once the parameters they
control and the upper-level hyperparameters that control them have
exactly coalesced.

Default stepsizes for updates of parameters (weights, biases, etc.) by
the generic Markov chain operations are set by a heuristic procedure
that is described in net-models.PDF.

Tempering methods and Annealed Importance sampling are supported.  The
effect of running at an inverse temperature other than one is to
multiply the likelihood part of the energy by that amount.  At inverse
temperature zero, the distribution is simply the prior for the
hyperparameters and weights.  The marginal likelihood for a model can
be found using Annealed Importance Sampling, since the log likelihood
part of the energy has all the appropriate normalizing constants.

            Copyright (c) 1995-2022 by Radford M. Neal