NOTES ON THE VERSION OF 2022-04-21

This version has some substantial new features for neural network
models, and some important performance improvements for neural
networks, including support for computation on GPUs.  The tutorial
examples for neural network models have also been extended.

Note: Log files produced by earlier versions cannot be read by this
version of the software.  There are also some incompatible changes in
command syntax (see feature changes (3), (13), and (14) below).

Documentation changes in this version:

1) Detailed documentation on the neural network models and their
   implementation is now provided in doc/net-models.PDF.  Some
   references to the models and Markov chain methods used are now
   listed in References.doc.

2) All the tutorial examples have been updated, with timing results
   now given for a modern processor.  Some examples are now
   accompanied by plots, supplied as PNG files.

3) The neural network examples for a simple regression model, in
   Ex-netgp-r.doc, have been extended to illustrate some general
   issues with Markov chain Monte Carlo methods and Bayesian
   inference.

4) There is a new example of image classification with neural network
   models, including use of a convolutional layer, in Ex-image.doc.

Feature changes in this version:

1) For neural network models, the configuration of the connections and
   their weights into a hidden or output layer, from the input layer
   or a previous hidden layer, as well as the hidden or output unit
   biases, may now be set up explicitly, rather than (as previously)
   layers always being fully-connected, without weight sharing.  In
   particular, this allows for specification of models with
   convolutional layers.  See net-spec.doc, net-config.doc, and
   net-config-check.doc for details, as well as the tutorial example
   in Ex-image.doc.
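
   As a hypothetical sketch of the idea (net-config.doc gives the
   actual file format, and net-spec.doc the syntax for attaching a
   configuration file to a layer), a configuration file that lists, on
   each line, a source unit, a destination unit, and the index of the
   weight used for that connection might describe a small 1-D
   convolution, with four inputs, three hidden units, and a filter of
   width two, as follows:

       1 1 1
       2 1 2
       2 2 1
       3 2 2
       3 3 1
       4 3 2

   Here hidden unit j is connected to inputs j and j+1, with the same
   two weights reused at every position - the weight sharing that
   characterizes a convolutional layer.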

2) Non-sequential hidden layer connections are now supported - ie,
   hidden layers can now connect to later hidden layers other than
   their immediate successor.  The number of such non-sequential
   connections is limited to 16.  (Note that non-sequential
   connections from inputs and to outputs were already allowed.)

3) Hidden layers may now use the 'softplus' activation function, given
   by h(u) = log(1+exp(u)), in addition to the previous options of
   'tanh' and 'identity'.  There is also a 'softplus0' activation
   function, which is 'softplus' shifted down by log 2, so that its
   output is zero for input zero.

   However, the previous option of 'sin' as an activation function has
   been removed.  The implementation no longer saves the summed input
   to hidden units (that is, the value before the activation function
   is applied), so an activation function's derivative must now be
   computable from the function's value - which is not possible for
   sin, as illustrated below.
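
   Specifically, writing h for a unit's value after the activation
   function, the derivative of 'tanh' is 1 - h^2, that of 'identity'
   is 1, and that of 'softplus' is 1 - exp(-h), since
   exp(-log(1+exp(u))) equals 1/(1+exp(u)).  For 'sin', however, the
   derivative is cos(u) = +-sqrt(1-h^2), which is determined by h only
   up to sign, and so cannot be recovered from the unit's value alone.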

   See net-spec.doc for how to specify the activation function for a
   hidden layer.

4) The sample-hyper operation for neural network models can now take
   an argument in order to restrict the updates to the hyperparameters
   controlling a single group of parameters.  This may be useful in
   the initial stages of sampling, to avoid some hyperparameters
   taking on bad values when the data has not yet been fit well.
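
   As a hypothetical illustration (mc-spec.doc gives the actual form
   of the argument), an early-stage specification restricting
   hyperparameter updates to a single group might look something like
   this, with the log file name taken from the tutorial examples and
   the use of a numeric group index being an assumption made here:

       mc-spec rlog.net repeat 10 sample-hyper 1 heatbath hybrid 100:10 0.2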

5) The net-eval program can now optionally display the values of the
   hidden units in a specified layer, instead of the final output.

6) The data-spec command now has an optional -e argument that causes
   the training and test inputs and targets to be echoed to standard
   output as they are read.  This can be useful for checking that the
   data source is specified correctly.
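
   For example, assuming the flag is given before the other arguments
   (data-spec.doc gives the exact usage), the data-spec command from
   the regression tutorial could echo the data it reads as follows:

       data-spec -e rlog.net 1 1 / rdata@1:100 . rdata@101:200 .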

7) New quantities C0, C1, and Cn (for n>1) are now defined, to help
   assess how well Metropolis and hybrid updates are exploring the
   distribution.  (But note that these are masked for mixture models,
   where they have another meaning.)

8) The net-gen program has been extended to allow more flexibility in
   how parameters are set, including setting them from standard input,
   rather than only to zero or randomly.  See net-gen.doc for details.

9) The xxx-tbl commands can now take a -f argument, which causes them
   to continually follow iterations as they are added to the last log
   file specified, rather than finishing when EOF is encountered.
   This is useful when the output is piped to a plotting program that
   can follow continuing input, updating the plot in real time, and
   also when the output is simply shown in a terminal window to
   monitor the run.
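
   For example, a run might be monitored in a terminal window with a
   command along the lines of the following, in which 't' is the
   iteration number quantity, 'k' is the CPU time quantity, and the
   log file name is that used in the tutorial examples:

       net-tbl -f tk rlog.net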

10) The net-spec program now has a "sizes" option for displaying the
    number of parameters in each group.  See net-spec.doc for details.
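
    For example, a command along the lines of the following (with the
    log file name taken from the tutorial examples) would show how
    many parameters are in each group:

        net-spec rlog.net sizes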

11) The xxx-grad-test programs (eg, net-grad-test) can now optionally
    display only the computed gradient, or only the energy, or can be
    restricted to a single component of the gradient, which is then
    displayed and checked against the finite difference result.  (This
    is useful because checking the full gradient can be very slow for
    high-dimensional models.)  See xxx-grad-test.doc for details.

12) The net-display program now has -P and -H options, for displaying
    unadorned high-precision parameter or hyperparameter values.
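
    For example, a command along the lines of the following might
    print just the high-precision parameter values from iteration 100
    of a run, suitable for processing by another program (the log file
    name and the iteration-index argument are assumptions here):

        net-display -P rlog.net 100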

13) The "omit" option in net-spec should now be placed after the
    corresponding prior on connections to inputs, rather than after
    the size of the hidden or output layer they connect to (as
    before).

14) The old (positional) syntax for net-spec is no longer allowed.  It
    would not have supported the new feature of non-sequential hidden
    layer connections.

15) The maximum number of hidden layers in a neural network is now 15
    (up from 7 before).

16) The maximum number of iterations that can be used when making
    predictions with the median has been increased from 200 to 1000.

17) Specifying use of a symmetric sequence of approximations in a
    leapfrog trajectory specification by giving a negative value for
    N-approx is now documented in mc-spec.doc.  (It had previously
    been implemented but not documented.)

Performance changes in this version:

1) Forward / backward / gradient computations for neural networks have
   been sped up, by rewriting the portable code to encourage the
   compiler to use vector instructions, and by (optionally) using
   specially-written code to exploit SIMD and FMA instructions, when
   available (SSE2, SSE3, SSE4.2, AVX, and AVX2 are supported).

2) Computations for neural networks have also been sped up by using
   the SLEEF library for vectorized mathematical functions (eg, tanh).

3) The precisions for parameter and unit values in neural network
   models may now be either double-precision (FP64), as previously, or
   single-precision (float, FP32).  The default is single-precision,
   unless this is changed in the make-all script.  Using lower
   precision typically speeds up the computations (but with some
   effect on the results).  Note that for models other than neural
   networks arithmetic is still always done in double precision.

4) Computations for neural network models may now be (partially) done
   on a GPU (one that supports CUDA with compute capability 3.5 or
   later).  See Install.doc for how to do this.  Note that for
   networks with few parameters, or that are trained on a small data
   set, the GPU version will not necessarily be faster.  Many GPUs are
   also slow at double-precision computations.  Note that GPU
   computation is not done for survival models.

Bug fixes:

1) Fixed a bug in network function evaluation when input offsets are
   present and connections from some inputs are omitted.

2) Documentation for the 'plot' mc operation has been corrected.

3) Incorrect prior specification (not matching ccmds.net) corrected in
   Ex-netgp-c.doc (and Ex-netgp-c.html).

4) Fixed a bug in which an mc-spec command with no trajectory
   specification would not reset to the default trajectory spec if an
   earlier mc-spec had had a trajectory specification.

Known bugs and other deficiencies:

1) The facility for plotting quantities using "plot" operations in xxx-mc
   doesn't always work for the first run of xxx-mc (before any
   iterations exist in the log file).  A work-around is to do a run of
   xxx-mc to produce just one iteration before attempting a run of
   xxx-mc that does any "plot" operations.

2) The CPU time features (eg, the "k" quantity) will not work correctly
   if a single iteration takes more than about 71 minutes.

3) The latent value update operations for Gaussian processes may recompute 
   the inverse covariance matrix even when an up-to-date version was 
   computed for the previous Monte Carlo operation.

4) Covariance matrices are stored in full, even though they are symmetric,
   which sometimes costs a factor of two in memory usage.

5) Giving net-pred several log files that have different network architectures
   doesn't work, but an error message is not always produced (the results may
   just be nonsense).

6) Some Markov chain updates for Dirichlet diffusion tree models in which 
   there is no data model (ie, no noise) are not implemented when some of 
   the data is missing.  An error message is produced in such cases.