GENERATING TEST DATA FOR SOURCE LOCATION MODELS.

This example shows how a source model can be specified, and how to set
its parameters to specific values.  Measurements made for a grid of
detectors can then be generated, with random noise.  This data will be
used in the later examples of fitting models.


Three 'spec' commands are used to specify the structure and prior
distributions for a source model.  The first command to use (see
src-spec.doc) creates a log file, in which it stores the
specifications for how many sources there are and the priors on the
locations and intensities of these sources.  Here is an example:

    > src-spec logg 3 0:5 / -10:10 -1:1 0:1

This creates a log file called 'logg' (wiping out any previous file of
that name), and stores in it specifications for a model with 3 sources,
with intensities in the range 0 to 5, and with x, y, and z coordinates
in the ranges -10 to 10, -1 to 1, and 0 to 1, respectively.  The prior
distributions for the intensities and coordinates are uniform over
these ranges.
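
To make the meaning of this prior concrete, here is a minimal sketch in
Python (not part of this software) that draws one set of source
parameters uniformly over the ranges given above.  The names
INTENSITY_RANGE, COORD_RANGES, and draw_sources are hypothetical, and
the seed used is arbitrary and unrelated to the rand-seed command.

```python
import random

# Prior ranges copied from the src-spec example above.
INTENSITY_RANGE = (0.0, 5.0)
COORD_RANGES = [(-10.0, 10.0), (-1.0, 1.0), (0.0, 1.0)]  # x, y, z
N_SOURCES = 3

def draw_sources(rng):
    """Draw (intensity, x, y, z) for each source, uniformly over its range."""
    sources = []
    for _ in range(N_SOURCES):
        intensity = rng.uniform(*INTENSITY_RANGE)
        coords = [rng.uniform(lo, hi) for lo, hi in COORD_RANGES]
        sources.append((intensity, *coords))
    return sources

rng = random.Random(1)  # arbitrary seed for this illustration only
for source in draw_sources(rng):
    print(" ".join(f"{v:.3f}" for v in source))
```

Each printed line is one source, in the order intensity, x, y, z.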

Following this command, the properties of the detectors can be
specified (see det-spec.doc).  The simplest option is for a detector
with Gaussian noise, with a fixed standard deviation, which can be
specified using a command such as the following:

    > det-spec logg 0.1

This appends a record to the log file 'logg' specifying that the
detector noise standard deviation is 0.1.

Finally, the model for how contaminants flow through the atmosphere
must be specified (see flow-spec.doc).  The 'flow-spec' command is
designed to allow for various such models, but at present only a
'test' model is implemented.  We can specify that this model be used
as follows:

    > flow-spec logg test 1 0.08 0.0001 0.06 0.00015

This specifies use of the test model, with wind speed of 1, and other
parameters as specified.  Note that this is a steady-state model, in
which time is irrelevant.


To generate data from this model, we need to fix the locations and
intensities of its three sources.  Before we can do this, however,
the following command is needed:

    > data-spec logg 3 1 / /dev/null .

The data-spec command is meant for more general usage (see
data-spec.doc).  Here, it is a bit redundant, but is nevertheless
required by the software.  The arguments after the name of the log
file are the number of "inputs" in data files - in this case, 3,
giving the x, y, z coordinates of a detector - and the number of
"targets" - which for source models is always 1, representing a
measurement by a detector.  The remaining arguments just indicate that
no data is available yet.

We can now specify the locations and intensities of the sources with
the src-initial command (see src-initial.doc), which can also be used
to initialize MCMC runs.  Here is an example:

    > src-initial logg / / 0.5 4.5 -0.4 0.2 / 0.8 -6 0.7 0.85 / 1.8 9 0.1 0.45

For this model, the detector and flow specifications have no variable
parts, so the first two groups of arguments after the name of the log
file are missing (nothing before the first two "/" arguments).  What
follows are the intensities and locations of the three sources, each
given as four numbers: intensity first, then the x, y, and z
coordinates.  Note that these numbers lie within the ranges allowed by
the src-spec command above.  These values for the model parameters are
stored in the log file, with an "index" of 0.
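
The grouping of these arguments, and the claim that the values lie
within the prior ranges, can be checked with a short Python sketch (an
illustration only, not part of this software).  It assumes nothing
beyond what is described above: groups are separated by "/", the first
two groups are empty for this model, and each source is given as
intensity, x, y, z.  The names parse_groups and in_prior_range are
hypothetical.

```python
# Arguments copied from the src-initial example above (log file name omitted).
ARGS = "/ / 0.5 4.5 -0.4 0.2 / 0.8 -6 0.7 0.85 / 1.8 9 0.1 0.45"

# Ranges from the src-spec example: intensity, then x, y, z.
RANGES = [(0.0, 5.0), (-10.0, 10.0), (-1.0, 1.0), (0.0, 1.0)]

def parse_groups(args):
    """Split on "/" and convert each group to a list of floats."""
    return [[float(tok) for tok in group.split()] for group in args.split("/")]

def in_prior_range(source):
    """True if the source has four values, each inside its range."""
    return len(source) == len(RANGES) and all(
        lo <= v <= hi for v, (lo, hi) in zip(source, RANGES))

groups = parse_groups(ARGS)
detector_params, flow_params, sources = groups[0], groups[1], groups[2:]
print(detector_params, flow_params)  # both empty for this model
for source in sources:
    print(source, in_prior_range(source))
```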


We can now actually generate the data.  To start, we can specify a
random number seed to use:

    > rand-seed logg 1

This specifies seed 1, which would have been the default in any case.
We need a grid of x, y, z locations at which we want measurements (i.e.,
locations where detectors are assumed to exist).  The 'grid' program
(see grid.doc) is useful for this:

    > grid -10:10%0.1 -1:1%0.1 0.3:0.9%0.3 >grid1

This creates a file called 'grid1' containing a grid of x, y, z
coordinates spanning the ranges -10 to 10, -1 to 1, and 0.3 to 0.9,
including all values in those ranges that are multiples of 0.1, 0.1,
and 0.3, respectively.  Here are the first ten lines of grid1:

    -1.000000e+01 -1.000000e+00 +3.000000e-01
    -1.000000e+01 -1.000000e+00 +6.000000e-01
    -1.000000e+01 -1.000000e+00 +9.000000e-01
    -1.000000e+01 -9.000000e-01 +3.000000e-01
    -1.000000e+01 -9.000000e-01 +6.000000e-01
    -1.000000e+01 -9.000000e-01 +9.000000e-01
    -1.000000e+01 -8.000000e-01 +3.000000e-01
    -1.000000e+01 -8.000000e-01 +6.000000e-01
    -1.000000e+01 -8.000000e-01 +9.000000e-01
    -1.000000e+01 -7.000000e-01 +3.000000e-01
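
The layout of this file can be reproduced with a small Python sketch
(an illustration, not the 'grid' program itself).  It assumes what the
listing above suggests: each axis takes the multiples of its step that
lie in the range, endpoints included, and the last coordinate varies
fastest.  The function names are hypothetical.

```python
import math
from itertools import product

def multiples_in_range(lo, hi, step):
    """Multiples of step lying in [lo, hi], with a small tolerance for rounding."""
    k_lo = math.ceil(lo / step - 1e-9)
    k_hi = math.floor(hi / step + 1e-9)
    return [k * step for k in range(k_lo, k_hi + 1)]

def make_grid(specs):
    """Cartesian product of the per-axis values, last axis varying fastest."""
    axes = [multiples_in_range(lo, hi, step) for lo, hi, step in specs]
    return list(product(*axes))

def format_point(point):
    """Format a point in the signed exponential style of the listing above."""
    return " ".join(f"{v:+.6e}" for v in point)

grid = make_grid([(-10, 10, 0.1), (-1, 1, 0.1), (0.3, 0.9, 0.3)])
for point in grid[:10]:
    print(format_point(point))
print(len(grid), "points")
```

Under these assumptions the sketch produces 201 x 21 x 3 = 12663
points, and its first ten lines match the listing above.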
   
This file will be used as the input file when fitting models to the
data generated.  The file of measurements taken at these grid
locations is generated by the src-dgen command (see src-dgen.doc),
an example of which follows:

    > src-dgen logg 0 / grid1 data-grid1-0.1-1

This uses the parameters for the model stored in the log file under
index 0 (produced with the src-initial command above) to generate
measurement values for detectors at all the locations in the file
'grid1', with random noise added.  These noisy measurements are stored
in the file data-grid1-0.1-1 (named for the grid, the noise level, and
the random seed).  This will be used as the file of target
measurements when fitting a model to this data.  Here are the first
ten lines of this file:

    3.38862e-01
    3.26606e-01
    2.16727e-01
    2.05567e-01
    4.46061e-01
    1.95426e-01
    2.67355e-01
    3.25556e-01
    2.47019e-01
    2.79804e-01

Negative measurements are possible, even though actual concentrations
are non-negative, since Gaussian measurement error is assumed.  Note
that slight differences from one machine to another are possible for
this and other output.

We can instead specify a much smaller noise level (e.g., 1e-30) in the
det-spec command, and thereby generate effectively noise-free
measurements of concentrations at the grid points.  This is useful for
checking results.  Here are the first ten lines of the file
data-grid1-0-1, generated in this way:

    3.56170e-01
    3.17061e-01
    2.61963e-01
    3.72152e-01
    3.31319e-01
    2.73783e-01
    3.86745e-01
    3.44371e-01
    2.84644e-01
    3.99744e-01
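
One such check is that the differences between the noisy and the
noise-free measurements should look like Gaussian noise with standard
deviation 0.1.  Here is a minimal Python sketch of this check (not
part of this software), using only the ten values from each file that
are listed above.  With so few points, the result is only a rough
estimate of the noise level.

```python
import math

# First ten lines of data-grid1-0.1-1 (noisy), copied from the listing above.
noisy = [3.38862e-01, 3.26606e-01, 2.16727e-01, 2.05567e-01, 4.46061e-01,
         1.95426e-01, 2.67355e-01, 3.25556e-01, 2.47019e-01, 2.79804e-01]

# First ten lines of data-grid1-0-1 (effectively noise-free), also from above.
clean = [3.56170e-01, 3.17061e-01, 2.61963e-01, 3.72152e-01, 3.31319e-01,
         2.73783e-01, 3.86745e-01, 3.44371e-01, 2.84644e-01, 3.99744e-01]

# Residuals are the noise that was added at each detector location.
residuals = [n - c for n, c in zip(noisy, clean)]

# Root-mean-square residual, to be compared with the det-spec value of 0.1.
rms = math.sqrt(sum(r * r for r in residuals) / len(residuals))
print(f"RMS residual over {len(residuals)} points: {rms:.3f}")
```

An RMS residual far from 0.1 when computed over the whole file would
suggest that the two files were not generated from the same model
parameters or grid.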

The same log file can be used to generate the data on coarser grids,
in order to test our ability to infer source locations with varying
amounts of data.  Data files data-grid2-0.1-1 (with 126 detectors),
data-grid3-0.1-1 (with 63 detectors), and data-grid4-0.1-1 (with 15
detectors) were generated for testing and use in later examples.  For
grid3 and grid4, all measurements have the same z coordinate, which
leads to ambiguous inferences that test the ability of the MCMC
methods to sample a multimodal posterior distribution.

Some R functions to plot data are in data-plot.r.  A script to use
these functions to plot data on the four grids mentioned above is in
plt-data.r.