There are 6 major optimizations that I have applied to the GHYS/QT refinement procedure. The first two are trivial and will not be discussed further; the final four are explained in more detail in the remaining sections of this chapter.

*Definition* If program *A* has a *speedup* of *s* over
program *B*, then the execution time of *A* is a fraction 1/*s* of
*B*'s time; in other words *A* runs *s* times faster than *B*.

QT computed a new resolvent for each stable and unstable vector, *i.e.,*
at each timestep they computed 6*M* resolvents.
However, a resolvent is a matrix operator that applies to
*all* small perturbations of the orbit from one timestep to the next.
Thus only 2 resolvents per timestep need be computed: one for evolving
perturbations
forward in time, and another to evolve them backwards.
Since most of the time is spent computing the resolvents, fixing this
oversight provides an immediate speedup of a factor of about 3*M*.

The above optimization can be extended even further by
noting that the resolvent matrix that maps perturbations from timestep
*i* to *i*+1 is precisely the inverse of the resolvent that maps
small perturbations from timestep *i*+1 to *i*. Thus, rather than doing
an integration from timestep *i*+1 backwards to timestep *i* to compute
the backward resolvent, the forward resolvent can be inverted using
matrix operations, or the corresponding linear system
can be solved, whenever necessary, using standard elimination
schemes. This gives an
additional speedup of about 2, since only one resolvent per timestep
need explicitly be computed using integration of the variational equations.
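
As a concrete sketch of this optimization: the backward map can be applied by solving a linear system with the forward resolvent, instead of integrating the variational equations backwards in time. Everything below (the 2×2 matrix and the perturbation values) is an invented illustration, not code or data from this thesis.

```python
def solve2(R, b):
    """Solve the 2x2 linear system R x = b by Cramer's rule,
    standing in for a standard elimination scheme."""
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
    return [(b[0] * R[1][1] - R[0][1] * b[1]) / det,
            (R[0][0] * b[1] - b[0] * R[1][0]) / det]

def map_backward(R_fwd, dx_next):
    """Map a small perturbation at step i+1 back to step i by solving
    with the forward resolvent, rather than integrating backwards."""
    return solve2(R_fwd, dx_next)

# Hypothetical 2x2 forward resolvent and a small perturbation:
R = [[2.0, 1.0], [0.0, 0.5]]
dx_i = [1e-6, -2e-6]
dx_next = [R[0][0] * dx_i[0] + R[0][1] * dx_i[1],
           R[1][0] * dx_i[0] + R[1][1] * dx_i[1]]   # forward map
dx_back = map_backward(R, dx_next)                  # recovers dx_i
```

In practice one would factor the forward resolvent once (e.g., an LU decomposition) and re-use the factorization whenever a backward mapping is needed.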

Define a *shadow step* as a time interval across which the
1-step error and correction co-efficient are computed.
QT used every internal timestep of their
integrator's noisy orbit as a shadow step, but this is not necessary. It
is reasonable to skip timesteps in the original orbit to build larger
shadow steps. This means that fewer resolvents
and stable/unstable directions need to be computed and stored.

When computing the resolvent and 1-step errors during early refinements,
it is not necessary to
compute them to extremely high accuracy. Since the
initial 1-step errors are comparatively large,
the resolvents and 1-step errors need only be computed to a tolerance of,
say, 10^-3 times their magnitude, *i.e.,* 3 extra digits.
QT always computed the 1-step errors and resolvents to a fixed, much
more stringent tolerance.

If a shadow exists, then by definition it cannot be too far away from the original noisy orbit. It is reasonable to assume that the resolvents and unstable and stable directions (abbreviated RUS) will change little as the orbit is perturbed towards the shadow. Thus, it may not be necessary to recompute the RUS for every refinement iteration. This is probably the biggest savings: in combination with the cheaper integrator, it means that the RUS usually need only be computed once, to a moderate tolerance, and this RUS will suffice for all future refinements. This is analogous to a simplified Newton's method that does not recompute the Jacobian at every iteration.
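
The analogy can be made concrete with a toy sketch of the simplified (``chord'') Newton's method, in which the derivative, like the RUS, is computed once and frozen; the function and starting guess below are arbitrary illustrations.

```python
def newton_frozen(f, df, x, tol=1e-12, max_iter=50):
    """Simplified (chord) Newton's method: the derivative is evaluated
    once at the starting point and frozen thereafter, analogous to
    computing the RUS once and re-using it across refinements."""
    slope = df(x)              # computed once, like the RUS
    for _ in range(max_iter):
        step = f(x) / slope
        x -= step
        if abs(step) < tol:
            break
    return x

# Root of x^2 - 2 from a nearby guess: convergence is only linear
# (not quadratic), but each iteration is much cheaper.
root = newton_frozen(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.5)
```

As with constant RUS, the frozen derivative works well precisely because the starting guess is already close to the answer.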

Recall that to find the
longest possible shadow of a noisy orbit, we attempt to shadow for longer
and longer times until shadowing fails. Assume shadowing for *S* shadow steps
produces a shadow *A*. Then attempt shadowing for *S*+*k* shadow steps
for some integer *k*, and assume a successful shadow *B* is found.
Since *A* and the first *S* steps of *B* both shadow the same noisy orbit,
they must be close to one another.
By the same argument as the previous paragraph, the RUS that was computed
for *A*
can probably be re-used for the first *S* shadow steps of *B*.
Thus only *k* new RUS's need be computed.

I use the acronym *RUS* to mean the set of all *R*esolvents,
*U*nstable, and *S*table vectors for the entire trajectory.
The RUS at a particular timestep *i* is referred to as RUS *i*.

For a smooth, continuous, well-behaved function, the Jacobian
(*i.e., * the derivative) is also smooth and continuous. Most Hamiltonian
systems (that I have seen) studied by scientists are at least piecewise of
this sort. The *N*-body problem, in particular, is of this sort, as long
as no two particles pass within zero distance of each other.
This implies that
the resolvent along a particular path will not change much if the
path is perturbed slightly.
Since the unstable and stable unit vectors are derived solely from the
resolvents, they will also change only slightly when the path is perturbed.
Thus, the RUS computed on the first
refinement iteration should be re-usable on later iterations, and
computing the new 1-step errors is the only significant work to be
performed on each new refinement iteration.
(Computing the corrections from the errors
is much less CPU intensive than computing the errors themselves.)

I have empirically found that computing the RUS to a moderate tolerance usually suffices to refine the orbit down to 1-step errors near the machine precision. If refinement fails using constant RUS, then perhaps there is a spot in the orbit where some RUS's change quickly with small perturbations in the path. In that case, the RUS can be re-computed for a few refinements, after which constant RUS may be tried again. However, a case in which constant RUS fails often suggests that there is a mild trouble spot somewhere on the orbit that may cause a glitch in longer shadows.

The algorithm used to switch between constant RUS and recomputed RUS
uses a running-average ``memory'' scheme to keep track of the progress of
the magnitude of the maximum 1-step error over the previous few refinements.
Originally, I used a simple algorithm (like QT's) that would signal
failure when a certain number *K* of successive refinements had failed,
with *K*=3. (See here for the
definition of a successful refinement.) However, refinement would sometimes
get into a loop in which there would be a few successful refinements,
followed by one or more that would ``undo'' the improvements of the previous
ones.
For example, the largest 1-step error may cycle: decreasing for several
refinements, then jumping back up to near its earlier value.
In general, cases were seen in which the number of refinements in these
loops was 3, 4, and up to about 8. Clearly a simple ``die when *K* successive
failures are seen'' rule is not general enough to catch this kind of loop.

Queuing theory provides a formula that is useful here. It
defines an *arithmetic running average* *A* to be

*A* ← ((*m* − 1)*A* + *x*) / *m*,     (2.0)

where *x* is the newest element added to the set, and *m* is
the *memory* constant. The higher the memory constant, the longer
the memory -- *i.e.,* the less the effect of an individual new element.
A rule of thumb is that *A* is roughly an average of the most recent
*m* elements. Equation 2.0, however, is not suited for
measuring errors such as those in refinement
that can change by many orders of magnitude,
and is especially unsuited when smaller means better.
For example, a long string of large errors followed by a single
error many orders of magnitude smaller would barely move the arithmetic
average, even though such a drop is an indication that refinement is succeeding.
We thus need a *geometric* equivalent of equation 2.0, to allow values differing by orders of magnitude to average meaningfully.
I define the
*geometric running average* *G* as

*G* ← (*G*^(*n*−1) *x*)^(1/*n*),     (2.1)

where *x* is the new element, and *n* is an integer,
*n* ≥ 1, so higher *n* implies longer memory.
Both equations 2.0 and 2.1 require
initialization to some reasonable starting values *A*₀ and *G*₀,
respectively. *n* should not be too large; I found 2 or 3 worked
best.
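
The two running averages can be sketched as follows, assuming the update forms just given; the error sequence is invented purely to show why the geometric average is the right tool when values span orders of magnitude and smaller means better.

```python
def arithmetic_avg(A, x, m):
    """Arithmetic running average with memory constant m (equation 2.0):
    A is roughly an average of the most recent ~m elements."""
    return ((m - 1) * A + x) / m

def geometric_avg(G, x, n):
    """Geometric running average with memory constant n (equation 2.1):
    equivalent to an arithmetic running average of log(x)."""
    return (G ** (n - 1) * x) ** (1.0 / n)

# Illustrative 1-step errors: steady at 1e-3, then a sudden drop.
errors = [1e-3, 1e-3, 1e-3, 1e-9]
A = G = errors[0]
for e in errors[1:]:
    A = arithmetic_avg(A, e, 3)
    G = geometric_avg(G, e, 3)
# A barely moves (the tiny error is swamped by the large ones),
# while G drops by two orders of magnitude, registering the improvement.
```

The memory constant plays the same role in both: with *n* = 3, one new element carries about a third of the weight.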

Finally, we want an equation that measures the *improvement* that is
being made by successive refinements,
rather than one that measures absolute error,
because it is the *change* in 1-step errors that indicates whether current
refinements are succeeding, not their absolute size.
Note that the 1-step errors may stop decreasing for two reasons:
(1) the 1-step errors
have reached the machine precision, or
(2) refinement is failing to find a shadow.
Using the improvement rather than the absolute value allows us to use the same
algorithm to halt refinement in either case.
Thus define the improvement *I* to be the ratio of the maximum 1-step
error of the current refinement to that of the previous refinement,

*I* = *E*_new / *E*_old.

If *I* is close to or greater than 1, then the current refinement did not
improve on the previous one; if *I* << 1 then the current
refinement is successful.

We have two refinement methods. In order of decreasing cost, they are recomputed RUS, and constant RUS. The general idea is: if refinement is working well with an expensive method, then switch to a cheaper one; if a cheaper one seems to be failing, switch to a more expensive one. If the most expensive one is failing, then give up refinement entirely, and return failure. Here is the heuristic I have built over many months of trial and error.

- When using recomputed RUS, it is safe to switch to constant RUS when the geometric running average improvement becomes <0.1. At this time the current orbit and all its statistics must be saved in case constant RUS refinement fails.
- When using constant RUS, it is necessary to
switch back to recomputed RUS when the geometric running average improvement
stays >0.5 for 3 successive refinements. At this time I discard
all progress made by constant RUS and revert to the previous orbit
that used recomputed RUS. I think it is reasonable to discard all progress
made by constant RUS, even if significant progress was made, for the following
reasons:
- A refinement that computes the RUS is asymptotically more expensive than one that does not, so discarding all progress made by constant RUS refinements makes little difference, percentage-wise, in the final run-time.
- Future constant RUS refinements, that will be performed after the RUS is recomputed, will converge faster than the current ones that just failed, because the RUS will be more accurate.
- Recomputing the RUS with small 1-step errors would mean (using the simple ``3 extra digits of accuracy'' heuristic) recomputing it at a much higher accuracy, at much more expense, than is necessary.

- Finally, after having switched back to recomputed RUS, it is time to exit if the geometric running average improvement stays >0.1 for 3 successive refinements; and it is safe to switch again to constant RUS when the geometric running average improvement becomes less than 0.1. I have found that there are usually no half-measures when using recomputed RUS -- refinement either succeeds geometrically or fails miserably.
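
The switching heuristic above can be summarized as a small controller. The thresholds (0.1 and 0.5) and the 3-successive-refinement counts come from the rules above; the class structure and the single ``bad'' counter are merely one possible sketch, not the thesis implementation.

```python
class RUSController:
    """Heuristic from the text: start with recomputed RUS; switch to
    constant RUS when the averaged improvement drops below 0.1; fall
    back to recomputed RUS (discarding constant-RUS progress) after 3
    successive averaged improvements above 0.5; give up after 3
    successive averaged improvements above 0.1 while recomputing."""

    def __init__(self):
        self.method = "recompute"
        self.bad = 0   # successive poor refinements with current method

    def update(self, avg_improvement):
        """Feed in the geometric-running-average improvement of the
        latest refinement; returns the method to use next, or 'fail'."""
        if self.method == "recompute":
            if avg_improvement < 0.1:
                self.method = "constant"   # save the orbit here
                self.bad = 0
            else:
                self.bad += 1
                if self.bad >= 3:
                    return "fail"          # exit refinement entirely
        else:  # constant RUS
            if avg_improvement > 0.5:
                self.bad += 1
                if self.bad >= 3:
                    self.method = "recompute"  # revert the saved orbit
                    self.bad = 0
            else:
                self.bad = 0
        return self.method

# Example: strong improvement immediately permits the cheap method.
ctl = RUSController()
method = ctl.update(0.05)   # switches to "constant"
```

Saving and reverting the orbit itself is omitted here; in the real procedure those actions accompany the method switches, as described in the list above.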

Further research in recomputed RUS could focus on whether there exists a
criterion to decide that a particular RUS needs recomputing, rather
than all of them. If this is the case, it could be an *O*(*S*) speed savings.
Also, there may be a place for SLES or other noise-reduction
procedures in the loop of expensive and cheap refinement algorithms (which
currently contains only the two above: recomputed RUS and constant RUS).

Finally, I note in passing that Newton's method is sometimes used with damping when it is known that the correction factors may not be accurate, but are assumed to point roughly in the correct direction. In other words, equation 1.10 is modified so that only a fraction of the correction is applied,

*x* ← *x* + α*c*,

for some constant 0 < α < 1. I briefly tried this with the correction co-efficients, but it did not seem to work. However, this result should not be taken as conclusive, as I did not spend much time testing damped correction or trying to make it work.

This idea is based on the same observation as constant RUS, and is
only useful if constant RUS is employed. It takes advantage of the
RUS for a noisy orbit *B* being near to the RUS for a nearby noisy
orbit *A*.
In particular, the shadows of two overlapping segments of a noisy
orbit will be near each other along the portion where the noisy
segments overlap, and thus the RUS's for the overlapping portion
will also be near each other. So, when trying to find a shadow for
a segment *B* which is an extension of segment *A*,
the RUS of *A* can be re-used on the portion of *B* that overlaps *A*.

One interesting question is whether to use the first or last RUS that
was computed for *A*, if *A* ever had to recompute the RUS. An argument
in favour of using the first is that the noisy orbit is exactly the
same, since *B* is just an extension of *A*. However, if *A* had to
recompute the RUS, then probably so will *B* if *A*'s first RUS
is used. Recall that the RUS needed for the early refinements need not
be as accurate as the ones used later. Furthermore, we assume that the segment
of *B*'s shadow that overlaps *A*'s shadow will be *closer* to
*A*'s shadow than the first few iterations of *B* are to the
first few iterations of *A*.
Thus, if we use the last RUS that *A* used, we are less likely to
recompute the RUS for *B*. If *B* is a *subset* of
*A* rather than an extension, then obviously no new RUS's need be
computed. All shadowing runs in this thesis use the last RUS.

Finally, re-using the RUS allows a more fine-tuned search for where a glitch occurs, because it is much less expensive to attempt shadowing particular subsets of a noisy orbit once the entire RUS is computed.

A numerical solution of an ODE is represented by an ordered, discrete
sequence of points. Say we were to construct a continuous solution *p*(*t*)
from these points, for example by
using a suitably smooth and accurate interpolation. Then we could extend
the definition of shadowing for some true solution *x*(*t*)
by saying: *x*(*t*) ε-shadows *p*(*t*) on [0, *T*] if
||*x*(*t*) − *p*(*t*)|| < ε for all 0 ≤ *t* ≤ *T*.

Now, it should be clear that we can choose *any* representative
sequence of points along *p*(*t*) to use in the refinement algorithm
introduced in the previous chapter; we need not choose the points
from which *p*(*t*) was originally constructed. In particular, we can
choose a subset of the original set of points, as long as enough points
are chosen to be ``representative'' of *p*(*t*). The steps that are finally
chosen are called *shadow steps*.

There are at least two reasons to desire as few shadow steps as possible: First, if constant RUS is being used, then the RUS needs to be stored in memory. Each RUS requires a matrix (the resolvent), and two sets of basis vectors. Clearly the fewer shadow steps, the less memory is required to store the RUS. Second, if the ``accurate'' integrator has a large startup cost, then the fewer startups that are required, the better. For example, the Adams's methods I used take extremely small internal timesteps early in the integration. Thus, for each shadow step, the Adams's method must restart with extremely small stepsizes.

For the *N*-body simulations reported in this thesis, a set of
``standardized'' units was used [14] such that the scale
of the system in all units of interest was of order 1 --
*i.e., * the system had a diameter of order
1, the crossing time was of order 1, and the *N* particles each had
mass 1/*N*. In such a system, the timesteps of QT's integrator averaged
about 0.01, and they used each timestep as a shadow step. I have
found empirically that, in this system, using shadow steps of size 0.1
works well. Smaller shadow steps use more time and memory but could not
find shadows any better. Shadow
steps of 0.2 were slightly less reliable, and steps of size 0.5
were unreliable.

The important criterion for choosing shadow step sizes seems to be
the validity of the linear map. Recall that the linear map or *resolvent* is a
first-order approximation that maps small perturbations of the
orbit *p*(*t*) from time *t_i* to *t_(i+1)*, so that a system
starting at *p*(*t_i*) + δ evolves, to first order in δ,
to *p*(*t_(i+1)*) + *L_i*δ.
For this equation to be valid, δ must be small.
Since the computation
of the correction co-efficients in the GHYS refinement procedure depends
on this linear perturbation approximation,
the 1-step errors of our shadow steps must be
sufficiently small for the linear map to be valid. The larger the shadow
steps, the further the accurate solution will diverge from the noisy
solution, and thus the larger the correction needed per shadow step.
Although the allowable perturbation size is problem-dependent,
I have found empirically that, for the unit-sized *N*-body system used
here, sufficiently small perturbations are always handled well by the
linear map, while perturbations a few orders of magnitude larger start
to show significant
difficulties. In general, however, a shadow step must be small enough
that the 1-step errors are small enough that
the linearized map is accurate enough that, when the correction is
applied at step *i*, the linearized map correctly maps the error
towards being smaller. In other words, if *e_i* is the 1-step error
at *t_i*, *c_i* is the correction,
*L_i* is the linear map and
*Φ_i* is the exact perturbation map, then *L_i* *c_i* must be close to
*Φ_i*(*c_i*) in comparison to the size of the 1-step errors:

||*L_i* *c_i* − *Φ_i*(*c_i*)|| << ||*e_i*||.     (2.2)
In practice, it is probably too difficult to use equation 2.2, and it may be necessary to experiment to find the largest allowable 1-step error, which then must be used to constrain the size of the shadow steps.

If the Lyapunov exponent λ of the system is known *a priori*, it may
be possible to choose shadow step sizes *a priori*. Note that
the divergence of the noisy and accurate orbits will be dominated by
the errors in the noisy orbit.
Say the noisy orbit has stepsizes of negligible length,
each with error δ.
Then each error will be magnified after a time *T* to δe^(λ*T*).
If *E* is the maximum allowable 1-step error for which the linear map
is valid, then shadow steps should be no larger than

*T* ≤ (1/λ) ln(*E*/δ).

For our *N*-body systems, this equation gives an upper bound of about
1 to 10 time units, which is larger than the 0.1-sized shadow
steps I used. Thus, the above equation seems a weak upper bound.
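
For illustration, the bound can be evaluated directly; the Lyapunov exponent, noise level, and allowable error below are hypothetical values of roughly the right magnitude for a unit-sized system, not measured quantities from this thesis.

```python
import math

def max_shadow_step(lam, delta, E):
    """Upper bound on the shadow-step length T such that a noisy-orbit
    error delta, growing as delta * exp(lam * T), stays below the
    largest 1-step error E for which the linear map is valid:
        T <= (1/lam) * ln(E / delta)."""
    return math.log(E / delta) / lam

# Hypothetical values: Lyapunov exponent ~3 per time unit,
# noisy 1-step errors ~1e-9, allowable error E ~1e-4.
T = max_shadow_step(3.0, 1e-9, 1e-4)   # a few time units
```

Since the bound depends only logarithmically on *E*/δ, even rough estimates of the error sizes give a usable step length.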

Even a system with a particular Lyapunov exponent may have short
intervals during which the local divergence rate is faster or
slower than the Lyapunov exponent would suggest. Thus, it may
be helpful to devise an algorithm that dynamically builds shadow steps.
One such algorithm, as yet untested, is as follows.
Assume that the timesteps in the noisy orbit are small in comparison
to the shadow steps that will be built.
Let the noisy orbit be at times *t*₀, *t*₁, ..., *t_S*.
Let *E* be the largest allowable 1-step error for which the linear map is
valid.
To construct a shadow step starting at time *t_i*,
the accurate solution *p*(*t*) is initialized to
the noisy point at *t_i*.
Accurate integration of *p*(*t*) proceeds successively
to times *t*_(*i*+1), *t*_(*i*+2), ... until the distance between
*p*(*t_j*) and the noisy point at *t_j* exceeds *E*, when *t_j*
marks the end of the shadow step.
The accurate solution is then re-initialized to the noisy point at
*t_j* to begin the next shadow step.
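
A sketch of this untested algorithm follows, with a fabricated one-dimensional ``orbit'' and a toy exact integrator standing in for the accurate integrator; only the control flow is meant to match the description above.

```python
def build_shadow_steps(times, points, integrate, E, dist):
    """Dynamically partition a noisy orbit into shadow steps.
    `times`/`points` give the noisy orbit, `integrate(y, t0, t1)` is
    the accurate integrator, `dist` measures phase-space separation,
    and E is the largest 1-step error for which the linear map is
    valid.  Returns the indices at which each shadow step ends."""
    steps = []
    i = 0
    while i < len(times) - 1:
        y = points[i]                  # re-initialize accurate solution
        j = i
        while j < len(times) - 1:
            y = integrate(y, times[j], times[j + 1])
            j += 1
            if dist(y, points[j]) > E:
                break                  # divergence exceeded E: end step
        steps.append(j)
        i = j
    return steps

# Toy example: noisy points drift by 0.001 per unit time relative to
# the (exact) toy integrator y' = 1, so steps end every ~3 time units.
times = list(range(11))
points = [1.001 * k for k in range(11)]
steps = build_shadow_steps(times, points,
                           lambda y, t0, t1: y + (t1 - t0),
                           0.0025, lambda a, b: abs(a - b))
```

Note that the loop always advances at least one noisy timestep per shadow step, so it terminates even where the divergence is fastest.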

Now, the first refinement is likely to have the largest 1-step errors; thus, the first refinement will need the smallest shadow steps, and thus the largest RUS set of any refinement. This means that a large RUS set needs to be computed at least once, but only to low accuracy (see the section on ``cheaper accurate integrator'').

Originally I thought that, as refinement proceeds to decrease the 1-step
errors, it would be
advantageous to construct larger-and-larger shadow steps.
However, I no longer believe this will
give much of a speed savings, because (1) If refinement is working well,
then constant RUS will be invoked, in which case there is no need to
recompute the RUS at all; (2) Conversely, if refinement is failing,
then there is no justification to increase the size of shadow steps -- in
fact, it may be advantageous to decrease them. Note that
if refinement *is* succeeding, it may be advantageous to use
larger-and-larger
shadow steps in the computation of the 1-step errors, even though
the RUS is not being recomputed.

Finally, implementing dynamic shadow step sizes may help in another area. As described above, I currently discard all progress made by constant RUS if it begins to fail, reverting to the orbit and RUS of the most recent refinement that computed the RUS. However, if constant RUS works for several refinements but then starts to fail, perhaps it would be advantageous to use the orbit of the most recent successful refinement to build new shadow steps using the above dynamic algorithm, regardless of whether the most recent successful refinement computed the RUS or not.

When computing the 1-step errors of a noisy orbit, we must
use an integrator that has smaller 1-step errors than the
integrator used to compute the noisy orbit. The question is,
how much more accurate does it need to be? During early
refinements when the 1-step errors are large, the errors in
the correction factors may be dominated by the first-order
approximation of the resolvents, so the 1-step errors need not
be computed to machine precision. In practice, I have found that
it is sufficient to compute the 1-step errors to an accuracy
10^-3 that of the size of the maximum 1-step error of the previous
refinement; computing them any more accurately does not speed refinement,
although computing them only 10^-2 more accurately
sometimes results in more iterations of the refinement procedure.
In addition, a factor of only 10^-2 is not enough because
some integrators have sufficiently gross control of their errors
that requesting only 2 extra digits will result in the integrator
returning *exactly* the same result.
Note it is necessary to loosen this criterion when the 1-step errors
are within a factor of 10^3 of the machine precision, in order not to
request more accuracy than the machine precision allows.
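
This tolerance rule can be sketched as follows; the function name and the clamping against machine precision are my own framing of the heuristic, and the ``3 extra digits'' factor should be treated as tunable.

```python
import sys

def integrator_tolerance(prev_max_err, extra_digits=3,
                         eps_machine=sys.float_info.epsilon):
    """Request roughly `extra_digits` more digits of accuracy than the
    previous refinement's largest 1-step error (the "3 extra digits"
    rule), but never ask for more than machine precision allows."""
    tol = prev_max_err * 10.0 ** (-extra_digits)
    return max(tol, eps_machine)

tol = integrator_tolerance(1e-6)   # about 1e-9
```

With only 2 extra digits, an integrator with coarse error control may return exactly the same trajectory, which is why the default here is 3.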

The accuracy required to compute the resolvents is less clear. So far, when reverting to recomputed RUS, I have been computing them to the same accuracy as the 1-step errors, described in the previous paragraph. This may be more precision than is necessary, if, for example, the resolvents change more quickly owing to movement of the orbit than to errors in their construction. In other words, it may be sufficient to always compute a resolvent to some constant precision, and only recompute it if the numerical shadow drifts sufficiently far from the original noisy orbit that the resolvent no longer correctly maps small perturbations from one shadow step to the next.

The only methods I have tried as my accurate integrator are Adams's methods. It may be wise to try others. In particular, if the shadow steps are small, it may pay to use a less sophisticated routine, such as a high-order explicit Runge-Kutta method. If the shadow steps are large, an Adams's or a Bulirsch-Stoer method is probably apt, because even though they both have high startup cost, they can eventually move quickly along the solution if they are not forced to restart too often. It may also be interesting to attempt using extended precision integration to test the accuracy of the standard (double precision) routines.

Finally, an important factor in problems, like the *N*-body problem,
that have a large variance of scales all occurring at once (*i.e.,* some
pairs of stars are several orders of magnitude closer to each other
than other pairs)
is a concept called *individual time steps*. The modern hand-coded
*N*-body integrator gives each star a personalized integration timestep;
stars that have high accelerations are integrated with smaller timesteps
than those with lower accelerations.
Perhaps it would be fruitful to attempt to write a general-purpose
integration routine that uses individual timesteps for each dimension.
However, such an integrator is beyond the scope of this thesis.

To quantify the speedup of the optimizations, 32 noisy orbits
were chosen, and each was shadowed using several different combinations of the
optimizations. In each case, the system consisted of 99 fixed particles
and one moving particle (*i.e.,* *N*=100, *M*=1),
identical to the shadowing experiments of QT.
Forces were not softened.
The 32 orbits were chosen by generating random 3-dimensional positions for
all particles from the uniform distribution on (0,1);
a 3-dimensional random initial velocity for the moving particle was
also chosen uniformly on (0,1).
The standardized units of Heggie and Mathieu
[14] were used, in which each particle has mass 1/*N*.
The pseudo-random number generator was the popular 48-bit *drand48()*,
with seeds 1 through 32. The seed 1 case was run on a Sun SPARCstation
10/514 with a 50 MHz clock; seed cases 2 through 32 were run on 25 MHz
Sun SPARCstation IPC's (each about 1/8 the speed of the SS10/514).

Once the initial conditions were set, each noisy orbit was generated by integrating for 1.28 standard time units (about 1 crossing time) using LSODE [16] with pure relative error control; the requested tolerance agreed well with the magnitude of the initial 1-step errors computed during refinement. Although one crossing time sounds short, it is long enough that 5 of the orbits contained trouble spots. Furthermore, the number of unoptimized QT refinements required to find a shadow that has no trouble spots seems independent of the length of the orbit -- each refinement takes much longer, but the convergence is still geometric per refinement. Thus, the results below should be independent of the length of the orbit, as long as the length is non-trivial. This is what has been observed with the longer orbits I have shadowed, although I do not document them here.

For short shadow steps, a constant shadow step size of 0.01 was used, which approximates the average sized shadow step in QT. This results in 128 shadow steps. For large shadow steps, 16 shadow steps of size 0.08 were used. As in a usual shadowing attempt, longer and longer segments are shadowed in an attempt to find the longest shadow. Here, each successive segment had twice as many shadow steps as the previous one, up to 16 and 128, for long and short shadow steps, respectively. No attempt was made to isolate glitches more accurately than this factor of two.

The highest tolerance requested of the accurate integrator was close to the machine precision. A successful numerical shadow was defined to be one whose 1-step errors were all below a fixed threshold, chosen simply because it was the smallest value that geometrically converging refinement sequences could consistently achieve; a slightly smaller threshold was too small, because many refinement sequences would proceed geometrically until their maximum 1-step error was just above it, and then they would ``bounce around'' for many refinements until, apparently by chance, the maximum 1-step error would fall below the threshold.

The maximum allowed shadow distance was 0.1, although none over 0.0066 were observed with successful shadows.

The following table displays the speedups of the various optimizations. All ratios are speedup factors with respect to the time(s) column.

- time(s): time in seconds for the unoptimized version
- QT: original code of Quinlan & Tremaine
- L: Large shadow steps
- I: backward resolvent by Inverting the forward resolvent
- C: Cheaper accurate integrator
- R: constant RUS
- U: re-Use RUS from a previous successful shadow (appears only in combinations)

seed | time(s) | QT | L | I | C | R | CR | CUR | CLR | CULR | CULRI |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1442 | 1.06 | 4.54 | 2.01 | 1.96 | 3.00 | 8.87 | 14.2 | 30.8 | 37.2 | 47.4 |
2 | 9379 | 0.707 | 6.08 | 1.82 | 3.09 | 3.48 | 11.2 | 15.7 | 52.3 | 68.3 | 75.7 |
3 | 7847 | 0.728 | 6.30 | 1.88 | 2.36 | 2.56 | 9.29 | 13.0 | 44.0 | 59.7 | 65.5 |
4 | 7011 | 0.866 | 11.1 | 1.98 | 2.74 | 2.84 | 7.88 | 12.3 | 40.5 | 55.8 | 74.3 |
5 | 12204 | 0.933 | 12.2 | 2.13 | 2.42 | 1.08 | 1.66 | 1.54 | 5.31 | 4.96 | 8.0 |
6 | 9443 | 1.14 | 7.24 | 2.01 | 3.18 | 3.98 | 10.8 | 13.5 | 5.36 | 56.8 | 29.8 |
7 | 6277 | 1.13 | 5.51 | 1.85 | 2.63 | 2.71 | 8.50 | 11.6 | 46.9 | 63.7 | 82.9 |
8 | 6582 | 1.13 | 4.79 | 2.04 | 1.91 | 2.79 | 7.09 | 10.6 | 33.8 | 45.4 | 56.9 |
9 | 7154 | 1.14 | 4.33 | 1.92 | 1.96 | 2.70 | 7.74 | 11.5 | 30.5 | 39.6 | 51.9 |
10 | 5832 | 1.13 | 4.32 | 1.87 | 1.54 | 2.72 | 6.46 | 10.7 | 29.1 | 45.5 | 57.2 |
11 | 6659 | 1.13 | 5.10 | 1.93 | 2.34 | 2.65 | 7.40 | 10.7 | 37.1 | 50.7 | 62.9 |
12 | 8623 | 0.861 | 5.00 | 2.44 | 2.57 | 3.37 | 9.64 | 13.7 | 44.8 | 52.4 | 75.4 |
13 | 14566 | 0.452 | 4.34 | 1.00 | 4.52 | 0.784 | 0.931 | 0.939 | 5.67 | 5.15 | 7.6 |
14 | 6976 | 0.749 | 4.54 | 2.00 | 2.17 | 2.85 | 7.23 | 11.1 | 32.9 | 45.2 | 49.4 |
15 | 5664 | 0.729 | 4.79 | 2.02 | 1.89 | 2.71 | 6.83 | 10.2 | 33.5 | 45.4 | 62.9 |
16 | 6634 | 1.15 | 4.89 | 1.97 | 1.82 | 2.64 | 7.35 | 10.6 | 33.8 | 44.8 | 58.7 |
17 | 6854 | 0.946 | 4.96 | 2.06 | 2.16 | 2.88 | 8.97 | 12.7 | 47.2 | 60.8 | 83.2 |
18 | 4932 | 0.629 | 2.88 | 1.74 | 1.69 | 2.17 | 5.38 | 7.84 | 25.7 | 34.1 | 45.7 |
19 | 8470 | 1.06 | 5.16 | 2.98 | 2.96 | 1.32 | 4.37 | 4.02 | 31.1 | 27.5 | 53.5 |
20 | 4566 | 1.12 | 5.59 | 1.91 | 2.04 | 2.78 | 6.21 | 9.50 | 33.7 | 48.6 | 64.3 |
21 | 7409 | 1.07 | 6.01 | 2.00 | 2.92 | 2.86 | 9.59 | 13.4 | 46.6 | 61.3 | 82.2 |
22 | 6988 | 1.22 | 4.89 | 2.17 | 2.00 | 3.03 | 7.34 | 11.4 | 36.1 | 44.0 | 57.2 |
23 | 7699 | 0.750 | 4.66 | 1.93 | 1.91 | 3.20 | 8.27 | 10.4 | 35.2 | 46.3 | 51.1 |
24 | 8851 | 1.18 | 2.53 | 1.79 | 2.78 | 2.89 | 10.2 | 15.4 | 52.9 | 71.1 | 93.1 |
25 | 7287 | 1.13 | 3.58 | 1.96 | 2.67 | 2.53 | 7.58 | 11.5 | 34.1 | 47.4 | 59.3 |
26 | 10736 | 0.854 | 7.74 | 2.84 | 2.79 | 4.33 | 12.4 | 16.9 | 58.2 | 76.5 | 81.1 |
27 | 7001 | 1.13 | 5.07 | 2.01 | 2.24 | 2.79 | 8.35 | 12.1 | 38.9 | 54.5 | 70.1 |
28 | 7324 | 1.13 | 4.49 | 2.06 | 1.93 | 2.72 | 8.01 | 11.2 | 33.2 | 49.3 | 55.0 |
29 | 13779 | 0.802 | 4.01 | 1.89 | 2.01 | 0.898 | 1.16 | 1.15 | 4.03 | 3.99 | 6.5 |
30 | 7932 | 1.14 | 10.9 | 2.01 | 2.17 | 2.81 | 8.82 | 13.0 | 43.3 | 67.5 | 86.7 |
31 | 6989 | 1.20 | 5.57 | 2.26 | 2.37 | 2.81 | 8.44 | 12.3 | 47.4 | 57.2 | 77.6 |
32 | 6720 | 1.08 | 4.80 | 1.89 | 1.90 | 2.78 | 7.61 | 11.5 | 34.0 | 50.5 | 65.5 |
avg | 7682 | 1.07 | 5.56 | 2.01 | 2.36 | 2.68 | 7.55 | 10.8 | 34.6 | 47.5 | 59.3 |

[Note (30 Dec 1996): a more recent technical report has added still more optimizations].

There are several interesting observations to make about this table.
First, note
that the run times of the original QT algorithm are comparable to the
run times of my unoptimized version (differing on average by a factor
of 1.07), as would be expected.
Second, looking at each optimization acting alone, we see that
using Large Shadow Steps 8 times longer than QT (column *L*)
gives an average speedup of
about 5.5. The large shadow steps implemented here (one for every 8 small
shadow steps) were trivial to implement, so I would recommend that,
even if no other optimizations are used, large shadow steps are
worthwhile. Third, inverting the forward resolvent to produce the
inverse resolvent (column *I*) produces the expected average speedup
of 2.0. Fourth, the cheap accurate integrator (column *C*) gives an
average speedup of 2.36. This is about what is expected, because
both the QT and *C* algorithms on average require just over 3 refinements to
converge, and all *C* refinements are cheaper than a QT refinement
except the last one, which is about equal in expense.
Finally, using constant RUS gives a speedup of almost 3.
Again this is about what is expected, because QT requires about 3
refinements to converge
while *R* has one RUS computation followed by several cheap constant
RUS refinements.

Next we look at combinations of optimizations. First, it is interesting
to note that combining the cheap accurate integrator *C* with
constant RUS *R* results in an average speedup that is greater than the product of
the two individual speedups ( ). This
is because, when using *CR*, only *one* RUS is computed,
and it is computed cheaply. Using *R* alone requires
computing the one RUS to high accuracy; using *C* alone requires computing
at least one RUS (the final one) to high accuracy. Second, re-Using
the RUS of a previous successful shadow
(*U* or re-Use RUS, seen in column *CUR*)
makes sense only when *R* is also used. It gives a speedup
of about 43% over *CR*. This is less than a factor of 2 because, with
*C*, the resolvents are computed sufficiently cheaply that the time to
do a constant RUS iteration is becoming non-negligible. Presumably
if a *UR* column existed, it would show a speedup over the *R* column
closer to 2. The last three columns show other combinations.
The most important point to note is that, except for *CR*, the combinations
give a speedup less than the product of any appropriate choice of previous
columns. Perhaps this is at least partially owing to an affect similar
to Ahmdal's Law: there still
exist parts of the algorithm that have not been sped up, and as the
remainder of the program that *is* being optimized speeds up, these
unoptimized sections take a greater proportion of the time.
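This Amdahl-style effect can be made concrete with a one-line formula. The fraction and speedup factor below are assumed values, purely for illustration:

```python
# Amdahl's-Law-style bound on combined speedups.
# Assumption (illustrative): a fraction p of the runtime is optimizable
# and is sped up by a factor s; the remaining 1 - p is untouched.

def amdahl_speedup(p, s):
    return 1.0 / ((1.0 - p) + p / s)

# Even if 90% of the program is sped up 6x, the overall speedup is well
# below 6, because the unoptimized 10% starts to dominate:
print(round(amdahl_speedup(0.9, 6.0), 2))  # prints 4.0
```

This is why stacking optimizations yields less than the product of their individual speedups once the unoptimized remainder dominates the runtime.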

Finally, it will be noticed that orbits 5, 6, 13, 19, and 29 obtained speedups significantly smaller than average in some columns. These are the orbits that contained either glitches or trouble spots. I will deal with these on a case-by-case basis, since there are several interesting points to be made.

**Errata Insertion**: Due to a programming blunder, the gravitational potential energy computation in my shadowing program was incorrect. Thus, figures 2.2, 2.4, and 2.5 (pp. 34, 36, 37) of my original thesis are incorrect. The paper version of the thesis has an errata sheet. For the Web document, I decided to insert the corrections directly into the document.

The corrected energy fluctuations are more jagged than in the original diagrams, although much smaller in magnitude (at least 4 orders of magnitude smaller!).

**orbit 5**: As can be seen in Figure 2.1,
this orbit suffered a close encounter at timestep 83 (out of 128).

**Figure 2.1:** Orbit 5 projected onto the *xy* plane. The close encounter is near the right-most extent of the orbit. The ``+'' marks the point in the orbit where the 1-step errors could not be decreased below , and corresponds to the timestep of closest approach. The particle attached to the bottom end of the orbit near (0.52, 0.27) is the moving particle at its initial position.

The energy of the real system should be exactly constant. The energy of the simulations is conserved to about 6 digits, so for the energy plots, a constant agreeing with the energy to about 6 digits was subtracted, to show the fluctuations more clearly.
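The plotting convention just described can be sketched as follows; the energy series here is synthetic (assumed values), purely to illustrate the subtraction:

```python
import numpy as np

# Synthetic illustration of the plotting convention: the energy is
# conserved to ~6 digits, so subtracting a constant that matches its
# leading digits makes the tiny fluctuations visible on a plot.

rng = np.random.default_rng(0)
# assumed base energy and noise level, chosen only for illustration
energy = -0.014234 + 1e-8 * np.cumsum(rng.standard_normal(128))

shift = round(energy[0], 6)      # constant matching the leading digits
fluctuation = energy - shift     # this is what actually gets plotted
print(np.ptp(fluctuation) < 1e-5)  # prints True: fluctuations << |E|
```

Plotting `fluctuation` rather than `energy` is what makes the jagged drops at close encounters visible in the corrected figures.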

Figure 2.2 shows that the energy of orbit 5 plunged dramatically at step 83.

**Figure 2.2:** The [errata corrected] total energy (kinetic + gravitational potential, shifted by +0.014234) of orbit 5 (unshadowable). The sharp decline at timestep 83 marks the close approach that caused the trouble spot. The drop is about , and the total energy fluctuation is about , or about .

The orbit could be shadowed for its first half, but not over its entire length, because it contained a trouble spot at the close encounter, where the 1-step error could not be decreased below about .

This orbit obtained the greatest speedup when the only active optimization was *L* -- the final 3 columns show that the algorithm was slower when other optimizations were added. The reason seems to be constant RUS: since the 1-step errors could be reliably reduced down to , but no lower, the algorithm switched many times to constant RUS (at which point the 1-step errors would be decreased to , but no lower), and then back to recomputed RUS, which would run for 1 or 2 refinements, producing refinements better than constant RUS but not geometric; then the algorithm would switch back to constant RUS, again decreasing the 1-step errors to , then switch back to recomputed RUS, and so forth. Since all progress of a sequence of constant RUS refinements is discarded, progress was slow until, finally, recomputed RUS refinements got down to , at which point the algorithm conceded defeat. However, other columns still make sense: for example, comparing column *CULR* to *CULRI* shows a speedup of almost 2, which is expected with the *I* optimization.

**orbit 6**: This orbit also suffered a close encounter, near
timestep 87. The total energy of the system has a sharp
dip at that time, similar to that of orbit 5.
The refinement algorithm came excruciatingly close to finding a numerical shadow for this orbit, at one point achieving a maximum 1-step error of at timestep 87. Columns *L*, *C*, *R*, and *CR* show significant speedups, but the combination *CLR* has a large drop in speedup, suffering the same alternation between constant RUS and recomputed RUS as did orbit 5. Perhaps the Large Shadow Step at that point was just slightly too large, so that its linear map was invalid. A similar fate befell the column *CULRI*, which is surprising considering that only half as many resolvents as column *CULR* were computed per recomputed RUS refinement.

**orbit 13**: This particle suffered two close encounters,
both in the latter half of the orbit. The
energy had a large dip at each of the close encounters.
Shadowing was easily successful for all attempts that included only the first half of the orbit. However, the attempt to shadow the entire orbit was fraught with problems trying to satisfy the extreme sensitivity of the orbit to two close encounters. The second encounter was sufficiently close that the LSODE integrator in the non-optimized version suffered a fatal convergence error on the 4th refinement while trying to integrate the backwards resolvent past the encounter. This explains why the *I* column shows no speedup: it did not integrate any resolvents backwards, and by chance the forward resolvents could be integrated well enough that it attempted more than twice as many refinement iterations as the unoptimized version before giving up. The *R* version took longer because of time spent alternating between constant and recomputed RUS.

**orbit 19**: This orbit was
shadowable for the first half, but not when the second half was included.
However, as can be seen in Figure 2.3,
this particle suffered no close encounters.

**Figure 2.3:** Orbit 19. Note there are no close encounters. It is not clear why this orbit was not shadowable.

The energy fluctuations (Figure 2.4) are about 1/3 of those of orbit 5, although they are just as jagged.

**Figure 2.4:** [Errata corrected] energy of orbit 19, shifted by +0.01244 (unshadowable). The first jagged drop in the energy curve near timestep 96 is about , while the total fluctuation is about , about 1/3 that of orbit 5.

Figure 2.5 shows the energy of orbit 22, which *was* shadowable. Furthermore, I looked at all 32 energy figures, and almost all of them had fluctuations similar to the ones shown here, even though most of the 32 orbits were shadowable over the 1.28 time interval used. Thus it seems that energy fluctuations may not, in themselves, be enough to detect non-shadowability.

**Figure 2.5:** Corrected energy of orbit 22, shifted by +0.015151. This orbit *was* shadowable. The fluctuations near the beginning of the orbit are about . The energy fluctuations are slightly bigger (percentage-wise) than those of orbit 19, and just as jagged. Thus, it seems energy fluctuations alone cannot explain why this orbit was shadowable while 5 and 19 were not.

The largest 1-step errors were consistently at or near time 0. Since the entire orbit appeared extremely smooth, it is not understood why no shadow could be found over the entire length of the noisy orbit.

Finally, referring to the speedup table, most of the speedups appear reasonable.

**orbit 29**: This particle suffered a close encounter at timestep
53 (out of 128). The first 32 shadow steps were shadowable, but the
first 64 were not, and neither was the orbit as a whole. There are
no surprises in the speedup table; this orbit simply has a glitch at
timestep 53.

There are many other observations about the shadowing attempts that cannot be seen in the table.

- For any given seed, the various optimized versions generally produced shadows agreeing to 12 or 13 decimal places. (For large shadow steps, of course, only the points that appeared on the large shadow steps -- every 8 small steps -- could be verified.) However, when computing shadows of short orbits, the number of agreeing decimal places was much smaller, only 5 or 6 in some cases. I do not believe this is a problem, because with a short orbit, the neighborhood of points that stay near each other under evolution of the system is large. Furthermore, the stable and unstable subspaces (recall, computed with arbitrary -- but in these cases equal -- subspaces at the endpoints) will be slightly different, because the resolvents have different numerical errors. As the orbit gets longer, the neighborhood of initial conditions that stay close together gets much smaller, and so any algorithm that computes a shadow must compute shadows that agree to many decimal places.
- When using constant RUS, the number of iterations required to
find a shadow is always greater by a small factor (usually 2 or 3) than
when not using constant RUS.
However, this is far offset by the speed of a constant RUS
iteration, which took an
average time of 11.4 seconds, while a recomputed RUS iteration averaged
157 seconds -- 14 times longer. Since computing a resolvent takes
 time, and computing a 1-step error takes only time, this
speedup is asymptotically *O*(*M*).
- The various optimizations seem to decrease reliability slightly. Although there was never a case (in this table) where an optimized version found a shadow that the non-optimized version failed to find, there were a few cases vice versa. No repeatable pattern was observed in these failures, so the reason remains a mystery.
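The 14x figure quoted above follows directly from the two measured iteration times; a trivial check:

```python
# Per-iteration times as reported in the text.
recomputed_rus = 157.0   # seconds: recomputes resolvents each iteration
constant_rus = 11.4      # seconds: reuses the previously computed RUS

ratio = recomputed_rus / constant_rus
print(round(ratio))  # prints 14, as stated
```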

Sun Dec 29 23:43:59 EST 1996
