Next: Reliability of refinement Up: High Dimensional Shadowing & Previous: Introduction

The Optimizations

Brief Overview

There are 6 major optimizations that I have applied to the GHYS/QT refinement procedure. The first two are trivial, and no further details will be given about them. The final four will be explained in more detail in the remaining sections of this chapter.

Definition If program A has a speedup of over program B, then the execution time of A is a fraction of B's time; in other words A runs times faster than B.

The 3M oversight.

QT re-computed a new resolvent for each stable and unstable vector, i.e., at each timestep they computed 6M resolvents. However, a resolvent is a matrix operator that applies to all small perturbations of the orbit from time to . Thus only 2 resolvents per timestep need be computed: one for evolving perturbations forward in time, and another to evolve them backwards. Since most of the time is spent computing the resolvents, fixing this oversight provides an immediate speedup of a factor of about 3M.

Compute backwards resolvent by inverting forward resolvent.

The above optimization can be extended even further by noting that the resolvent matrix that maps perturbations from timestep i to i+1 is precisely the inverse matrix of the resolvent that maps small perturbations from timestep i+1 to i. Thus, rather than doing an integration from timestep i+1 backwards to time i to compute , can be inverted using matrix operations, or the system can be solved, whenever necessary, using standard elimination schemes. This gives an additional speedup of about 2, since only one resolvent per timestep need explicitly be computed using integration of the variational equations.

Large shadow steps.

Define a shadow step as a time interval across which the 1-step error and correction co-efficient are computed. QT used every internal timestep of their integrator's noisy orbit as a shadow step, but this is not necessary. It is reasonable to skip timesteps in the original orbit to build larger shadow steps. This means that fewer resolvents and stable/unstable directions need to be computed and stored.

Cheaper ``accurate'' integrator.

When computing the resolvent and 1-step errors during early refinements, it is not necessary to compute them to extremely high accuracy. Since the initial 1-step errors may have magnitudes of about , the resolvents and 1-step errors need only be computed to a tolerance of, say, , i.e., 3 extra digits. QT always computed the 1-step errors and resolvents to a tolerance of .

Constant resolvent/unstable/stable directions.

If a shadow exists, then by definition it cannot be too far away from the original noisy orbit. It is reasonable to assume that the resolvents, unstable and stable directions (abbreviated RUS) will change little as the orbit is perturbed towards the shadow. Thus, it may not be necessary to recompute the RUS for every refinement iteration. This is probably the biggest savings, because in combination with the cheaper integrator, it means that the RUS usually need only be computed once to a tolerance of about , and this RUS will suffice for all future refinements. This is analogous to a simplified Newton's method that does not recompute the Jacobian at ever iteration.

Re-use RUS from a previous successful shadow.

Recall that to find the longest possible shadow of a noisy orbit, we attempt to shadow for longer and longer times until shadowing fails. Assume shadowing for S shadow steps produces a shadow A. Then attempt shadowing for S+k timesteps for some integer k, and assume a successful shadow B will be found. Since A and the first S steps of B both shadow the same noisy orbit, they must be close to one another. By the same argument as the previous paragraph, the RUS that was computed for A can probably be re-used for the first S shadow steps of B. Thus only k new RUS 's need be computed.

Constant resolvent, unstable and stable directions

I use the acronym RUS to mean the set of all Resolvents, Unstable, and Stable vectors for the entire trajectory. The RUS at a particular timestep i is referred to as RUS .

For a smooth, continuous, well-behaved function, the Jacobian (i.e., the derivative) is also smooth and continuous. Most Hamiltonian systems (that I have seen) studied by scientists are at least piecewise of this sort. The N-body problem, in particular, is of this sort, as long as no particles pass within 0 distance of each other. This implies that the resolvent along a particular path will not change much if the path is perturbed slightly. Since the unstable and stable unit vectors are derived solely from the resolvents, they will also change only slightly when the path is perturbed. Thus, the RUS computed on the first refinement iteration should be re-usable on later iterations, and computing the new 1-step errors is the only significant work to be performed on each new refinement iteration. (Computing the corrections from the errors is much less CPU intensive than computing the errors themselves.)

I have empirically found that computing the RUS to a tolerance of between and usually suffices to refine the orbit down to 1-step errors near the machine precision. If refinement fails using constant RUS, then perhaps there is a spot in the orbit where some RUS 's change quickly with small perturbations in the path. In that case, the RUS can be re-computed for a few refinements, after which constant RUS may be tried again. However, a case in which constant RUS fails often suggests that there as a mild trouble spot somewhere on the orbit that may cause a glitch in longer shadows.

Formulae for switching between constant and recomputed RUS

The algorithm used to switch between constant RUS and recomputed RUS uses a running-average ``memory'' scheme to keep track of the progress of the magnitude of the maximum 1-step error over the previous few refinements. Originally, I used a simple algorithm (like QT's) that would signal failure when a certain number K of successive refinements had failed, for and K=3. (See here for the definition of a successful refinement.) However, refinement would sometimes get into a loop in which there would be a few successful refinements, followed by one or more that would ``undo'' the improvements of the previous ones. For example, the largest 1-step error may cycle as . In general, cases were seen in which the number of refinements in these loops was 3, 4, and up to about 8. Clearly a simple ``die when K successive failures are seen'' is not general enough to catch this kind of loop.

Queuing theory provides a formula that is useful here. It defines an arithmetic running average A to be

where is the newest element added to the set, and is the memory constant. The higher the memory constant, the longer the memory -- i.e., the less the effect of an individual new element. A rule of thumb is that A is roughly an average of the most recent elements. Equation 2.0, however, is not suited for measuring errors such as those in refinement that can change by many orders of magnitude, and is especially unsuited when smaller means better. For example, a string of errors of followed by an error of would average to about , whereas a change from to is an indication that refinement is succeeding. We thus need a geometric equivalent of equation 2.0, to allow values differing by orders of magnitude to average meaningfully. I define the geometric running average G as

where is the new element, and n is an integer, , so higher n implies longer memory. Both equations 2.0 and 2.1 require initialization to some reasonable starting value and , respectively. n should not be too large; I found 2 or 3 worked best.

Finally, we want an equation that measures the improvement that is being made by successive refinements, rather than one that measures absolute error, because it is the change in 1-step errors that indicates whether current refinements are succeeding, not their absolute size. Note that the 1-step errors may stop decreasing for two reasons: (1) 1-step errors have reached the machine precision, or (2) refinement is failing to find a shadow. Using the improvement rather than the absolute value allows us to use the same algorithm to halt refinement in either case. Thus define the improvement I to be

If I is close to or greater than 1, then the current refinement did not improve on the previous one; if I << 1 then the current refinement is successful.

Heuristic for switching between constant and recomputed RUS

We have two refinement methods. In order of decreasing cost, they are recomputed RUS, and constant RUS. The general idea is: if refinement is working well with an expensive method, then switch to a cheaper one; if a cheaper one seems to be failing, switch to a more expensive one. If the most expensive one is failing, then give up refinement entirely, and return failure. Here is the heuristic I have built over many months of trial and error.

When using recomputed RUS, it is safe to switch to constant RUS when the geometric running average improvement becomes <0.1. At this time the current orbit and all its statistics must be saved in case constant RUS refinement fails.
When using constant RUS, it is necessary to switch back to recomputed RUS when the geometric running average improvement stays >0.5 for 3 successive refinements. At this time I discard all progress made by constant RUS and revert to the previous orbit that used recomputed RUS. I think it is reasonable to discard all progress made by constant RUS, even if significant progress was made, for the following reasons:
- A refinement that computes the RUS is asymptotically more expensive than one that does not, so discarding all progress made by constant RUS refinements makes little difference, percentage-wise, in the final run-time.
- Future constant RUS refinements, that will be performed after the RUS is recomputed, will converge faster than the current ones that just failed, because the RUS will be more accurate.
- Recomputing the RUS with small 1-step errors would mean (using the simple ``3 extra digits of accuracy'' heuristic) recomputing it at a much higher accuracy, at much more expense, than is necessary.
This argument is not always applicable; I have seen cases in which discarding constant RUS progress appeared to waste good progress. Perhaps there exists a better heuristic.
Finally, after having switched back to recomputed RUS, it is time to exit if the geometric running average improvement stays >0.1 for 3 successive refinements; and it is safe to switch again to constant RUS when the geometric running average improvement becomes less than 0.1. I have found that there are usually no half-measures when using recomputed RUS -- refinement either succeeds geometrically or fails miserably.

Further research in recomputed RUS could focus on whether there exists a criterion to decide that a particular RUS needs recomputing, rather than all of them. If this is the case, it could be an O(S) speed savings. Also, there may be a place for SLES or other noise-reduction procedures in the loop of expensive and cheap refinement algorithms (which currently contains only the two above: recomputed RUS and constant RUS).

Finally, I note in passing that Newton's method is sometimes used with dampening when it is known that the correction factors may not be accurate, but are assumed to point roughly in the correct direction. In other words, equation 1.10 is modified to

for some constant . I briefly tried this with the correction co-efficients , but it did not seem to work. However, this result should not be taken as conclusive, as I did not spend much time testing damped correction or trying to make it work.

Re-use RUS from a previous successful shadow

This idea is based on the same observation as constant RUS, and is only useful if constant RUS is employed. It takes advantage of the RUS for a noisy orbit B being near to the RUS for a nearby noisy orbit A. In particular, the shadows for two overlapping segments of a noisy orbit will be near to each other along the segment that the noisy segments overlap, and thus the RUS's for the overlapping segment will also be near each other. So, when trying to find a shadow for a segment B which is an extension to segment A, the RUS of A can be re-used on the segment of B that overlaps A.

One interesting question is whether to use the first or last RUS that was computed for A, if A ever had to recompute the RUS. An argument in favour of using the first is that the noisy orbit is exactly the same, since B is just an extension of A. However, if A had to recompute the RUS, then probably so will B if A's first RUS is used. Recall that the RUS needed for the early refinements need not be as accurate as the ones used later. Furthermore, we assume that the segment of B's shadow that overlaps A's shadow will be closer to A's shadow than the first few iterations of B is to the first few iterations of A. Thus, if we use the last RUS that A used, we are less likely to recompute the RUS for B. If B is a subset of A rather than an extension, then obviously no new RUS 's need be computed. All shadowing runs in this thesis use the last RUS.

Finally, re-using the RUS allows a more fine-tuned search for where a glitch occurs, because it is much less expensive to attempt shadowing particular subsets of a noisy orbit once the entire RUS is computed.

Large Shadow Steps

Rationale

A numerical solution of an ODE is represented by an ordered, discrete sequence of points. Say we were to construct a continuous solution p(t) from these points, for example by using a suitably smooth and accurate interpolation. Then we could extend the definition of shadowing for some true solution x(t) by saying: x(t) -shadows p(t) on if .

Now, it should be clear that we can choose any representative sequence of points along p(t) to use in the refinement algorithm introduced in the previous chapter; we need not choose the points from which p(t) was originally constructed. In particular, we can choose a subset of the original set of points, as long as enough points are chosen to be ``representative'' of p(t). The steps that are finally chosen are called shadow steps.

There are at least two reasons to desire as few shadow steps as possible: First, if constant RUS is being used, then the RUS needs to be stored in memory. Each RUS requires a matrix (the resolvent), and two sets of basis vectors. Clearly the fewer shadow steps, the less memory is required to store the RUS. Second, if the ``accurate'' integrator has a large startup cost, then the fewer startups that are required, the better. For example, the Adams's methods I used take extremely small internal timesteps early in the integration. Thus, for each shadow step, the Adams's method must restart with extremely small stepsizes.

Estimating the largest shadow steps that can be used

For the N-body simulations reported in this thesis, a set of ``standardized'' units was used [14] such that the scale of the system in all units of interest was of order 1 -- i.e., the system had a diameter of order 1, the crossing time was of order 1, and the N particles each had mass 1/N. In such a system, the timesteps of QT's integrator averaged about 0.01, and they used each timestep as a shadow step. I have found empirically that, in this system, using shadow steps of size 0.1 works well. Smaller shadow steps use more time and memory but could not find shadows any better. Shadow steps of 0.2 were slightly less reliable, and steps of size 0.5 were unreliable.

The important criterion for choosing shadow step sizes seems to be the validity of the linear map. Recall that the linear map or resolvent is a first-order approximation that maps small perturbations of the orbit p(t) from time to , so that a system starting at evolves, to first order in , to . For this equation to be valid, must be small. Since the computation of the correction co-efficients in the GHYS refinement procedure depends on this linear perturbation approximation, the 1-step errors of our shadow steps must be sufficiently small for the linear map to be valid. The larger the shadow steps, the further the accurate solution will diverge from the noisy solution, and thus the larger the correction needed per shadow step. Although the allowable perturbation size is problem-dependent, I have found empirically that, for the unit-sized N-body system used here, perturbations of size are always small enough, while perturbations of size about or greater start to show significant difficulties. In general, however, a shadow step must be small enough that the 1-step errors are small enough that the linearized map is accurate enough that, when the correction is applied at step i, the linearized map correctly maps the error towards being smaller. In other words, if is the 1-step error at , is the correction, is the linear map and is the exact perturbation map, then must be close to in comparison to the size of the 1-step errors:

In practice, it is probably too difficult to use equation 2.2, and it may be necessary to experiment to find the largest allowable 1-step error, which then must be used to constrain the size of the shadow steps.

If the Lyapunov exponent of the system is known a priori, it may be possible to choose shadow step sizes a priori. Note that the divergence of the noisy and accurate orbits will be dominated by the errors in the noisy orbit. Say the noisy orbit has stepsizes of negligible length, each with error . Then each error will be magnified after a time T to . If E is the maximum allowable 1-step error for which the linear map is valid, then shadow steps should be no larger than

For our N-body systems, this equation gives an upper bound of about 1 to 10 time units, which is larger than the 0.1 sized shadow steps I used. Thus, the above equation seems a weak upper bound.

Even a system with a particular Lyapunov exponent may have short intervals during which the local divergence rate is faster or slower than the Lyapunov exponent would suggest. Thus, it may be helpful to devise an algorithm that dynamically builds shadow steps. One such algorithm, as yet untested, is as follows. Assume that the timesteps in the noisy orbit are small in comparison to the shadow steps that will be built. Let the noisy orbit be at times . Let E be the largest allowable 1-step error for which the linear map is valid. To construct a shadow step starting at time , the accurate solution p(t) is initialized to the noisy point . Accurate integration of p(t) proceeds successively to times until , when marks the end of the shadow step. The accurate solution is then re-initialized to to begin the next shadow step.

Now, the first refinement is likely to have the largest 1-step errors; thus, the first refinement will need the smallest shadow steps, and thus the largest RUS set of any refinement. This means that a large RUS set needs to be computed at least once, but only to low accuracy (see the section on ``cheaper accurate integrator'').

Originally I thought that, as refinement proceeds to decrease the 1-step errors, it would be advantageous to construct larger-and-larger shadow steps. However, I no longer believe this will give much of a speed savings, because (1) If refinement is working well, then constant RUS will be invoked, in which case there is no need to recompute the RUS at all; (2) Conversely, if refinement is failing, then there is no justification to increase the size of shadow steps -- in fact, it may be advantageous to decrease them. Note that if refinement is succeeding, it may be advantageous to use larger-and-larger shadow steps in the computation of the 1-step errors, even though the RUS is not being recomputed.

Finally, implementing dynamic shadow step sizes may help in another area. As described above, I currently discard all progress made by constant RUS if it begins to fail, reverting to the orbit and RUS of the most recent refinement that computed the RUS. However, if constant RUS works for several refinements but then starts to fail, perhaps it would be advantageous to use the orbit of the most recent successful refinement to build new shadow steps using the above dynamic algorithm, regardless of whether the most recent successful refinement computed the RUS or not.

Cheaper accurate integrator

When computing the 1-step errors of a noisy orbit, we must use an integrator that has smaller 1-step errors than the integrator used to compute the noisy orbit. The question is, how much more accurate does it need to be? During early refinements when the 1-step errors are large, the errors in the correction factors may be dominated by the first-order approximation of the resolvents, so the 1-step errors need not be computed to machine precision. In practice, I have found that it is sufficient to compute the 1-step errors to an accuracy that of the size of the maximum 1-step error of the previous refinement; computing them any more accurately does not speed refinement, although computing them only more accurately sometimes results in more iterations of the refinement procedure. In addition, a factor of only is not enough because some integrators have sufficiently gross control of their errors that requesting only 2 extra digits will result in the integrator returning exactly the same result. Note it is necessary to loosen this criterion when the 1-step errors are within of the machine precision, in order not to request more accuracy than the machine precision allows.

The accuracy required to compute the resolvents is less clear. So far, when reverting to recomputed RUS, I have been computing them to the same accuracy as the 1-step errors, described in the previous paragraph. This may be more precision than is necessary, if, for example, the resolvents change more quickly owing to movement of the orbit than to errors in their construction. In other words, it may be sufficient to always compute a resolvent to some constant precision, and only recompute it if the numerical shadow drifts sufficiently far from the original noisy orbit that the resolvent no longer correctly maps small perturbations from one shadow step to the next.

Choosing the accurate integrator

The only methods I have tried as my accurate integrator are Adams's methods. It may be wise to try others. In particular, if the shadow steps are small, it may pay to use a less sophisticated routine, such as a high-order explicit Runge-Kutta method. If the shadow steps are large, an Adams's or a Bulirsch-Stoer method is probably apt, because even though they both have high startup cost, they can eventually move quickly along the solution if they are not forced to restart too often. It may also be interesting to attempt using extended precision integration to test the accuracy of the standard (double precision) routines.

Finally, an important factor in problems, like the N-body problem, that have a large variance of scales all occuring at once (i.e., some pairs of stars are several orders of magnitude closer to each other than other pairs) is a concept called individual time steps. The modern hand-coded N-body integrator gives each star a personalized integration timestep; stars that have high accelerations are integrated with smaller timesteps than those with lower accelerations. Perhaps it would be fruitful to attempt to write a general-purpose integration routine that uses individual timesteps for each dimension. However, such an integrator is beyond the scope of this thesis.

Numerical results of the optimizations

To quantify the speedup of the optimizations, 32 noisy orbits were chosen, and each was shadowed using several different combinations of the optimizations. In each case, the system consisted of 99 fixed particles and one moving particle (i.e., N=100,M=1), identical to the shadowing experiments of QT. Forces were not softened. The 32 orbits were chosen by generating random 3-dimensional positions for all particles from the uniform distribution on (0,1); a 3-dimensional random initial velocity for the moving particle was also chosen uniformly on (0,1). The standardized units of Heggie and Mathieu [14] were used, in which each particle has mass 1/N. The pseudo-random number generator was the popular 48-bit drand48(), with seeds 1 through 32. The seed 1 case was run on a Sun SPARCstation 10/514 with a 50 MHz clock; seed cases 2 through 32 were run on 25 MHz Sun SPARCstation IPC's (each about 1/8 the speed of the SS10/514).

Once the initial conditions were set, each noisy orbit was generated by integrating for 1.28 standard time units (about 1 crossing time) using LSODE [16] with pure relative error control of . This figure agreed well with the magnitude of the initial 1-step errors computed during refinement. Although one crossing time sounds short, it is long enough that 5 of the orbits contained trouble spots. Furthermore, the number of unoptimized QT refinements required to find a shadow that has no trouble spots seems independent of the length of the orbit -- each refinement takes much longer, but the convergence is still geometric per refinement. Thus, the results below should be independent of the length of the orbit, as long as the length is non-trivial. This is what has been observed with the longer orbits I have shadowed, although I do not document them here.

For short shadow steps, a constant shadow step size of 0.01 was used, which approximates the average sized shadow step in QT. This results in 128 shadow steps. For large shadow steps, 16 shadow steps of size 0.08 were used. As in a usual shadowing attempt, longer and longer segments are shadowed in an attempt to find the longest shadow. Here, each successive segment had twice as many shadow steps as the previous one, up to 16 and 128, for long and short shadow steps, respectively. No attempt was made to isolate glitches more accurately than this factor of two.

The highest tolerance requested of the accurate integrator was . A successful numerical shadow was defined to be one whose 1-step errors were all less than . This number was chosen simply because it was the smallest number that geometrically converging refinement sequences could consistently achieve; was too small, because many refinement sequences would proceed geometrically until their maximum 1-step error was about , and then they would ``bounce around'' for many refinements until, apparently by chance, the maximum 1-step error would fall below .

The maximum allowed shadow distance was 0.1, although none over 0.0066 were observed with successful shadows.

The following speedup table displays the speedups of the various optimizations. All ratios are speedup factors with respect to the time column.

LEGEND
time(s): time in seconds for unoptimize version.

QT: original code of Quinlan & Tremaine

L: L arge shadow steps

I: backward resolvent by Inverting forward resolvent

C: Cheaper accurate integrator

R: constant RUS

U: re-Use RUS from previous successful shadow (appears only in combinations)

**LEGEND**
time(s):	time in seconds for unoptimize version.
QT:	original code of Quinlan & Tremaine
L:	L arge shadow steps
I:	backward resolvent by Inverting forward resolvent
C:	Cheaper accurate integrator
R:	constant RUS
U:	re-Use RUS from previous successful shadow (appears only in combinations)

seed time(s) QT L I C R CR CUR CLR CULR CULRI

1 1442 1.06 4.54 2.01 1.96 3.00 8.87 14.2 30.8 37.2 47.4

2 9379 0.707 6.08 1.82 3.09 3.48 11.2 15.7 52.3 68.3 75.7

3 7847 0.728 6.30 1.88 2.36 2.56 9.29 13.0 44.0 59.7 65.5

4 7011 0.866 11.1 1.98 2.74 2.84 7.88 12.3 40.5 55.8 74.3

5 12204 0.933 12.2 2.13 2.42 1.08 1.66 1.54 5.31 4.96 8.0

6 9443 1.14 7.24 2.01 3.18 3.98 10.8 13.5 5.36 56.8 29.8

7 6277 1.13 5.51 1.85 2.63 2.71 8.50 11.6 46.9 63.7 82.9

8 6582 1.13 4.79 2.04 1.91 2.79 7.09 10.6 33.8 45.4 56.9

9 7154 1.14 4.33 1.92 1.96 2.70 7.74 11.5 30.5 39.6 51.9

10 5832 1.13 4.32 1.87 1.54 2.72 6.46 10.7 29.1 45.5 57.2

11 6659 1.13 5.10 1.93 2.34 2.65 7.40 10.7 37.1 50.7 62.9

12 8623 0.861 5.00 2.44 2.57 3.37 9.64 13.7 44.8 52.4 75.4

13 14566 0.452 4.34 1.00 4.52 0.784 0.931 0.939 5.67 5.15 7.6

14 6976 0.749 4.54 2.00 2.17 2.85 7.23 11.1 32.9 45.2 49.4

15 5664 0.729 4.79 2.02 1.89 2.71 6.83 10.2 33.5 45.4 62.9

16 6634 1.15 4.89 1.97 1.82 2.64 7.35 10.6 33.8 44.8 58.7

17 6854 0.946 4.96 2.06 2.16 2.88 8.97 12.7 47.2 60.8 83.2

18 4932 0.629 2.88 1.74 1.69 2.17 5.38 7.84 25.7 34.1 45.7

19 8470 1.06 5.16 2.98 2.96 1.32 4.37 4.02 31.1 27.5 53.5

20 4566 1.12 5.59 1.91 2.04 2.78 6.21 9.50 33.7 48.6 64.3

21 7409 1.07 6.01 2.00 2.92 2.86 9.59 13.4 46.6 61.3 82.2

22 6988 1.22 4.89 2.17 2.00 3.03 7.34 11.4 36.1 44.0 57.2

23 7699 0.750 4.66 1.93 1.91 3.20 8.27 10.4 35.2 46.3 51.1

24 8851 1.18 2.53 1.79 2.78 2.89 10.2 15.4 52.9 71.1 93.1

25 7287 1.13 3.58 1.96 2.67 2.53 7.58 11.5 34.1 47.4 59.3

26 10736 0.854 7.74 2.84 2.79 4.33 12.4 16.9 58.2 76.5 81.1

27 7001 1.13 5.07 2.01 2.24 2.79 8.35 12.1 38.9 54.5 70.1

28 7324 1.13 4.49 2.06 1.93 2.72 8.01 11.2 33.2 49.3 55.0

29 13779 0.802 4.01 1.89 2.01 0.898 1.16 1.15 4.03 3.99 6.5

30 7932 1.14 10.9 2.01 2.17 2.81 8.82 13.0 43.3 67.5 86.7

31 6989 1.20 5.57 2.26 2.37 2.81 8.44 12.3 47.4 57.2 77.6

32 6720 1.08 4.80 1.89 1.90 2.78 7.61 11.5 34.0 50.5 65.5

avg 7682 1.07 5.56 2.01 2.36 2.68 7.55 10.8 34.6 47.5 59.3

seed	time(s)	QT	L	I	C	R	CR	CUR	CLR	CULR	CULRI
1	1442	1.06	4.54	2.01	1.96	3.00	8.87	14.2	30.8	37.2	47.4
2	9379	0.707	6.08	1.82	3.09	3.48	11.2	15.7	52.3	68.3	75.7
3	7847	0.728	6.30	1.88	2.36	2.56	9.29	13.0	44.0	59.7	65.5
4	7011	0.866	11.1	1.98	2.74	2.84	7.88	12.3	40.5	55.8	74.3
5	12204	0.933	12.2	2.13	2.42	1.08	1.66	1.54	5.31	4.96	8.0
6	9443	1.14	7.24	2.01	3.18	3.98	10.8	13.5	5.36	56.8	29.8
7	6277	1.13	5.51	1.85	2.63	2.71	8.50	11.6	46.9	63.7	82.9
8	6582	1.13	4.79	2.04	1.91	2.79	7.09	10.6	33.8	45.4	56.9
9	7154	1.14	4.33	1.92	1.96	2.70	7.74	11.5	30.5	39.6	51.9
10	5832	1.13	4.32	1.87	1.54	2.72	6.46	10.7	29.1	45.5	57.2
11	6659	1.13	5.10	1.93	2.34	2.65	7.40	10.7	37.1	50.7	62.9
12	8623	0.861	5.00	2.44	2.57	3.37	9.64	13.7	44.8	52.4	75.4
13	14566	0.452	4.34	1.00	4.52	0.784	0.931	0.939	5.67	5.15	7.6
14	6976	0.749	4.54	2.00	2.17	2.85	7.23	11.1	32.9	45.2	49.4
15	5664	0.729	4.79	2.02	1.89	2.71	6.83	10.2	33.5	45.4	62.9
16	6634	1.15	4.89	1.97	1.82	2.64	7.35	10.6	33.8	44.8	58.7
17	6854	0.946	4.96	2.06	2.16	2.88	8.97	12.7	47.2	60.8	83.2
18	4932	0.629	2.88	1.74	1.69	2.17	5.38	7.84	25.7	34.1	45.7
19	8470	1.06	5.16	2.98	2.96	1.32	4.37	4.02	31.1	27.5	53.5
20	4566	1.12	5.59	1.91	2.04	2.78	6.21	9.50	33.7	48.6	64.3
21	7409	1.07	6.01	2.00	2.92	2.86	9.59	13.4	46.6	61.3	82.2
22	6988	1.22	4.89	2.17	2.00	3.03	7.34	11.4	36.1	44.0	57.2
23	7699	0.750	4.66	1.93	1.91	3.20	8.27	10.4	35.2	46.3	51.1
24	8851	1.18	2.53	1.79	2.78	2.89	10.2	15.4	52.9	71.1	93.1
25	7287	1.13	3.58	1.96	2.67	2.53	7.58	11.5	34.1	47.4	59.3
26	10736	0.854	7.74	2.84	2.79	4.33	12.4	16.9	58.2	76.5	81.1
27	7001	1.13	5.07	2.01	2.24	2.79	8.35	12.1	38.9	54.5	70.1
28	7324	1.13	4.49	2.06	1.93	2.72	8.01	11.2	33.2	49.3	55.0
29	13779	0.802	4.01	1.89	2.01	0.898	1.16	1.15	4.03	3.99	6.5
30	7932	1.14	10.9	2.01	2.17	2.81	8.82	13.0	43.3	67.5	86.7
31	6989	1.20	5.57	2.26	2.37	2.81	8.44	12.3	47.4	57.2	77.6
32	6720	1.08	4.80	1.89	1.90	2.78	7.61	11.5	34.0	50.5	65.5
avg	7682	1.07	5.56	2.01	2.36	2.68	7.55	10.8	34.6	47.5	59.3

[Note (30 Dec 1996): a more recent technical report has added still more optimizations].

There are several interesting observations to make about this table. First, note that the run times of the original QT algorithm are comparable to the run times of my unoptimized version (differing on average by a factor of 1.07), as would be expected. Now looking at each optimization acting alone, we see that using Large Shadow Steps 8 times longer than QT (column L) gives an average speedup of about 5.5. The large shadow steps implemented here (every 8 small shadow steps) was trivial to implement, so I would recommend that, even if no other optimizations are used, large shadow steps are worthwhile. Third, inverting the forward resolvent to produce the inverse resolvent (column I) produces the expected average speedup of 2.0. Fourth, the cheap accurate integrator (column C) gives an average speedup of 2.36. This is about what is expected, because both the QT and C algorithms on average require just over 3 refinements to converge, and all C refinements are cheaper than a QT refinement except the last one, which is about equal in expense. Finally, using constant RUS gives a speedup of almost 3. Again this is about what is expected because QT requires about 3 refinements to converge while R has one RUS computation followed by several cheap constant RUS refinements.

Next we look at combinations of optimizations. First, it is interesting to note that combining the cheap accurate integrator C with constant RUS R results in an average speedup that is greater than the product of the two individual speedups ( ). This is because, when using CR, only one RUS is computed, and it is computed cheaply. Using R alone requires computing the one RUS to high accuracy; using C alone requires computing at least one RUS (the final one) to high accuracy. Second, re-Using the RUS of a previous successful shadow (U or re-Use RUS, seen in column CUR) makes sense only when R is also used. It gives a speedup of about 43% over CR. This is less than a factor of 2 because, with C, the resolvents are computed sufficiently cheaply that the time to do a constant RUS iteration is becoming non-negligible. Presumably if a UR column existed, it would show a speedup over the R column closer to 2. The last three columns show other combinations. The most important point to note is that, except for CR, the combinations give a speedup less than the product of any appropriate choice of previous columns. Perhaps this is at least partially owing to an affect similar to Ahmdal's Law: there still exist parts of the algorithm that have not been sped up, and as the remainder of the program that is being optimized speeds up, these unoptimized sections take a greater proportion of the time.

Finally, it will be noticed that orbits 5, 6, 13, 19, and 29 obtained speedups significantly smaller than average in some columns. These are the orbits that either contained glitches or trouble spots. I will deal with these on a case-by-case basis, since there are several interesting points to be made.

Errata Insertion Due to a programming blunder, the gravitational potential energy computation in my shadowing program was incorrect. Thus, figures 2.2, 2.4, and 2.5 (pp. 34,36,37) of my original thesis are incorrect. The paper version of the thesis has an errata sheet. For the Web document, I decided to insert the corrections directly into the document.
The corrected energy fluctuations are more jagged than in the original diagrams, although much smaller in magnitude (at least 4 orders of magnitude smaller!).

orbit 5

: As can be seen in Figure 2.1, this orbit suffered a close encounter at timestep 83 (out of 128).

Figure 2.1: Orbit 5 projected onto the xy plane. The close encounter is near the right-most extent of the orbit. The ``+'' marks the point in the orbit where the 1-step errors could not be decreased below , and corresponds to the timestep of closest approach. The particle attached to the bottom end of the orbit near (0.52,0.27) is the moving particle at its initial position.

The energy of the real system should be exactly constant. The energy of the simulations is conserved to about 6 digits, so for the energy plots, a constant of about 6 digits was subtracted from the energy to more clearly show the fluctuations.

Figure 2.2 shows that the energy of orbit 5 plunged dramatically at step 83.

Figure 2.2: The [errata corrected] total energy (kinetic + gravitational potential, shifted +0.014234) of orbit 5 (unshadowable). The sharp decline at timestep 83 marks the close approach that caused the trouble spot. The drop is about , and the total energy fluctuation is about , or about .

The orbit could be shadowed for its first half, but not over its entire length, because it contained a trouble spot at the close encounter, where the 1-step error could not be decreased below about .

This orbit obtained the greatest speedup when the only active optimization was L -- the final 3 columns show that the algorithm was slower when other optimizations were added. The reason seems to be constant RUS: since the 1-step errors could be reduced reliably down to , but no lower, the algorithm switched many times to constant RUS (at which point the 1-step errors would be decreased to , but no lower), and then back to recomputed RUS, which would run for 1 or 2 refinements, producing refinements better than constant RUS but not geometric; then the algorithm would switch back to constant RUS, again decreasing the 1-step errors to , then switch back to recomputed RUS, and so forth. Since all progress of a sequence of constant RUS refinements is discarded, progress was slow, until, finally, recomputed RUS refinements got down to , at which point the algorithm conceded defeat. However, other columns still make sense: for example, comparing column CULR to CULRI shows a speedup of almost 2, which is expected with the I optimization.

orbit 6

: This orbit also suffered a close encounter, near timestep 87. The total energy of the system has a sharp dip at that time, similar to that of orbit 5.

The refinement algorithm came excruciatingly close to finding a numerical shadow for this orbit, at one point achieving a maximum 1-step error of at timestep 87. Columns L, C, R, and CR show significant speedups, but the combination CLR has an large drop in speedup, suffering the same alternation between constant RUS and recomputed RUS as did orbit 5. Perhaps the Large Shadow Step at that point was just slightly too large, thus having a linear map that was invalid. A similar fate befell the column CULRI, which is surprising considering that only half as many resolvents as column CULR were computed per recomputed RUS refinement.

orbit 13

: This particle suffered two close encounters, both in the latter half of the orbit. The energy had a large dip at each of the close encounters.

Shadowing was easily successful for all attempts that only included the first half of the orbit. However, the attempt to shadow the entire orbit was fraught with problems trying to satisfy the extreme sensitivity of the orbit to two close encounters. The second encounter was sufficiently close that the LSODE integrator in the non-optimized version suffered a fatal convergence error on the 4th refinement, trying to integrate the backwards resolvent past the encounter. This explains why the I column shows no speedup: it did not integrate any resolvents backwards, and by chance the forward resolvents could be integrated well enough that it tried more than twice as many refinement iterations than the unoptimized version before giving up. The R version took longer because of time spent alternating between constant and recomputed RUS.

orbit 19

: This orbit was shadowable for the first half, but not when the second half was included. However, as can be seen in Figure 2.3, this particle suffered no close encounters.

Figure 2.3: Orbit 19. Note there are no close encounters. It is not clear why this orbit was not shadowable.

The energy fluctuations (Figure 2.4) are about 1/3 of those of orbit 5, although they are just as jagged.

Figure 2.4: [Errata corrected] energy of orbit 19, shifted +0.01244 (unshadowable). The first jagged drop in the energy curve near timestep 96 is about , while the total fluctuation is about , about 1/3 that of orbit 5.

Figure 2.5 shows the energy of orbit 22, that was shadowable. Furthermore, I looked at all 32 energy figures and almost all of them had fluctuations similar to the ones shown here, even though most of the 32 orbits were shadowable over the 1.28 time interval used. Thus it seems that energy fluctuations may not, in themselves, be enough to detect non-shadowability.

Figure 2.5: Corrected energy of orbit 22, shifted by 0.015151. This orbit was shadowable. The fluctuations near the beginning of the orbit are about . The energy fluctuations are slightly bigger (percentage-wise) than orbit 19, and just as jagged. Thus, it seems energy fluctuations alone cannot explain why this orbit was shadowable while 5 and 19 were not.

The largest 1-step errors were consistently at or near time 0. Since the entire orbit appeared extremely smooth, it is not understood why no shadow could be found over the entire length of the noisy orbit.

Finally, referring to the speedup table, most of the speedups appear reasonable.

orbit 29

: This particle suffered a close encounter at timestep 53 (out of 128). The first 32 shadow steps were shadowable, but the first 64 were not, and neither was the orbit as a whole. There are no surprises in the speedup table; this orbit simply has a glitch at timestep 53.

There are many other observations about the shadowing attempts that cannot be seen in the table.

For any given seed, the various optimized versions generally produced shadows agreeing to 12 or 13 decimal places. (For large shadow steps, of course, only the points that appeared on the large shadow steps -- every 8 small steps -- could be verified.) However, when computing shadows of short orbits, the number of agreeing decimal places was much smaller, only 5 or 6 in some cases. I do not believe this is a problem, because with a short orbit, the neighborhood of points that stay near to each other under evolution of the system is large. Furthermore, the stable and unstable subspaces (recall, computed with arbitrary -- but in these cases equal -- subspaces at the endpoints) will be slightly different, because of the resolvents having different numerical errors. As the orbit gets longer, the neighborhood of initial conditions that stay close together gets much smaller, and so any algorithm that computes a shadow must compute ones that agree to many decimal places.
When using constant RUS, the number of iterations required to find a shadow is always greater by a small factor (usually 2 or 3) than not using constant RUS. However, this is far offset by the speed of a constant RUS iteration, which took an average time of 11.4 seconds, while a recompute RUS iteration averaged 157 seconds -- 14 times longer. Since computing a resolvent takes time, and computing a 1-step error takes only time, this speedup is asymptotically O(M).
The various optimizations seem to decrease reliability slightly. Although there was never a case (in this table) where an optimized version found a shadow that the non-optimized version failed to find, there were a few cases vise versa. No repeatable pattern was observed in these failures, so the reason is still a mystery.

Next: Reliability of refinement Up: High Dimensional Shadowing & Previous: Introduction

Wayne Hayes
Sun Dec 29 23:43:59 EST 1996

Access count (updated once a day) since 1 Jan 1997: 10584