next up previous
Next: A wider perspective: Why Up: Shadowing Previous: The refinement procedure of

A new shadowing procedure: SLES

Neural network backpropagation and error subtraction

Although the following refinement algorithm has absolutely nothing to do with neural networks, the idea for it was inspired by how the neural net backpropagation learning algorithm works. In backpropagation, a function tex2html_wrap_inline3119 measures the error in the output of a neural network, given a certain input a. The neural network ``learns'' by adjusting its internal variables tex2html_wrap_inline3123 to reduce its output error; learning stops when the output error is sufficiently small. The way it reduces the error is by taking partial derivatives of the error function with respect to all the internal variables tex2html_wrap_inline3125 , producing the gradient of the error with respect to tex2html_wrap_inline3127 . Moving tex2html_wrap_inline3127 a small amount in the direction opposite the gradient should then reduce the error. The amount by which tex2html_wrap_inline3127 is moved along the negative gradient is called the learning rate. In one sentence, backpropagation says, ``Find a direction to move that reduces the error, and then move a little bit in that direction.''

The inspiration for shadowing is this: once we have computed the 1-step error vectors tex2html_wrap_inline3133 along the noisy trajectory tex2html_wrap_inline2781 , it seems reasonable that the 1-step error at step i would be reduced if we simply subtracted a fraction tex2html_wrap_inline3139 of tex2html_wrap_inline3133 from tex2html_wrap_inline2781 , for some tex2html_wrap_inline3145 . This is the basis for what I call a Local Error Subtraction method.

SLES: Stable Local Error Subtraction

A naïve implementation of the above idea may be simply to set tex2html_wrap_inline3147 for i from 1 to S. However, this does not define how to change the initial point tex2html_wrap_inline3153 , because there is no tex2html_wrap_inline3155 . Since any new true trajectory must have a different initial condition, we must find a way to correct tex2html_wrap_inline3153 .

To allow corrections to tex2html_wrap_inline3153 , I introduce the backward error,

displaymath3161

where tex2html_wrap_inline3163 integrates the solution backwards from time i+1 to time i, and tex2html_wrap_inline3169 is defined on tex2html_wrap_inline3171 , where S is the number of shadow steps. Thus each timestep from 1 to S-1 has both a forward error tex2html_wrap_inline3133 and backward error tex2html_wrap_inline3169 ; timestep 0 has only a backward error tex2html_wrap_inline3185 , and timestep S has only a forward error tex2html_wrap_inline3189 .

Subtracting some linear combination tex2html_wrap_inline3191 from tex2html_wrap_inline2781 (setting tex2html_wrap_inline3195 ) does not work for any reasonable values of tex2html_wrap_inline3197 that I have tried (between 0.1 and 0.9). I believe this is because 1-step errors computed in the forward direction will be slightly unstable in their components that lie in the unstable subspace, and backwards 1-step errors will be unstable in their components that lie in the stable subspace. The method works, however, if we rewrite tex2html_wrap_inline3133 using equation 1.2 (which was tex2html_wrap_inline3205 ), and tex2html_wrap_inline3207 , and update tex2html_wrap_inline2781 as

  equation485

In other words, we use only the components of the 1-step error that are computationally stable.

The SLESgif method seems to work about as fast as the optimized GHYS method introduced in the next chapter. For example, using tex2html_wrap_inline3211 , 1-step errors are generally reduced by about a factor of 10 per refinement iteration, as one would expect when subtracting away 90% of the error. However, I have not extensively tested it, and since it seems to have a weaker theoretical basis than the GHYS method, I have not used it in any of the results mentioned in later chapters. However, I have no reason to believe that the method is unreliable, so further testing in the future seems warranted.

Finally, note that SLES makes no explicit use of the linear map tex2html_wrap_inline2765 . Currently, the only use of the resolvents in SLES is implicitly through the construction of the stable and unstable directions. If a method of computing the stable and unstable directions can be found that doesn't use the linear maps (eg.,  eigenvector decomposition as mentioned above), then the SLES method will not use the linear maps at all, and they will not need to be computed. Since computing the linear maps (for an ODE system) is currently by far the most computationally intensive section of the program, getting rid of them entirely is an exciting prospect.

Local vs.  global methods

It is perhaps important to note that there is a fundamental difference between the GHYS refinement procedure, which I call a global method, and SLES, which I call a local method. Let the initial noisy orbit be tex2html_wrap_inline3215 , with the superscript referring to the refinement iteration. Assuming the GHYS refinement procedure can be written as a Newton's method, it can be summarized as

  equation495

where g was defined near the beginning of this chapter as the function computing the forward 1-step errors on the entire orbit,gif and g' is the Jacobian of g. Thus the entire orbit is updated using a single (large) equation. SLES, on the other hand, applies the local correction formula of equation 1.9 to each point independently; namely

displaymath3225

One consequence of this difference occurs when the 1-step errors in the GHYS method ``explode'' and cause the iteration to fail. The explosion seems to occur much faster using GHYS than using SLES. I believe this is because in GHYS, all the 1-step errors along the entire trajectory contribute to each other's computation of the correction, so that if a single 1-step error is too big for its linear map to be valid, then the corrections for all the points will be invalid. However, in SLES, there is no such dependence between points at arbitrary distances along the trajectory, so a point with large 1-step error can propagate its instability to neighbors j shadow steps away only after j refinement iterations.

Currently, the only known way to isolate glitches using the GHYS method is the way QT did it, namely to try shadowing for longer and longer times until the orbit explodes, and then to do a binary search for shorter and longer shadows until the shadow step at which the glitch occurs is pinpointed. This is alot of work. Because of the slow way glitches propagate their effects in SLES, it may be possible to isolate glitches simply by their large errors and that of their neighbors. It may even be possible to pinpoint multiple glitches along a single trajectory. For example, points tex2html_wrap_inline3231 and tex2html_wrap_inline3233 would be glitches if we could shadow the trajectory from 0 to tex2html_wrap_inline3235 and from tex2html_wrap_inline3237 to tex2html_wrap_inline3239 and from tex2html_wrap_inline3241 to S, but not shadow any part of the trajectory containing tex2html_wrap_inline3231 or tex2html_wrap_inline3233 . This introduces the concepts of shadowable segments, although it is not clear what use such a concept is.


next up previous
Next: A wider perspective: Why Up: Shadowing Previous: The refinement procedure of

Wayne Hayes
Sun Dec 29 23:43:59 EST 1996

Access count (updated once a day) since 1 Jan 1997: 9676