
CGS-760: Neural Networks (May 2000)

Learning in Neural Networks by Error Backpropagation

 

The classical method for training a multilayer feed-forward neural network is the steepest-descent backpropagation algorithm. The basic idea of the backpropagation learning algorithm [1] is the repeated application of the chain rule to compute the influence of each weight in the network with respect to an arbitrary error function E:

$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial s_i}\,\frac{\partial s_i}{\partial net_i}\,\frac{\partial net_i}{\partial w_{ij}}$$

where w_ij is the weight from neuron j to neuron i, s_i is the output of neuron i, and net_i is the weighted sum of the inputs of neuron i. Once the partial derivative for each weight is known, the aim of minimizing the error function is achieved by performing a simple gradient descent:

$$w_{ij}(t+1) = w_{ij}(t) - \eta\,\frac{\partial E}{\partial w_{ij}}(t)$$

Obviously, the choice of the learning rate η, which scales the derivative, has an important effect on the time needed until convergence is reached.
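To make the update above concrete, here is a minimal Python sketch (not part of the original notes) of one steepest-descent backpropagation step for a single-hidden-layer perceptron, assuming sigmoid activations and the squared-error function; all names (W1, W2, eta, and so on) are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, d, W1, W2, eta=0.25):
    """One steepest-descent update for a single training pair (x, d)."""
    # Forward pass: weighted sums (net) and outputs (s) of each layer.
    net1 = W1 @ x
    s1 = sigmoid(net1)
    net2 = W2 @ s1
    s2 = sigmoid(net2)

    # Backward pass: chain rule dE/dw_ij = dE/ds_i * ds_i/dnet_i * dnet_i/dw_ij,
    # with E = 1/2 * ||d - s2||^2 and sigmoid derivative s * (1 - s).
    delta2 = (s2 - d) * s2 * (1.0 - s2)          # dE/dnet for output neurons
    delta1 = (W2.T @ delta2) * s1 * (1.0 - s1)   # dE/dnet for hidden neurons

    # Gradient descent: w(t+1) = w(t) - eta * dE/dw(t).
    W2 = W2 - eta * np.outer(delta2, s1)
    W1 = W1 - eta * np.outer(delta1, x)
    return W1, W2
```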

A common way to alleviate this problem is to introduce a momentum term. This technique was popularized by Rumelhart et al. [2].

As explained above, backpropagation can be expressed as a gradient descent method for training (or learning) multilayer perceptron weights. Therefore, the rule for changing weights can be presented as follows.

Condition:

For a given problem such that {∀ x ∈ X | X = the set of training vectors}

there is:

{" dÎD | d = associated desired output vector & D = set of desired outputs associated with the training vectors in X}.

Now let the instantaneous error E_p be defined, in matrix-vector form, as:

$$E_p = \frac{1}{2}\,(d_p - z_p)^{T}(d_p - z_p) = \frac{1}{2}\sum_{k}\left(d_{k,p} - z_{k,p}\right)^{2}$$

where d_{k,p} is the kth component of the pth desired output d_p, and z_p is the output of the multilayer perceptron when the pth training exemplar x_p is presented at the input.

Let the total error E_T, which is the sum of the errors over all input patterns, be defined as:

$$E_T = \sum_{p=1}^{P} E_p$$
where P is the cardinality of X.
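
As a small illustration (not part of the original notes), the instantaneous and total errors can be computed as follows, assuming a hypothetical forward(x) function that returns the network output z_p for an input exemplar x_p; all names are illustrative.

```python
import numpy as np

def instantaneous_error(d_p, z_p):
    # E_p = 1/2 * (d_p - z_p)^T (d_p - z_p)
    e = d_p - z_p
    return 0.5 * float(e @ e)

def total_error(X, D, forward):
    # E_T = sum of E_p over all P training exemplars; it depends on both
    # the training set (X, D) and the network weights (through forward).
    return sum(instantaneous_error(d_p, forward(x_p)) for x_p, d_p in zip(X, D))
```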

It is important to note that the total error E_T is a function of both:

1)     the training set of the network;

2)     the weights in the network.

To allow the learning rate to be increased without leading to oscillation, the backpropagation learning rule with momentum may be defined as follows:

$$\Delta w(t) = -\eta\,\frac{\partial E_p}{\partial w}(t) + \alpha\,\Delta w(t-1), \qquad w(t+1) = w(t) + \Delta w(t)$$

where η, the learning rate, is a small positive number between 0 and 1 (in practice: 0.05 < η < 0.75); α, the momentum factor, is also a small positive number; and w represents any single weight in the network. Note that in the above equation, Δw(t) is the change in the weight computed at time t.

Note:

If α ≠ 0, the training rule is called the momentum method.

If α = 0, the training rule is called instantaneous backpropagation.

If E_T is used instead of E_p, the training rule is called the batch backpropagation method.
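
The following Python sketch (not part of the original notes) illustrates the momentum update rule and how the three variants relate to it; the parameter values and variable names are illustrative.

```python
import numpy as np

def momentum_update(w, grad, prev_delta, eta=0.25, alpha=0.9):
    """One update Dw(t) = -eta * dE/dw(t) + alpha * Dw(t-1).

    grad is dE_p/dw for instantaneous backpropagation, or dE_T/dw (the sum of
    the per-pattern gradients) for batch backpropagation. Setting alpha = 0
    removes the momentum term.
    """
    delta = -eta * grad + alpha * prev_delta   # momentum rule
    return w + delta, delta                    # new weights and Dw(t) for next step

# Illustrative use: keep the previous Dw between successive updates.
w, prev_delta = np.zeros(3), np.zeros(3)
grad = np.array([0.2, -0.1, 0.05])             # dE/dw at time t (made-up values)
w, prev_delta = momentum_update(w, grad, prev_delta)
```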

 

References

[1] Robbins H. & Monro, S. A stochastic approximation method. Annals of Mathematics and Statistics. Vol. 22, pp 400-407. 1951.

[2] Rumelhart, D. E., McClelland, J. L. & PDP. Parallel distributed processing. MIT Press. 1986.

 

Suggested Readings:

Bishop, C. Neural Networks for Pattern Recognition. Oxford University Press, Oxford, UK, 1995.

 

 

Samana Fatala

School of Engineering

Central Philippine University

sfatala@teacher.com
