The back-propagation neural network (BPNN, see Figure 1.9) was developed by Rumelhart et al. [12] as a solution to the problem of training multi-layer perceptrons. The fundamental advances represented by the BPNN were the inclusion of a differentiable transfer function at each node of the network and the use of error back-propagation to modify the internal network weights after each training epoch.
*Figure 1.9: The back-propagation neural network (BPNN).*
The BPNN was chosen as a classifier primarily because of its ability to generate complex decision boundaries in the feature space [13]. There is even work suggesting that a BPNN, under appropriate circumstances, can approximate Bayesian posterior probabilities at its outputs [14]. This is significant because a Bayesian classifier provides the best performance possible (i.e., lowest error rate) for a given distribution of the feature data. As with other non-parametric approaches to pattern classification, it is not possible to predict the performance of a BPNN a priori. Furthermore, there are several parameters of the BPNN that must be chosen, including the number of training samples, the number of hidden nodes, and the learning rate.
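To make these ideas concrete, the sketch below implements a single-hidden-layer BPNN with sigmoid transfer functions at each node, weights adjusted by back-propagated error, and an explicit hidden-layer size and learning rate. It is a minimal illustration only; the layer sizes, learning rate, and random training data are placeholder assumptions, not the configuration used in this work.

```python
# Minimal single-hidden-layer BPNN sketch (illustrative; the layer sizes,
# learning rate, and random data are placeholders, not values from this work).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    # Differentiable transfer function applied at each node
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hid, n_out = 8, 10, 3                    # inputs, hidden nodes, classes
W1 = rng.normal(scale=0.1, size=(n_in, n_hid))   # input-to-hidden weights
W2 = rng.normal(scale=0.1, size=(n_hid, n_out))  # hidden-to-output weights

# Toy training set: 100 random feature vectors with one-hot class labels
X = rng.normal(size=(100, n_in))
y = np.eye(n_out)[rng.integers(0, n_out, size=100)]

eta = 0.5                                        # learning rate
for epoch in range(1000):
    # Forward pass through the differentiable transfer functions
    h = sigmoid(X @ W1)                          # hidden activations
    o = sigmoid(h @ W2)                          # output activations

    # Backward pass: propagate the output error toward the input layer
    delta_o = (o - y) * o * (1 - o)              # output-layer error term
    delta_h = (delta_o @ W2.T) * h * (1 - h)     # hidden-layer error term

    # Gradient-descent weight updates after each epoch
    W2 -= eta * (h.T @ delta_o) / len(X)
    W1 -= eta * (X.T @ delta_h) / len(X)
```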
Based on the work of Baum and Haussler [15], it is possible to place a bound ($m$) on the number of training samples needed to guarantee a particular level of performance on a set of test samples drawn from the same distribution as the training data. Specifically, if at least $m$ samples are used to train a network with $W$ weights and $N$ nodes such that a fraction equal to $1 - \epsilon/2$ of them are classified correctly, then one can be confident that a fraction $1 - \epsilon$ of future (test) samples from the same distribution will be classified correctly, where $\epsilon$ is the acceptable error fraction and

$$
m \ge \frac{W}{\epsilon}. \qquad (1.3)
$$
As a specific example, to guarantee no more than a 10% error in classifying the test data, the number of training samples should be roughly 10 times the number of weights in the network. For a typical network generated below, this represents a requirement for 5000-10000 training samples. It is simply not tractable to generate that many images. Fortunately, this bound does not preclude the possibility of generating a successful classifier using fewer training samples, as many studies have empirically demonstrated.
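The arithmetic behind this example follows directly from Eq. (1.3); the short check below uses hypothetical weight counts chosen only to reproduce the rough 5000-10000 figure, not the sizes of the networks generated in this work.

```python
# Worked check of Eq. (1.3) for the 10%-error example above.
# The weight counts are hypothetical, chosen to span the 5000-10000 range.

def required_training_samples(num_weights, error_fraction):
    """Sample-count bound m >= W / epsilon from Eq. (1.3)."""
    return num_weights / error_fraction

for W in (500, 1000):                            # hypothetical network sizes
    m = required_training_samples(W, error_fraction=0.10)
    print(f"W = {W:4d} weights  ->  m >= {m:.0f} training samples")
# Prints roughly 10x the number of weights: 5000 and 10000 samples.
```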
The theoretical basis for selecting the number of hidden nodes to use in a single hidden layer network is not well developed. The only general method available to optimize this parameter is to test the network with various numbers of hidden nodes and select the one that performs best.
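One possible form of such a sweep is sketched below, using a generic multi-layer perceptron and a hold-out validation set; the synthetic data, candidate hidden-layer sizes, and use of scikit-learn's MLPClassifier are assumptions for illustration, not the procedure used in this work.

```python
# Hedged sketch of an empirical hidden-node sweep (synthetic data and
# candidate sizes are illustrative assumptions).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in feature data; in practice this would be the extracted image features.
X, y = make_classification(n_samples=600, n_features=8, n_classes=3,
                           n_informative=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3,
                                                  random_state=0)

best_size, best_acc = None, 0.0
for n_hidden in (2, 4, 8, 16, 32):               # candidate hidden-layer sizes
    net = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=2000,
                        random_state=0)
    net.fit(X_train, y_train)
    acc = net.score(X_val, y_val)                # hold-out classification accuracy
    print(f"{n_hidden:2d} hidden nodes: validation accuracy = {acc:.3f}")
    if acc > best_acc:
        best_size, best_acc = n_hidden, acc

print(f"Selected {best_size} hidden nodes (validation accuracy {best_acc:.3f})")
```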