# Volume 14, Number 3

PATTERN RECOGNITION LETTERS

March 1993

the exact relationships of the Al search procedures to the dynamic programming and branch and bound methods I had encountered when teaching Operations Research (OR) courses as a part-time visiting professor at the Wharton School of Business of the University of Pennsylvania (Kanal and Kumar (1981), Kumar and Kanal (1983), Kanal and Kumar (1988)). Although my IEEE-IT Trans. survey paper noted that a 1974 conference on Biologically Motivated Automata indicated a revival of interest in neural models and adaptive networks, and I was aware that Albus, Amari, Fukushima, and Kohonen worked in this area, -by and large I was not following that literature.

It was in the above context that a young man came by in the spring of 1975 to introduce himself as a new assistant professor in the Government and Politics department at the University. He wanted to talk about his Ph.D. dissertation which he had recently finished at Harvard. He said quite enthusiastically that it related to pattern recognition, learning machines, and intelligent systems. With the best of intentions I told him to lend me a copy and he lent me a report titled "Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences" (Werbos (1974)). Soon the report got buried in the usual flood of paper that unfortunately still remains very much part of my life. I have often wondered in recent years if the title had been different, e.g., if it had mentioned something about a procedure for training multilayer neural networks, or if my desk had been less cluttered, would I have paid more attention to this report or was I so tuned away from artificial neural networks that it would have made no difference? Of course that young man was Paul Werbos and the method he wanted me to read about was the error back-propagation (EBP) method for training multilayer feedforward neural networks. I was not alone in neglecting his work. It was not until some years later, that the independent, brilliant work of David Rumelhart and his colleagues at the University of California, San Diego, on Parallel Distributed Processing (Rumelhart and McClelland (1986)) made EBP famous, and it was some time after that that Paul Werbos gained recognition for his pioneering work on EBP. Perhaps in science catchy names and titles play a positive role. At least they do get our attention.

Several deficiencies of EBP have been noted by researchers. Improvements continue to be proposed and alternative iterative and non-iterative procedures continue to be cited as being equivalent or better. However, EBP was one of the main reasons for the resurgence of artificial neural networks, at least in the United States. My guess is that currently it is probably the most widely used general procedure for training neural networks for pattern classification. A basic problem of statistical pattern recognition, viz, the dimensionality sample size problem, also arises in artificial neural systems. In the design of multilayer feedforward networks one question is how many hidden units to use. An experimental approach to this question is to make the activation of the hidden units visible to the designer and see which units don't have much activation during the classification process and suppress them. For determining the number of parameters, i.e., the order of a statistical model, Akaike (1974) introduced an information criterion now referred to as the Akaike Information Criterion (AIC). A generalization of the AIC, called the Network Information Criterion (NIC), for determining the number of hidden units for an artificial neural network, is presented in a report by Murata et al. (1992). The NIC is for model selection among a sequence of nested, hierarchical models, where a lower dimensional model is a submodel of the next higher dimensional model in the sequence. These studies represent some of the current efforts to develop a theoretical methodology for determining the number of hidden units in a multilayer back-propagation network.

The attention attracted by the energy function minimization approach introduced by J.J. Hopfield in 1981-82, for developing self-organizing, associative memory networks was possibly a primary reason for the resurgence of significant scientific activity in artificial neural networks in the early years of the last decade. The application of energy function minimization in the design of neural networks for combinatorially hard optimization problems, and for pattern classification and motion vision has been very fruitful and interesting for me and my colleagues. Hopfield and Tank (1985) introduced a technique for embedding optimization problems, such as the traveling salesman problem (TSP), in mean-field thermodynamic neural networks. One of the shortcomings of their proposed technique is that each discrete optimization problem must be reduced to the minimization of a 0-1 Hamiltonian. A sufficient but not necessary condition for network stability is that the network connection weights be symmetric. The technique yields fully connected networks of functionally homogeneous "visible" units with loworder symmetric connections. Theoretically, all NP-hard optimization problems can be reduced to the TSP and consequently, 0-1 Hamiltonian minimization problems. However, direct reductions have been found for only structurally simple problems. In Hellstrom and Kanal (1990), we showed that embedding the NP-hard knapsack optimization problem in neural networks required the use of functionally heterogeneous units. In Hellstrom and Kanal (1992a), a knapsack packing neural network with both low-order and conjunctive asymmetric synapses is derived from a non-Hamiltonian energy function. Extensive simulations showed that for the knapsack optimization problem, solution quality was consistently better than approximate solutions given by a simple

244