Concept Learning, Neural Networks, and the Pursuit of Artificial Intelligence
(written for a Psychology of Learning class)

The Pursuit of AI

            One of several goals of computer modeling of psychological processes is the achievement of so-called artificial intelligence (AI): man-made networks of programs orchestrated in such a way as to one day achieve consciousness.  Work toward this end has produced a growing history of increasingly successful approximations, but all still fall short of the most straightforward measure of apparent consciousness: the Turing test.  [Non-fulfillment of Turing test confirmed as of 2000 (Moor, 2001).]  There are two primary possibilities as to why the many aspiring developers of AI have so far failed to achieve their goal: either modern computers are not yet up to the task or no one has yet hit upon the right approach.  Given the processing power of clusters of networked computers arranged in parallel and the computational equivalency estimates of the human brain given by Kurzweil (1990), it seems most likely that the latter is the case.  With this in mind, the obvious course of action is to attempt to determine a more effective approach to AI, perhaps one more closely modeled on what we know of human learning processes.
            First and foremost of the many concerns in developing a more effective approach to AI is to identify the level or levels of processing which are most essential to the emergence of consciousness.  Consciousness is perhaps most succinctly described as relational self-awareness, relying upon a self-concept and other concepts with which that self-concept is related.  It then seems reasonable to conjecture that the essential level we are looking for is contingent on the ability to learn and combine concepts, and so that is where we shall focus our attention.  In focusing on the conceptualization process, it is clear that we are following the information processing paradigm, and thereby largely neglecting the issue of external inputs and outputs such as visual and auditory senses, as well as motor function.  This seems perfectly appropriate, as the relevant psychological theories and research in concept learning frequently border on this level of abstraction anyway.  (Also, other subfields of AI are working diligently on the areas which are neglected here.)
Concept Learning
            A concept is a set of objects, events, or concepts sharing common features (Lieberman, 2000).  Concepts can be considered fundamental units of thought comprising internal relationships in the mind representing relationships between the many phenomena experienced through living.  Investigation of the nature of a process is a central component of the development of any model, and concept formation is no different.  Thus it is appropriate to take a look at concept formation in its basic form in animals as well as in its more complex form in humans.  
            In animals, concept formation is not immediately evident, but for several decades now, experiments have provided supporting evidence.  Rudimentary concept formation (post-training recognition of novel pictures with human presence) was demonstrated in pigeons through the use of operant conditioning of screen pecking by Herrnstein and Loveland (1964).  A later experiment by Herrnstein, Loveland, and Cable (1976) showed that pigeons can even learn to distinguish pictures of a particular person from other people in similar pictures.  Cook, Katz, and Cavoto (1997) provided evidence that pigeons can learn to conceptualize sameness versus difference in a variety of visual stimuli.  These are certainly encouraging results, but they represent only one of the most basic varieties of concept learning: visual discrimination.  These are only a few of many experiments in this area, some of which may be more impressive, but these few give a sufficient indication of the basic animal capacity for concept formation to allow us to proceed to more abstract types of concept formation in humans.
            In humans, concept formation begins at a basic level more or less on a par with that seen in early experiments in concept learning with pigeons.  However, no one need conduct experiments to confirm that humans can distinguish pictures containing humans from those that do not, identify an individual person from others, or identify the presence of trees, colors, or other features.  More meaningful is the investigation of the processes involved in such examples of conceptualization.  Prominent theories in this direction include: prototype theory, described by Wittgenstein (1953) and further developed by Rosch (1973); exemplar theory; ACT-R, pioneered by John R. Anderson (1996); and neural network modeling.  Each of these theories is worth some consideration for use, except perhaps ACT-R, because it is already a major modeling project in its own right.  Prototype and exemplar theories provide possible explanations of the process of categorization in concept learning, while neural networks provide a modeling basis for an unknown (but wide) variety of processes.
            Prototype theory describes a predominantly intensional mode of conceptual categorization, meaning that it assumes that categories are defined by properties shared by category elements (embodied in a prototype).  The theory supposes that instances of a concept (which in many cases can be thought of as simply a category) are identified by their similarity to an ideal form of that concept called a prototype.  Thus a prototype is a mental template that is used to identify whether or not some thing is a member of the set of things encompassed by the relevant category/concept.  These prototypes are rather like an average of the known characteristics of members of the conceptual class.  The prototype itself need not exist as an actual instance of the category it represents; instead, it may be an amalgam of category members.  It seems reasonable to think of prototypicality as a measure of commonality between some thing’s characteristics and the set of a category’s characteristics weighted by their frequency of occurrence.  For example, suppose there is someone who has never seen a penguin before, and then, upon seeing one, tries to classify it as being an animal or not an animal.  According to prototype theory, such a penguin-naïve observer thinks of features typical of animals, such as a head, a mouth, and eyes.  The penguin appears to have each of these things and so it is determined to be an animal due to its possession of such characteristics as are typical of other animals (without especial regard to any other particular animals).
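            The frequency-weighted notion of prototypicality described above can be sketched in a few lines of Python.  This is only an illustrative sketch: the feature sets, the function names (build_prototype, prototypicality), and the choice of a simple weighted share as the score are all assumptions made for the example, not part of prototype theory itself.

```python
# Hypothetical feature sets for three known animals (illustrative data only).
known_animals = [
    {"head", "mouth", "eyes", "fur", "four_legs"},     # dog
    {"head", "mouth", "eyes", "feathers", "wings"},    # duck
    {"head", "mouth", "eyes", "fur", "four_legs"},     # otter
]


def build_prototype(members):
    """Map each feature to the fraction of category members that have it."""
    counts = {}
    for features in members:
        for feature in features:
            counts[feature] = counts.get(feature, 0) + 1
    return {feature: n / len(members) for feature, n in counts.items()}


def prototypicality(item, prototype):
    """Frequency-weighted share of the prototype's features present in item."""
    total = sum(prototype.values())
    shared = sum(weight for feature, weight in prototype.items()
                 if feature in item)
    return shared / total


animal_prototype = build_prototype(known_animals)
penguin = {"head", "mouth", "eyes", "feathers", "flippers"}
rock = {"hard", "grey"}

print(prototypicality(penguin, animal_prototype))
print(prototypicality(rock, animal_prototype))
```

            Note that the observer makes a single comparison against the composite prototype, and that features shared by every known animal (head, mouth, eyes) count for more than features found in only some of them.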
            Exemplar theory describes a predominantly extensional mode of conceptual categorization, meaning that it assumes that categories are defined by the set of their member elements.  Focusing on members of categories, exemplar theory supposes that instances of a category/concept are identified by their similarity to real-world examples, or exemplars, of a category.  Exemplars are stored directly as members of a category and are then compared directly in identifying new category members.  Category membership is concluded if sufficient similarity to existing category members exists.  For example, suppose our penguin-naïve observer first encounters a penguin and tries to classify it as an animal or non-animal.  According to exemplar theory, the penguin is compared to specific known animals such as a dog, a duck, and an otter.  Assuming the penguin seems relatively similar to these category exemplars, it will be determined to be an animal as well, and may even be included in future exemplar sets.
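            The exemplar comparison described above can be sketched in the same way.  Again this is a hedged illustration rather than a model from the literature: the feature sets, the use of Jaccard overlap as the similarity measure, the averaging over exemplars, and the threshold of 0.4 are all assumptions chosen for the example.

```python
def jaccard(a, b):
    """Similarity of two feature sets: shared features over all features."""
    return len(a & b) / len(a | b)


# Hypothetical stored exemplars of the category "animal".
animal_exemplars = {
    "dog": {"head", "mouth", "eyes", "fur", "four_legs"},
    "duck": {"head", "mouth", "eyes", "feathers", "wings"},
    "otter": {"head", "mouth", "eyes", "fur", "four_legs", "swims"},
}


def is_member(item, exemplars, threshold=0.4):
    """Compare item to every stored exemplar; accept if mean similarity is high enough."""
    similarities = [jaccard(item, features) for features in exemplars.values()]
    return sum(similarities) / len(similarities) >= threshold


penguin = {"head", "mouth", "eyes", "feathers", "flippers", "swims"}
rock = {"hard", "grey"}

if is_member(penguin, animal_exemplars):
    # An accepted item may itself be stored as a future exemplar.
    animal_exemplars["penguin"] = penguin

print(sorted(animal_exemplars))
print(is_member(rock, animal_exemplars))
```

            Unlike the prototype sketch, every classification here requires one comparison per stored exemplar, so the cost of categorization grows with the size of the category.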

            The prototype and exemplar approaches to conceptual categorization differ in a subtle but important way.  According to the exemplar approach, category membership is decided by comparison to numerous specific category members.  Contrastingly, in the prototype approach, category membership is decided by comparison to a single composite prototype that represents the typical features of the category.  Upon careful analysis, it becomes clear that the comparison process involved in the prototype approach is a compressed version of the exemplar comparison process.  Instead of making multiple comparisons of the characteristics of a potential category member to numerous specific category members, a prototype allows for a single comparison to be made.  Presumably, the prototype is generated by cross-referencing similar category items, producing a template for recognizing new category members.  Thus it seems that the prototyping approach to categorization may represent a stage of concept formation that has progressed to a somewhat higher (more abstract) cognitive level than that represented by the exemplar approach.  Perhaps the prototype approach takes over at the point where the exemplar approach becomes inefficient or when a sufficient variety of category examples are available from which to form an effective prototype.  Regardless of the particulars, the result is a conceptual template that is of tremendous utility.
            These conceptual templates, as representatives of entire categories, can become building blocks of even more abstract (superordinate) concepts.  Continuation of the process of recursively abstracting categories results in a hierarchy of concepts, which, to the best of our knowledge, is exclusive to humans.  The development of this hierarchy coincides with the development of language and underlies its use.  The same might be said of intelligence.  When an extensive range of concepts, including self-referent concepts (e.g. ‘I’), are integrated into this conceptual structure, it may be that consciousness inevitably results.  This line of thinking is one of the most profound ambitions of the field of artificial intelligence, and of this paper’s author in particular.  It is with this in mind that neural networks based (loosely) on the Collins and Loftus (1975) model are considered as potentially integral components in modeling the processes of conceptualization.  

Neural Networks
            Computer models of neural networks in the brain are commonly referred to as (artificial) neural networks and are seeing increasing use in computer modeling of psychological processes.  Early neural networks were single layered and processed signals in a single direction (from input to output), for which quality they are referred to as feedforward.  The most basic type of feedforward neural network, called a perceptron, consists of an input layer (layer 0) fully (unidirectionally) connected to an output layer (layer 1).  (As suggested by Wasserman (1989), such a perceptron is best described as being a single layer neural network, because the input layer (layer 0) performs no data processing.)  Perceptrons were limited to binary inputs as well as outputs until Widrow and Hoff showed how single layer networks with continuous inputs could be trained using the least-mean-squares (delta) rule (Widrow, 1959; Widrow & Hoff, 1960).  This enhanced variety of feedforward perceptron, though shown to exhibit the capacity to learn all of the somewhat impressive range of functions it can represent (Rosenblatt, 1962), was really quite limited in its applications.  As Marvin Minsky and Seymour Papert showed in their 1969 book Perceptrons, such simple functions as the XOR (exclusive or) logic gate cannot be represented by single layer perceptrons, so significant advances in neural network design had to use one or more hidden layers nestled between the input and output layers.
            Single layer neural networks consist of an input layer of nodes with weighted links to an output layer of nodes, each of which has a summation function (which sums the weighted inputs) and an activation function that determines, based on the node’s activation threshold, the extent to which that sum will result in output.  The weights associated with each processing (non-input) node are scalar values that are modified as the network is trained.  There are two basic types of neural network training processes: supervised and unsupervised.  Supervised training couples input data with desired output data, which is used to compute error values (the desired output minus the actual output), which are in turn used to modify the weights of the output layer.  Each weight of an output node is adjusted by an arbitrary learning rate constant multiplied by the node’s error value and by the input arriving on that weight.  This process is called the delta rule, and it is formally equivalent to the Rescorla-Wagner model of associative learning.  Unsupervised training consists of input data only, leaving the neural network to learn based on its own internal algorithms alone.  Supervised learning is typically much more effective when desired outputs are available.
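            As a minimal sketch of the supervised procedure just described, the following Python code trains a single output node on the logical AND function.  The function names, the hard threshold at zero, the learning rate of 0.5, and the choice of AND as the task are assumptions made for illustration; with a threshold activation the update shown is the classic perceptron form of the delta rule.

```python
def predict(weights, bias, inputs):
    """Sum the weighted inputs, then apply a hard threshold at zero."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_single_layer(samples, learning_rate=0.5, epochs=20):
    """Adjust each weight by learning_rate * error * input."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, desired in samples:
            error = desired - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error  # bias acts as a weight on a constant input of 1
    return weights, bias


# Logical AND is linearly separable, so a single layer can learn it.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
and_weights, and_bias = train_single_layer(and_samples)
print([predict(and_weights, and_bias, x) for x, _ in and_samples])
```

            The error term plays the same role as the difference between the maximum and current associative strength in the Rescorla-Wagner model: learning slows and stops as the prediction comes to match the outcome.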
            Multilayer networks are simply cascades of single layer networks, with each layer receiving its input from the previous layer’s output.  All of the layers in a multilayer network, excepting layer 0 (the input layer) and layer n (the final output layer), are referred to as hidden layers.  With the inclusion of hidden layers, training neural networks to achieve their representational potential becomes a complex issue.  Several training methods have been devised in the interest of seeking more optimal solutions, one of which is the backpropagation algorithm.  With the advent of the backpropagation algorithm (see Werbos, 1994), it became possible to use multilayer neural networks in a wide range of applications for which single layer networks are simply ineffective.
            The ineffectiveness of single-layer networks is due to their incapacity to make discriminations between data sets more complex than those separable by a single line (for two-input networks) on a two-dimensional (input) graph.  If n is the number of inputs in a single-layer network, a binary-output function can be represented by the network only if the correct outputs of the function can be divided into two regions (in n-dimensional variable space) by an (n-1)-dimensional hyperplane.  For two-dimensional (2D) spaces the dividing figure is a line, for 3D spaces it is a plane, for 4D spaces it is a 3D hyperplane, and so on.  All but the most basic of problems are linearly inseparable and therefore impossible to solve with a single-layer network.
            Multilayer neural networks are capable of representing far more complex functions than single-layer networks.  Two-layer networks can represent functions whose output regions (in their n-dimensional input space) are bounded by as many lines (or, in higher dimensions, hyperplanes) as there are neurons in the first layer, provided that the resulting region is convex, meaning that a line drawn between any two points inside the region lies entirely within the region (thus excluding shapes like crescents).  Three-layer networks can represent any variety of regions by combining the convex shapes of two layers in any number of ways, bounded only by the number of neurons used.  Knowing what neural networks are capable of representing, the next most important matter is the process of training them to actually do so in an efficient and reliable manner.
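            To make the role of the hidden layer concrete, the sketch below wires up a two-layer threshold network for XOR by hand.  The weights are assumptions picked for the example rather than learned values: one hidden node computes OR, the other computes NAND, and the output node fires only where both half-planes overlap, which is a convex strip containing (0, 1) and (1, 0).

```python
def step(total):
    """Hard threshold activation."""
    return 1 if total > 0 else 0


def xor_network(x1, x2):
    """Two hidden threshold nodes feed one output node."""
    h_or = step(x1 + x2 - 0.5)       # fires above the line x1 + x2 = 0.5
    h_nand = step(-x1 - x2 + 1.5)    # fires below the line x1 + x2 = 1.5
    return step(h_or + h_nand - 1.5)  # fires only when both hidden nodes fire


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

            Each hidden node contributes one dividing line, and the output node intersects the two resulting half-planes.  No single line can do the same job, which is why a single-layer network cannot represent XOR.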

Training Neural Networks
            Of the two basic neural network training modalities, supervised and unsupervised, only the unsupervised mode is typically considered biologically plausible.  In the case of humans especially, this seems like a ridiculous position to take.  Typical human development is strongly characterized by feedback at a variety of levels in almost every conceivable realm of behavior, internal and external.  It is fairly obvious that even infants are capable of distinguishing (at some level) between comfort and discomfort, which is enough to provide them with feedback about the effectiveness of their basic bio-survival actions.  In learning to understand language, children attempt to mimic the sounds of the words they hear, and they are able to judge their success by comparing the sounds they’ve made to the sounds they were mimicking.  Children often learn self-control, social skills, and various intellectual behaviors through parental reinforcement.  Students learn whether or not the exceptional length of time and effort spent on researching, thinking about, and writing a paper was worthwhile based on the grades and feedback received from their instructors (as well as from internal satisfaction with their work, hopefully).  All of these are characteristic of the influence of supervised learning.  The learning literature provides additional examples too numerous to even begin to list.
            One might argue that the given examples are all macro-level phenomena and therefore inapplicable at the level of biological neural networks (such an argument shows a great lack of imagination, but it deserves an attempted elaboration nonetheless).  Following the example of speech-mimicry, suppose a sound pattern (word) is transmitted through ear and cochlea, transformed into a form usable by the speech-interpretation areas of the brain, which somehow communicates its interpretation to the speech production areas, resulting in speech (or an approximation thereof).  At first, these speech approximations will be rather poor, and probably not sound quite right even to the child.  Recognizing that some degree of error has been made (unsatisfied mimicry-intent), the entire speech processing/production pathway may then make adjustments at each level of processing until the result is satisfactory.  One plausible description of this process is exemplified by the backpropagation algorithm.
            The backpropagation algorithm for training multilayer neural networks is an extension of the basic supervised training algorithm for single-layer networks, the details of which can be found in Rumelhart, Hinton, and Williams (1986).  The primary novelty of the backpropagation algorithm is that the error values calculated and applied to the output layer are then multiplied by the derivative of the current layer’s activation function and propagated backwards, through the connecting weights, as the error values to be used for the adjustment of all the hidden layers in the network.  Because of the use of the derivative of the activation function in the backpropagation algorithm, it is essential to use a continuously differentiable activation function such as the sigmoid function, which is commonly chosen because it squashes exceptionally high and low inputs into a manageable mid-range.  While this method allows neural networks to be trained to represent a wide variety of functions, the convergence of the network to a state of stable representation of sufficient accuracy can take much longer than with more advanced algorithms.
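            A single backpropagation update for a small network with one hidden layer can be sketched as follows.  This is a minimal illustration under stated assumptions, not the full procedure of Rumelhart, Hinton, and Williams: the network has two inputs, two sigmoid hidden nodes, and one sigmoid output; the error measure is half the squared difference; and the function names, starting weights, and learning rate of 0.5 are all invented for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Return the hidden activations and the single output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return hidden, output


def loss(x, target, W1, b1, W2, b2):
    _, output = forward(x, W1, b1, W2, b2)
    return 0.5 * (target - output) ** 2


def backprop_step(x, target, W1, b1, W2, b2, lr=0.5):
    """One gradient-descent update; returns the new parameters."""
    hidden, output = forward(x, W1, b1, W2, b2)
    # Output error times the sigmoid derivative, output * (1 - output).
    delta_out = (target - output) * output * (1.0 - output)
    # Error sent back through W2, times each hidden node's own derivative.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(W2, hidden)]
    new_W2 = [w + lr * delta_out * h for w, h in zip(W2, hidden)]
    new_b2 = b2 + lr * delta_out
    new_W1 = [[w + lr * d * xi for w, xi in zip(row, x)]
              for row, d in zip(W1, delta_hidden)]
    new_b1 = [b + lr * d for b, d in zip(b1, delta_hidden)]
    return new_W1, new_b1, new_W2, new_b2


# Arbitrary starting weights (assumed values for illustration).
x, target = [1.0, 0.0], 1.0
W1, b1 = [[0.5, -0.4], [0.3, 0.8]], [0.0, 0.0]
W2, b2 = [0.7, -0.6], 0.1

print(loss(x, target, W1, b1, W2, b2))
W1, b1, W2, b2 = backprop_step(x, target, W1, b1, W2, b2)
print(loss(x, target, W1, b1, W2, b2))
```

            The hidden-layer update has the same form as the output-layer update; the only difference is that the hidden nodes receive their error values second-hand, scaled by the weights that connect them to the output.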
            A variety of more advanced algorithms have been developed as improvements upon or alternatives to the backpropagation algorithm, one of which is Parker’s (1987) second-order backpropagation algorithm, which uses second derivatives to better approximate the optimal weight adjustments to be made during training.  Perhaps even more useful was the discovery by Almeida (1987) and Pineda (1987) of an implementation of the backpropagation algorithm for recurrent networks (networks whose outputs feed back to the inputs), the result being a significant improvement in efficiency over feedforward-only types.  Recurrent networks in general may be, in at least some cases, a more realistic representation of the brain’s functioning than feedforward networks; this is particularly true in the case of working memory, where learning by rehearsal requires a feedback loop.  There exist even more advanced neural networking paradigms incorporating recurrence, which, though typically more computationally intensive, may be well worth the extra processing time for AI applications.  One of the most promising approaches, and one especially appropriate for concept learning, is ART (Adaptive Resonance Theory) (Grossberg, 1987; Carpenter & Grossberg, 1987).

ART (Adaptive Resonance Theory)
            Neural networks designed under the architecture of ART are formidable tools for constructing functional models of learning processes (such as conceptualization) that are essential in the development of mature human intelligence.  ART-based neural networks are considerably more complex than the more basic varieties of neural networks that use such algorithms as backpropagation, yet remain highly flexible and applicable to the modeling of virtually any cognitive function.  One of the foremost features of ART-based networks that makes them ideal for modeling cognitive processes is a blend of stability and plasticity that allows them to learn to recognize new patterns while never losing the capacity to make previously learned pattern classifications.  While this limits the number of pattern classifications that can be made by an ART network of fixed size, it also guarantees the reliability of the classifications that can be made, a worthwhile tradeoff.
            All varieties of ART-based neural networks share a set of basic features, though many modifications and enhancements have been developed over the years.  Initially though, it is most important to focus on the core features of ART as described by Wasserman (1989).  The basic ART system is an unsupervised learning model and typically consists of comparison and recognition fields (one each) of neurons, a vigilance parameter, and a reset module.  (The setting of the vigilance parameter has considerable influence on the functioning of the system: higher vigilance will result in highly accurate memories of exacting detail, while lower vigilance will result in more abstract categorizations of greater generalizability.)  The comparison field takes an input vector (a one-dimensional array of values) and transfers it to the recognition field, where its best match is found to be the single neuron whose set of weights (weight vector) most closely matches the input vector.  Each recognition field neuron has an output that feeds a negative signal (proportional to that neuron’s quality of match to the input vector) to the inputs of each other recognition field neuron and inhibits their output accordingly.  Thus the recognition field exhibits lateral inhibition, allowing each neuron in it to represent a category to which input vectors are classified.  After the input vector is classified, the reset module compares the strength of the recognition match to the vigilance parameter.  If the vigilance threshold is met, training commences.  Otherwise, if the match level does not meet the vigilance parameter, the firing recognition neuron is inhibited until a new input vector is applied; training commences only upon completion of a search procedure.  In the search procedure, recognition neurons are disabled one by one by the reset function until the vigilance parameter is satisfied by a recognition match.  If no committed recognition neuron’s match meets the vigilance threshold, then an uncommitted neuron is committed and adjusted towards matching the input vector.
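            The recognition, vigilance test, reset, and commitment cycle described above can be sketched for binary inputs as follows.  This is a heavily simplified, ART-1-style illustration under stated assumptions: the competition in the recognition field is replaced by sorting the categories on a choice score, learning is the fast form (the winning weight vector becomes the bitwise AND of itself and the input), and the function name art1_present, the choice parameter beta, and the example patterns are hypothetical.

```python
def art1_present(pattern, categories, vigilance=0.7, beta=0.5):
    """Classify a binary pattern and return the index of its category.

    categories is a list of binary weight vectors and is modified in place.
    """
    active = sum(pattern)
    if active == 0:
        raise ValueError("pattern must contain at least one 1")

    def overlap(weights):
        return sum(p & w for p, w in zip(pattern, weights))

    # Stand-in for recognition-field competition: best-scoring category first.
    order = sorted(range(len(categories)),
                   key=lambda j: overlap(categories[j]) / (beta + sum(categories[j])),
                   reverse=True)
    for j in order:
        match = overlap(categories[j]) / active
        if match >= vigilance:
            # Resonance: train the winner towards the input (fast learning).
            categories[j] = [p & w for p, w in zip(pattern, categories[j])]
            return j
        # Otherwise this category is reset and the search moves on.
    # No committed category passed the vigilance test: commit a new one.
    categories.append(list(pattern))
    return len(categories) - 1


categories = []
for pattern in ([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1]):
    print(pattern, art1_present(pattern, categories, vigilance=0.6))
print(categories)
```

            Raising the vigilance value makes the test harder to pass, so more inputs fall through the search and are committed as new, more specific categories; lowering it lets more inputs be absorbed into existing, more general ones.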
            There are two basic methods of training ART-based neural networks: slow and fast.  In the slow learning method, the degree of training of the recognition neuron’s weights towards the input vector is calculated to continuous values with differential equations and is thus dependent on the length of time the input vector is presented.  With fast learning, algebraic equations are used to calculate the degree of weight adjustment to be made, and binary values are used.  While fast learning is effective and efficient for a variety of tasks, the slow learning method is more faithful to biological precedents and is far more appropriate for the real-time operation necessary to realize the goal of para-human consciousness in AI.
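            The difference between the two methods can be illustrated with a toy example.  The sketch below is not the actual ART learning equations; it assumes a deliberately simple differential equation, dw/dt = rate * (input - w), integrated with Euler steps, to show how slow learning depends on presentation time while fast learning jumps straight to the equilibrium value.  All names and parameter values are hypothetical.

```python
def slow_learning(weights, pattern, duration, rate=1.0, dt=0.01):
    """Integrate dw/dt = rate * (pattern - w) for the presentation time."""
    steps = int(round(duration / dt))
    w = list(weights)
    for _ in range(steps):
        w = [wi + dt * rate * (p - wi) for wi, p in zip(w, pattern)]
    return w


def fast_learning(weights, pattern):
    """Jump directly to the equilibrium of the same equation."""
    return list(pattern)


start = [0.0, 0.0]
pattern = [1.0, 0.5]
print(slow_learning(start, pattern, duration=0.5))  # brief presentation
print(slow_learning(start, pattern, duration=5.0))  # long presentation
print(fast_learning(start, pattern))
```

            Under these assumptions a briefly presented input leaves only a partial trace in the weights, whereas a long presentation approaches the result that fast learning produces in a single step.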
            Expanding upon the basic ART architecture are a variety of systems, each with its own special properties (summarized here).  ART-1 is the simplest variety of ART network, accepting only binary inputs.  ART-2 extends network capabilities to support continuous inputs, a more flexible configuration.  ART-2A is a streamlined form of ART-2 with a drastically accelerated runtime, at the cost of results that are only rarely suboptimal compared to the full ART-2 implementation (Carpenter, Grossberg, & Rosen, 1991a).  ART-3 (Carpenter & Grossberg, 1990) builds on ART-2 by simulating rudimentary neurotransmitter regulation of synaptic activity by incorporating simulated sodium (Na+) and calcium (Ca2+) ion concentrations into the system’s equations, which results in a more physiologically realistic means of partially inhibiting categories that trigger mismatch resets.  Fuzzy ART (Carpenter, Grossberg, & Rosen, 1991b) implements fuzzy logic into ART’s pattern recognition, thus enhancing generalizability.  An optional (and very useful) feature of fuzzy ART is complement coding, a means of incorporating the absence of features into pattern classifications, which goes a long way towards preventing inefficient and unnecessary category proliferation.  ARTMAP (Carpenter, Grossberg, & Reynolds, 1991), also known as Predictive ART, combines two slightly modified ART-1 or ART-2 units into a supervised learning structure: the first unit takes the input data and the second unit takes the correct output data, which is then used to make the minimum possible adjustment of the vigilance parameter in the first unit needed to produce the correct classification.  Fuzzy ARTMAP (Carpenter et al., 1992) is merely ARTMAP using fuzzy ART units, resulting in a corresponding increase in efficacy.
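            Complement coding and the fuzzy matching used by fuzzy ART can be sketched briefly.  The sketch assumes inputs scaled to the range 0 to 1, uses the element-wise minimum as the fuzzy AND, and shows only the match calculation and a fast-learning update; the function names and example values are hypothetical, and the rest of the fuzzy ART cycle (category choice, vigilance test, search) is omitted.

```python
def complement_code(features):
    """Append 1 - x for every feature so that absence is represented explicitly."""
    return list(features) + [1.0 - x for x in features]


def fuzzy_and(a, b):
    """Element-wise minimum, the fuzzy counterpart of the binary AND."""
    return [min(x, y) for x, y in zip(a, b)]


def match(coded_input, weights):
    """Fraction of the input that the category's weights account for."""
    return sum(fuzzy_and(coded_input, weights)) / sum(coded_input)


def fast_update(coded_input, weights):
    """Fast learning: the weights shrink to the fuzzy AND of input and weights."""
    return fuzzy_and(coded_input, weights)


coded = complement_code([0.2, 0.9, 0.5])
category = complement_code([0.3, 0.8, 0.5])
print(coded)
print(match(coded, category))
print(fast_update(coded, category))
```

            Because every complement-coded input sums to the number of original features, inputs with few active features are not automatically swallowed by existing categories, which is one way to understand how complement coding limits category proliferation.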
            All of the above ART systems are capable of demonstrating impressive capacity for and accuracy of learning, and each may have its place in a complete artificial intelligence project, though some systems may be more or less appropriate for various functions.  For instance, ARTMAP and Fuzzy ARTMAP are particularly effective in situations when supervised learning is feasible; a Fuzzy ARTMAP network has been shown to satisfactorily learn a test set in 1/4000 as many training sessions as a backpropagation network (Carpenter, Grossberg, & Rosen, 1991b) (though it could be that Fuzzy ARTMAP took longer to complete each training session).  Continuous-valued Fuzzy ART may be the most appropriate choice for unsupervised applications because it is an intuitively better approximation of biological learning (i.e., it operates in real time).  Fuzzy ART has the additional advantage of using complement coding to ensure the efficient use of recognition layer neurons, which is synonymous with making more efficient use of computer memory.

Hierarchy, Attention, and Context in Conceptual Development
            Since we are developing a functional model of concept formation’s role in the development of AI (the possibility of which is rather a hypothesis in itself), it is not necessary to base the design exclusively on empirically verified theories.  Certainly in many cases it will be beneficial to make use of empirical evidence characterizing the nature of biological intelligence, but by no means will all of the elements of this AI design derive from biological models.  In fact, in attempting to provide an existence proof of the AI hypothesis (merely that it can be done), we can afford to use hypothetical arguments and even wild speculation so long as it seems to be useful in the overall design.  With this in mind, the roles of hierarchy, attention, and context are discussed below.
            It is fairly evident at this point that the organization of conceptual structure in consciousness is hierarchical in nature.  It follows that the structure of models of concept learning should be hierarchical as well.  It has been suggested by Dean Keith Simonton (1999) that creativity may be inversely proportional to the degree of hierarchical organization in a person’s conceptual structures, and extending this idea to AI, it may be desirable to initially opt for a lesser degree of creativity by emphasizing hierarchy.  It would not do for a pioneering example of successful AI to appear overly creative, as such creativity might lead to unpredictable behavior that could be regarded as dangerous and aversive.  Of course, this may be an overly optimistic concern, but nonetheless, the need for hierarchy in concept formation is clear.
            Attention is a very slippery subject because of the tremendous role it plays in so many aspects (perhaps all aspects?) of consciousness, but hopefully by focusing on its role in the process of conceptualization a useful analysis and synthesis can be explicated.  In concept formation, a set of features is identified as being common to one or more perceptual objects (be they visual, aural, kinesthetic, conceptual, or otherwise) that are then combined into a conceptual category.  At first, when this category consists of only a few members (or even just one) it may resemble the description of exemplar theory, but it is most likely that as more members are subsumed into the category it becomes more like a prototype.  As members of a conceptual category are recognized, it becomes apparent which features most consistently and saliently recur throughout the category.  These recurring features then become focal points of attention in the context of recognizing the category to which they apply.  
            Multiple levels of categorization occur in the process of concept formation, and conceptual dimensionalization is sometimes active in the formation of higher-level concepts.  In conceptual dimensionalization, as already-formed basic concepts recur in recognizable patterns, they will themselves become the basic elements for the formation of new, more abstract, concepts.  In some cases, the conceptual components of these more abstract concepts are usefully analyzed by virtue of the quality or frequency of their occurrence, that is, their dimensions.  Examples of this dimensionality are clearest in such categories as polygons (dimension: number of sides) and colors (dimensions: light frequency and intensity).  Hierarchical Fuzzy ART networks should be capable of such classifications without any special configuration because of the multiple levels of synaptic weights involved.  Thus, in a sense, a hierarchical neural network approach to AI may be able to pay attention (and do so selectively at a variety of conceptual levels).  In fact, Grossberg (2003) specifically refers to this capacity of ART-based neural networks as attentional focus.
            Having established that a hierarchical system constructed with ART-based neural networks could effectively pay selective attention to relevant components of concepts in the process of higher-level concept formation, it becomes necessary to establish the capability to do so in a useful manner.  No information is readily available on the ability of such complex systems to provide the information they are clearly capable of providing in a way that is relevant to the context in which it is needed.  Therefore, it seems reasonable to hypothesize a system architecture in which the conveyance of the contextual relevance of information would be possible.

Architecture and System Function
            Many important issues in the design of a concept-learning based AI system do not arise until larger portions of total system function are analyzed, and so it becomes prudent to develop significant portions of system architecture that will produce the desired functionality.  Inevitably, shortcomings will become evident, and an essential part of the design process is to identify and solve the problems underlying those shortcomings.  Suppose then, that the hierarchies of ART-based neural networks are arranged pyramidally, typical hierarchical form.  Considering the level of necessary connectivity and addressability for conceptual structures to be very high, these (essentially two-dimensional) pyramids of conceptual hierarchies then should be arranged in a form that will allow both the origin of input and level of abstraction of concept nodes’ output to be easily mapped.  A torus in which pyramid tops are radially arrayed towards the center and their bases spreading away from the center seems like a feasible structure.  In this arrangement, the pyramids of neural networks would exhibit radial symmetry, with no more than two pyramids fully in any given plane.  The torus shape has several advantages: concept nodes are addressable by radial position (general area of input origin) and distance from the center (level of abstraction); the addition of additional modules at any orientation to the torus is quite feasible; the torus is arbitrarily scalable to allow any necessary level of penetrability and exposure of surface area; the central region is left open for unknown functions that may be hierarchically superior to the primary conceptual structure; the torus readily lends itself to a variety of interesting mathematical analyses.  It should be noted that because the system is to be implemented in software its physical geometry is irrelevant to its function, and is presented only as a visualizable metaphor for its actual computational functioning.  
            Supposing, then, that the AI system’s hierarchies of neural networks are arranged in the form of a torus, with sensory inputs typically converging upon the outer surface of the torus and the level of conceptual abstraction increasing towards its center, it does not appear that contextual information would be maintained by a strictly linear traversal of the conceptual structure from margin to center.  However, if a second set of output lines were to branch off from each conceptual node of the hierarchy and feed to all output processing areas that might need that information, essential contextual information might be preserved.  For example, such connections would logically extend to language processing modules for the production of speech and written language.
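            The idea of a second set of output lines can be sketched as follows.  This is a minimal illustration, assuming each node has a single more abstract parent; the names ConceptNode, propagate, and taps, the example concepts, and the "motor" module are hypothetical and stand in for whatever output areas the real system would have.

```python
from collections import defaultdict


class ConceptNode:
    def __init__(self, name, parent=None, taps=()):
        self.name = name
        self.parent = parent     # next, more abstract node toward the center
        self.taps = tuple(taps)  # output modules on the second set of lines


def propagate(node, outputs=None):
    """Walk from a node toward the center, copying each visited concept
    to every output module that taps it."""
    if outputs is None:
        outputs = defaultdict(list)
    while node is not None:
        for module in node.taps:
            outputs[module].append(node.name)
        node = node.parent
    return outputs


# A three-level chain from margin to center (hypothetical concepts).
animal = ConceptNode("animal", taps=["language"])
cat = ConceptNode("cat", parent=animal, taps=["language", "motor"])
whiskers = ConceptNode("whiskers", parent=cat)

received = propagate(whiskers)
```

            In this sketch the language module receives both "cat" and "animal", so the intermediate concept is not lost when activity reaches the more abstract node, which is the contextual information a purely linear traversal would discard.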
            The open central region of the torus would be an ideal place for the keystone of the AI system: the self-concept.  This is highly speculative, but perhaps a specially interlaced group of ART networks could be inserted into the central region after stable high-level concepts are developed.  Desirable traits could be chosen from among the high-level concepts available, and connected, with positive weights, to this inner node structure.  Undesirable traits could be connected with negative weights, and irrelevant traits might remain disconnected.  It is then possible that this unification of concepts would become a self-concept for the AI.  It is also quite possible that the desired effect would not be achieved at all, but regardless, the results are almost certain to be quite informative.
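            The weighting scheme in this proposal can be illustrated with a very small sketch.  It uses a plain weighted sum as a stand-in and does not model ART dynamics; the function name, the trait names, and the activity values are all hypothetical.

```python
def self_concept_activation(concept_activity, trait_weights):
    """Weighted sum over the traits wired to the self-concept node.

    Concepts missing from trait_weights are treated as disconnected.
    """
    return sum(weight * concept_activity.get(name, 0.0)
               for name, weight in trait_weights.items())


# Hypothetical high-level concepts and their current activity (0 to 1).
activity = {"honest": 0.75, "cruel": 0.25, "tall": 1.0}
# Desirable trait: positive weight; undesirable: negative; "tall" left out.
weights = {"honest": 1.0, "cruel": -1.0}

score = self_concept_activation(activity, weights)
```

            The irrelevant trait contributes nothing because it has no connection, while the undesirable trait lowers the total.  Whether anything resembling a self-concept would emerge from such a unit is exactly the open question raised above.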

            It may still be premature to attempt a complete AI system, but it is undoubtedly useful to evaluate progress towards that objective continually while researching and synthesizing the architecture of each component process.  A great deal of work remains to be done in designing and testing all aspects of the functioning of such a system, but the problems involved are gradually becoming comprehensible.  For many years the prospect of designing artificial intelligence that might become (or at least seem to be) conscious was little more than a dream, but it is a dream that is coming closer and closer to reality.  The use of neural networks and the emphasis on concept learning seem to be essential elements of the realization of this dream; in combination they begin to suggest key pieces of the puzzle of arising consciousness.


References

Almeida, L.B. (1987). A learning rule for asynchronous perceptrons with feedback in a combinatorial environment. Proceedings of the First International Conference on Neural Networks, 2, 609-618.

Anderson, John R. (1996). ACT: A simple theory of complex cognition. American Psychologist, 51, 355-365.

Carpenter, G.A. & Grossberg, S. (1987). ART 2: Self-organization of stable category recognition codes for analog input patterns. Applied Optics, 26(23), 4919-4930.

Carpenter, G.A. & Grossberg, S. (1990). ART 3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures. Neural Networks, 3, 129-152.

Carpenter, G.A., Grossberg, S., Markuzon, N., Reynolds, J.H., & Rosen, D.B. (1992). Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Transactions on Neural Networks, 3, 698-713.

Carpenter, G.A., Grossberg, S., & Reynolds, J.H. (1991). ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network. Neural Networks, 4, 565-588.

Carpenter, G.A., Grossberg, S., & Rosen, D.B. (1991a). ART 2-A: An adaptive resonance algorithm for rapid category learning and recognition. Neural Networks, 4, 493-504.

Carpenter, G.A., Grossberg, S., & Rosen, D.B. (1991b). Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system. Neural Networks, 4, 759-771.

Carpenter, G.A. & Grossberg, S. (2003). Adaptive Resonance Theory. In M.A. Arbib (Ed.), The Handbook of Brain Theory and Neural Networks, Second Edition (pp. 87-90). Cambridge, MA: MIT Press.

Collins, A.M. & Loftus, E.F. (1975). A spreading-activation theory of semantic processing. Psychological Review, 82, 407-428.

Cook, Robert G., Katz, Jeffrey S., & Cavoto, Brian R. (1997).  Pigeon same-different concept learning with multiple stimulus classes. Journal of Experimental Psychology: Animal Behavior Processes, 23, 417-433.

Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive Science, 11, 23-63.

Herrnstein, R. J., & Loveland, D. H. (1964). Complex visual concept in the pigeon. Science, 146, 549-551.

Herrnstein, R. J., Loveland, D. H., & Cable, C. (1976). Natural concepts in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 2, 285-302.

Kurzweil, R. (1990). The age of intelligent machines. Cambridge, MA: MIT Press.

Lieberman, David A. (2000). Learning: Behavior and Cognition (3rd ed). Belmont, CA: Wadsworth/Thomson Learning.

Minsky, Marvin, & Papert, Seymour. (1969). Perceptrons. Cambridge, MA: MIT Press.

Parker, D.B. (1987). Optimal algorithms for adaptive networks: second order back propagation, second order direct propagation, and second order Hebbian learning. Proceedings of the IEEE International Conference on Neural Networks, 2, 593-600.

Pineda, F. J. (1987). Generalization of backpropagation to recurrent neural networks. Physical Review Letters, 59(19), 2229-2232.

Rosenblatt, Frank. (1962). Principles of neurodynamics: perceptrons and the theory of brain mechanisms. New York: Spartan Books.

Rosch, Eleanor H. (1973). Natural categories. Cognitive Psychology, 4, 328-350.

Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning internal representations by error propagation. In Parallel distributed processing: explorations in the microstructure of cognition (pp. 318-362). Cambridge, MA: MIT Press.

Simonton, D.K. (1999). Origins of genius: Darwinian perspectives on creativity. New York: Oxford University Press.

Wasserman, Philip D. (1989). Neural computing: theory and practice. New York: Van Nostrand Reinhold.

Werbos, Paul J. (1994). The roots of backpropagation: from ordered derivatives to neural networks and political forecasting. New York: Wiley.

Wittgenstein, Ludwig. (1953). Philosophical investigations (G. E. M. Anscombe, trans.). Oxford: Blackwell.