
 

THE FLORIDA STATE UNIVERSITY

COLLEGE OF ARTS AND SCIENCES

 

OPTIMIZATION OF HEBBIAN NEURAL NETWORKS

USING GENETIC ALGORITHMS

 

By

ALEXANDER STUY

 

 

 

A Thesis submitted to the

Department of Computer Science

in partial fulfillment of the

requirements for the degree of

Master of Science

 

 

 

 

Copyright © 1994

Alexander Stuy

All Rights Reserved


      The members of the Committee approve the thesis of Alexander Stuy defended on November 23, 1994.

 

                                                                              _____________________________

                                                                              R.C. Lacher

                                                                              Professor Directing Thesis

 

                                                                              _____________________________

                                                                              Susan I. Hruska

                                                                              Committee Member

 

                                                                              _____________________________

                                                                              David Kuncicky

                                                                              Committee Member

 

                                                                              _____________________________

                                                                              Joseph Travis

                                                                              Committee Member

 

 

_____________________________

R.C. Lacher

Chair, Dept. of Computer Science

 

 


 

 

Acknowledgments

 

·         Dr. Lacher, my major professor, for his guidance and patience.

 

·         Other researchers in the fields of Neural Networks and Genetic Algorithms, for providing a foundation for my research.

 

 


Contents

 

Acknowledgments                                                                                    iii

 

List of Figures                                                                                          v

 

Abstract                                                                                                    vi

 

1 Introduction                                                                                           1

      1.1       Overview of Experiment                                                           2

 

2 Background                                                                                           3

      2.1       Neural Networks                                                                      3

                  2.1.1  Biological Neural Networks                                            4

                  2.1.2  Artificial Neural Networks                                              5

                  2.1.3  Hebbian Neural Networks                                              7

      2.2       Genetic Algorithms                                                                   8

                  2.2.1  Schema                                                                          11

                  2.2.2  Alphabets                                                                       12

      2.3       L-systems                                                                                 13

      2.4       Genetic Algorithms and Neural Networks                                 15

                  2.4.1  Efficiency of Coding Methods                                        17

                  2.4.2  Proof of O(log₂n) Growth of L-system Genotypes

 

3 The Experiment                                                                                    23

      3.1       Design                                                                                      25

                  3.1.1  The Environment                                                        25

                  3.1.2  Animats                                                                          26

      3.2       Genetics                                                                                   28

                  3.2.1  Genetic Operators                                                          29

                  3.2.2  Parameter Settings                                                          30

                  3.2.3  Encoding Method One                                                    31

                  3.2.4  Encoding Method Two                                                   33

                  3.2.5  Encoding Method Three                                     34

      3.3       Results of Experiment                                                               37

 

4 Conclusion                                                                                             43

      4.1       Summary                                                                                  43

      4.2       Conclusions                                                                              44

      4.3       Areas for Further Research                                                       44

 

References                                                                                               46

 

Biographical Sketch                                                                                 48


 

 

List of Figures

 

 

 

1    Biological Neuron. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2    Artificial Neuron. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3    Simulated growth of filament from bluegreen alga  . . . . . . . . . . 15

4    Artificial Environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  26

5    Encoding of animat input data. . . . . . . . . . . . . . . . . . . . . . . . .  27

6    Decoding of animat neural output. . . . . . . . . . . . . . . . . . . . . . . 28

7    Performance of random movement animat. . . . . . . . . . . . . . . .  39

8    Performance of encoding method one, popsize 20 . . . . . . . . . . 39

9    Performance of encoding method two, popsize 20. . . . . . . . . .  40

10  Performance of encoding method three, popsize 20 . . . . . . . . . 40

11  Performance of encoding method one, popsize 60. . . . . . . . . .  41

12  Performance of encoding method two, popsize 60 . . . . . . . . . . 41

13  Performance of encoding method three, popsize 60. . . . . . . . .  42

 

 

 

 

 


 

 

 

 

 

Abstract

 

 

An artificial world of barriers and open plains scattered with food is used to test the feasibility of using genetic algorithms to optimize Hebbian neural networks on problems for which no a priori knowledge of the problem domain is available.

 


 

 

 

Chapter 1

 

Introduction

 

Artificial neural networks and genetic algorithms are both inspired by the success of their biological counterparts.  Speed increases in computers in recent years have allowed scientists to combine the software models of these processes.  Genetic algorithms have proved capable of optimizing large complex structures, and an artificial neural network is just such a structure.   Neural network optimization methods manipulate the number of neurons and the connections between them (network topology) in order to generate a network that learns better than one that is unoptimized.  By encoding the network topology into a genotype, genetic algorithms can be used to perform the optimization. 

 

Neural networks can be genetically optimized to perform on traditional neural network problems such as pattern recognition and input/output pattern learning.  Even more interesting, however, is that networks can be evolved to approximate discontinuous functions without any a priori knowledge of the function.  In their 1988 paper, Stefano Nolfi and Domenico Parisi describe genetically optimized backpropagation neural networks which demonstrate an ability to learn to collect food in an artificial environment [12].  A standard backpropagation network is combined with a genetic neural network which supplies the teaching input to the backpropagation network.  Biological neurons have been shown to employ a form of learning known as Hebbian learning [7], though the brain probably uses other learning methods as well.  Artificial Hebbian networks can be used for feature identification, data clustering and object recognition [6].

 

1.1                   Overview of Experiment

 

The first purpose of the experiment is to investigate the feasibility of optimizing Hebbian neural networks for existence in an artificial environment.  Possible applications of the technique include robot control and system control; however, more traditional neural network problems could also be programmed.

 

Performance of the artificial Hebbian networks will be observed by introducing each network into an artificial environment that the network must navigate while searching for food.  Network performance is measured as the amount of food found by the network.  If the genetic algorithm is successful in optimizing the Hebbian neural networks, the average performance of the networks will increase through successive generations.

 

Further, the experiment will measure the performance of three different methods for encoding the network topologies.  By keeping all factors constant except for the encoding method, the experiment will determine whether the method used to encode the neural network topology affects the performance of the genetic algorithm, and if so, which types of encoding methods offer the best performance.  The third encoding method is based on a system known to generate patterns resembling those produced by growth in biological organisms [10].


 

 

 

Chapter 2

 

Background

 

2.1                   Neural Networks

 

Artificial neural networks were inspired by the computing ability of biological neural networks.  The human brain performs feats, such as apparently parallel classification of multiple objects, in a fraction of the time the world's fastest computer would need to accomplish the same task.  The feat is remarkable because biological neurons operate seven orders of magnitude slower than computer semiconductor gates [4].  The brain can perform such feats because it operates about 10¹¹ neurons in a parallel/distributed architecture.  Without employing multiple processors, computer simulations of neural networks must operate the neurons serially, although simulations can simulate parallel update.  In the last several years neural networks have been built on chips, with the ability for true parallel update.  However, the brain is more than just a large collection of homogeneous neurons: it contains at least 14 distinct types of neurons [4].  The variety and complexity of these neurons and the connectivity between them is the source of the brain's amazing computing power.  Many millions of years of evolution have gone into creating this structure.


 

2.1.1          Biological Neural Networks

 

A biological neural network (BNN) is composed of specialized cells called neurons.  The main components of a biological neuron are the cell body, nucleus, axons and dendrites (figure 1).  The dendrites receive signals from the nervous system or from the axons of other neurons.  The point where an axon and a dendrite connect is called a synapse.  Depending on the input from its dendrites, the neuron generates an impulse known as an action potential [4].  This impulse propagates as a wave down the axon, providing a signal to the dendrites of other neurons connected to the axon's branches.  Learning in the brain can occur by strengthening or weakening the connections between axons and dendrites.  In 1949 Donald Hebb postulated a learning rule, dubbed Hebbian learning, as a model of learning for biological neurons [7].  The Hebbian model strengthens the weight on a connection whenever the neurons on both sides of the connection fire together.

 

 

Figure 1: Biological neuron.


 

2.1.2          Artificial Neural Networks

 

Like a BNN, an artificial neural network (ANN) consists of one or more interconnected neurons.  The neurons in an ANN can be described mathematically:

                  E : input vector to neuron.

                  w : weight vector for scaling input vector to neuron.

                  f : firing function; functions used include threshold, linear and sigmoidal functions.

                  V : output of neuron.

 

An artificial neuron with i inputs (dendrites) fires according to the formula:

 

V = f(Σk=1..i (Ek * wk))

 

One of the first models of an artificial neuron was the McCulloch/Pitts neuron.  In the McCulloch/Pitts model the firing function f is a binary threshold function: each neuron outputs a one if V is above a certain threshold, otherwise the output is zero [6].  Although this seems simple, McCulloch and Pitts proved that a synchronous assembly of these neurons with the proper weight vector w can realize any computable function.
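
As a concrete illustration, the following is a minimal Python sketch of a McCulloch/Pitts style threshold neuron.  The sketch is not taken from this thesis's implementation; the weights and threshold are hypothetical values chosen so that the neuron realizes logical AND.

    def fire(E, w, threshold=0.5):
        """Threshold binary firing: output 1 if the weighted sum
        of the inputs exceeds the threshold, else 0."""
        V = sum(e * wk for e, wk in zip(E, w))
        return 1 if V > threshold else 0

    # With weights of 0.4 each, only the input (1, 1) pushes the
    # weighted sum above 0.5, so the neuron computes logical AND.
    print(fire([1, 1], [0.4, 0.4]))  # 1
    print(fire([1, 0], [0.4, 0.4]))  # 0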

 

In order for ANNs to learn, the values on the weight vector w are adjusted.  The value of the adjustment for each wi is termed ∆wi.  ANNs can be put into two categories: those that use supervised learning to obtain ∆w and those that use unsupervised learning to obtain ∆w.  With supervised learning the network is supplied a set of training inputs, along with the correct output for each input.  The network uses the input/output pairs to adjust w to minimize network error on the training set.  This performance carries over to inputs not in the training set by virtue of a neural network's ability to generalize.  Unsupervised learning ANNs are supplied only with input patterns.  The network clusters or classifies the input data into categories; therefore there must be meaningful redundancies in the input data.  Unsupervised networks can perform functions such as clustering of data, encoding and feature mapping, with the architecture of the network regulating the function the network performs [6].

 

One of the most widely used neural networks is the backpropagation network.  The output of neurons in a backpropagation network is analog rather than digital, and the firing function f is typically a sigmoidal function.  The name backpropagation describes the process of propagating the error between expected and actual output backward through the net in order to obtain ∆w for each neuron.

 

 

 

Figure 2: Artificial neuron, V = f(Σk=1..i (Ek * wk)).


 

 

2.1.3          Hebbian Neural Networks

 

In 1949 Donald Hebb postulated a learning rule, based upon the results of experiments on biological synapses, as a model of learning for biological neurons [7].  The model, known as Hebbian learning, strengthens the weight on a connection whenever the neurons on both sides of the connection fire together.  Using the notation introduced at the beginning of 2.1.2, the model can be expressed as:

 

∆wi = nVEi

 

Hebbian neural networks fall into the unsupervised learning category.  The information required to compute ∆w is the input to the neuron and the output of the neuron.  After sufficient updating of the connection weights, the output V of each neuron becomes a scalar measure of the familiarity of the input E: V is directly related to the probability of receiving input E.  The symbol n is a constant, termed the learning coefficient.  Hebbian networks have been applied as feature detectors, being given data much as would be received from a retina [6].  Layered Hebbian networks have been proposed as a model of self-organization in the visual cortex.  Variations of the original Hebbian update rule include Oja's rule:

 

∆wi = nV(Ei - Vwi)

 

Oja's rule avoids the unlimited growth in connection weights that occurs under Hebb's original rule [6].
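
For concreteness, the following minimal Python sketch implements both update rules for a single linear neuron.  The learning coefficient value 0.0035 is the one used in the experiment of chapter 3; everything else is illustrative.

    def hebb_update(w, E, n=0.0035):
        """Hebb's rule: dwi = n * V * Ei (weights can grow without bound)."""
        V = sum(e * wk for e, wk in zip(E, w))
        return [wk + n * V * e for e, wk in zip(E, w)]

    def oja_update(w, E, n=0.0035):
        """Oja's rule: dwi = n * V * (Ei - V * wi) (keeps the weights bounded)."""
        V = sum(e * wk for e, wk in zip(E, w))
        return [wk + n * V * (e - V * wk) for e, wk in zip(E, w)]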

 


 

2.2                   Genetic Algorithms

 

Genetic algorithms (GAs) are computer optimization methods inspired by molecular genetics and Darwin's theory of natural selection.  All GAs, both biological and artificial, are search algorithms.  In nature, natural selection favors organisms with the greatest propagative ability.  An organism's propagative ability is a function of the organism's genotype (DNA) and the organism's environment.  Biological genetic algorithms search the domain space of possible DNA sequences for ever more fit sequences, in relation to the environment in which the DNA finds itself expressed.  Ability to propagate in an environment can be equated to a fitness function for the organisms that inhabit the environment.

 

GAs operate on genotypes, which are composed of a set of symbols (genes).  Most life on earth uses the bases guanine, cytosine, adenine and thymine, a four symbol alphabet commonly abbreviated to G, C, A and T.  Artificial GAs commonly use binary encoding methods, although larger alphabets have been used [8].

 

A simple artificial GA is composed of the operators :

 

                  selection, crossover, and mutation.

 

selection

 

The initial generation of genotypes is generated randomly.  Thereafter the selection operator chooses from the current population the genotypes that will reproduce to form each succeeding generation.  A fitness function is used to measure the fitness of each genotype; fitness is thus a measure of the performance of the genotype on the function in question.  The parameters to the function are stored in encoded form in the genotype and must be decoded to determine the genotype's performance on the function.  The function used is based on the problem one is attempting to solve with the genetic algorithm.  Many different procedures can be used to perform selection; all involve choosing, either stochastically or deterministically, the fittest genotypes from the population.  A commonly used selection procedure is roulette wheel selection.  After the fitness of all genotypes has been determined, a roulette wheel is generated with each genotype receiving a slot sized in proportion to the genotype's fitness relative to the average fitness of the entire population.  Selection into the reproduction gene pool is then determined by simple spins of the roulette wheel.
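
As an illustration, a minimal Python sketch of roulette wheel selection follows; it assumes strictly positive fitness values and is not the selection procedure used in chapter 3.

    import random

    def roulette_select(population, fitnesses):
        """Pick one genotype with probability proportional to its fitness."""
        spin = random.uniform(0, sum(fitnesses))
        running = 0.0
        for genotype, fit in zip(population, fitnesses):
            running += fit
            if spin <= running:
                return genotype
        return population[-1]  # guard against floating point round-off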

 

crossover

 

The crossover operator combines two genotypes chosen for reproduction by splitting each genotype into two or more pieces and combining pieces from both genotypes into a new genotype.  Typically not all genotypes selected for reproduction are subject to crossover; a parameter termed the probability of crossover is used to stochastically determine which genotypes are candidates.

 

mutation

 

Following selection and crossover, genotypes are typically subjected to a mutation operator.  The mutation operator scans each gene in the genotype and with a specific probability (probability of mutation) flips the gene to another symbol in the genetic alphabet.  Mutation helps to maintain population diversity.


 

Pseudocode for a simple GA is as follows :

Generate the first generation of genotypes randomly

  Loop

      Measure fitness of each genotype.

      Select genotypes for reproduction based on fitness.

      Construct the next generation of genotypes from the selected
      genotypes using the crossover and mutation operators.

  Until stop-condition
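
The loop above can be made runnable with a toy problem.  The following Python sketch uses a binary alphabet, roulette wheel selection, one-point crossover and a count-of-ones fitness function; all parameter values are illustrative and are not the settings used in the experiment of chapter 3.

    import random

    GENES, POP, P_MUT = 32, 20, 0.01

    def fitness(g):                      # toy stand-in for a real problem
        return sum(g)

    def select(pop, fits):               # one spin of the roulette wheel
        spin, running = random.uniform(0, sum(fits)), 0.0
        for g, f in zip(pop, fits):
            running += f
            if spin <= running:
                return g
        return pop[-1]

    def crossover(a, b):                 # one-point crossover
        cut = random.randrange(1, GENES)
        return a[:cut] + b[cut:]

    def mutate(g):                       # flip each gene with probability P_MUT
        return [1 - x if random.random() < P_MUT else x for x in g]

    pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
    for generation in range(200):
        fits = [fitness(g) for g in pop]
        if max(fits) == GENES:           # stop-condition: a perfect genotype
            break
        pop = [mutate(crossover(select(pop, fits), select(pop, fits)))
               for _ in range(POP)]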

 

The process loops until a genotype performs to an acceptable level of fitness, or until the increase in fitness across successive generations grinds to a halt.  The second condition can occur due to a number of causes; the two most common are premature convergence and mutation stall.

 

premature convergence

 

After the initial random generation of the population in a GA, it is expected that most genotypes will prove to have very poor fitness, as they are random attempts at a solution.  Often, however, a few genotypes will have fitness far above the rest of the population, and although the fitness of these genotypes is not the global maximum achievable, they will nevertheless dominate the population due to their relative fitness.  The GA then becomes trapped at or near this fitness level due to the loss of all other genes from the population (premature convergence).


 

mutation stall

 

The mutation operator scans each gene in a genotype and with a preset probability flips the gene to another character in the genetic alphabet.  While mutation increases genetic diversity in the population, it also carries the risk of mutating a beneficial gene, resulting in a decrease in fitness.  If the mutation rate is set too high, this destructive effect can cause the GA's performance to level off after a sufficient level of complexity has arisen in the genotypes (mutation stall).

 

 

2.2.1          Schema

 

A schema is a method for denoting a subset of strings (genotypes) with similarities at indicated positions (genes) [8].  Schemata are constructed from the genetic alphabet plus an additional wildcard character, usually '*'.  In a GA using a binary alphabet, the extended alphabet used for creating schemata would be {0, 1, *}.  In a binary GA with strings (genotypes) of length 5, the schema *111* denotes all strings with a 1 in the middle three positions: {01110, 01111, 11110, 11111}.
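
Schema membership is simple to state in code; a minimal Python sketch:

    def matches(schema, genotype):
        """A genotype matches a schema when it agrees with the schema
        at every non-wildcard position."""
        return all(s == '*' or s == g for s, g in zip(schema, genotype))

    print(matches('*111*', '01110'))  # True
    print(matches('*111*', '01010'))  # False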

 

The fitness of a schema is the average fitness of all the genotypes in the population that contain the schema.  Holland showed that the proportion of a particular schema in a population grows at a rate proportional to the ratio of the fitness of the schema to the average fitness of the entire population.  This proportional growth happens in parallel for all schemata present in a population [8].

 

 

2.2.2          Alphabets

 

In order to use genetic algorithms to solve a given problem, a method for encoding the parameters of the problem (fitness function) into a genotype must be chosen.  The smaller the cardinality of the genetic alphabet chosen, the more schemata the resulting population will contain, although schemata that cross parameter encodings are of questionable use.  The principle of minimal alphabets states: the user should select the smallest alphabet that permits a natural expression of the problem [8].  Parameters are often coded into a binary representation and the resulting binary strings are combined to form a genotype.  However, non-binary encodings have been successfully employed in genetic algorithms, including genetic algorithms used to optimize neural networks [5].

 

Genetic algorithms have been applied to a wide variety of problems such as pattern recognition, classifier systems and structural optimization.  A.K. Dewdney used genetic algorithms to evolve finite automata, dubbed flibs, which predict binary number sequences [2].  Classifier systems also respond well to genetic manipulation: evolved classifier systems have been able to perform object recognition in noisy environments [13].  Simulated robots which use genetically selected classifiers to produce output have been evolved to chase a simulated light source [3].

 


2.3                   L-systems

 

Lindenmayer systems, or L-systems, were introduced by biologist Aristid Lindenmayer in 1968 as a method for describing the morphology of growth in biological systems, particularly plants.  L-systems are a type of context-free grammar studied in formal language theory.  L-systems have been widely used to model plant growth and to generate computer graphics.  One of the first attempts at modeling biological systems with L-systems was performed by D. Frijters and A. Lindenmayer, who modeled the growth and flowering of the plant Aster novae-angliae with an L-system [15].  L-systems can also be used to encode neural network topologies, generating connectivity patterns in the networks similar to connectivity patterns seen in regions of the brain [10].

 

L-systems consist of a finite alphabet of symbols and a set of rules or productions, each of which maps one symbol or string of symbols from the alphabet to another symbol or string of symbols.  The alphabet usually consists of upper and lower case letters, though other alphabets can be used.  The set of characters {S, A, B, C, D, a, b, c, d, e, f} is a valid L-system alphabet.  A valid production using the above alphabet is S -> BCdA.  All L-systems must have a start symbol or an identity production.  S is commonly used to denote the start symbol and appears only on the left side of the identity production.  Symbols that are never found on the left hand side of a production are termed terminals, as they represent an end to the production process.  Nonterminals appear on the left hand side of at least one production.  Any string of symbols produced by the execution of the productions of an L-system is termed a word of the L-system.  The set of all possible words produced by an L-system is termed the language of the L-system.

 

Figure 3 shows an example L-system as well as a graph representation of the L-system.  This particular L-system models the growth of the multicellular filament found in the blue-green alga Anabaena catenula [14].  The symbols a and b correspond to individual cells, describing the cell's propensity to divide, while the subscripts l and r (written here with underscores) specify the position of the newborn cell after division.

 

S  : a_r

P1 : a_r -> a_l b_r

P2 : a_l -> b_l a_r

P3 : b_r -> a_r

P4 : b_l -> a_l

 

Sequence of words generated:

                                                      a_r

                                                      a_l b_r

                                                      b_l a_r a_r

                                                      a_l a_l b_r a_l b_r

                                                      b_l a_r b_l a_r a_r b_l a_r a_r

                                                      ....

 

Figure 3: Simulated growth of a filament of the blue-green alga Anabaena catenula
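
The parallel rewriting behind figure 3 can be sketched in a few lines of Python; symbols carry their subscript as part of the name, so 'ar' stands for a_r.

    RULES = {'ar': ['al', 'br'],
             'al': ['bl', 'ar'],
             'br': ['ar'],
             'bl': ['al']}

    def expand(word):
        """Apply every production simultaneously (parallel rewriting)."""
        return [s for symbol in word for s in RULES[symbol]]

    word = ['ar']                        # the start symbol
    for _ in range(4):
        word = expand(word)
        print(''.join(word))
    # albr
    # blarar
    # alalbralbr
    # blarblararblarar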

 

 

2.4                   Genetic Algorithms and Neural Networks

 

The result of combining genetic algorithms with neural networks is termed a genetic neural network.  Genetic neural networks consist of four components: genes that specify the network topology, a procedure for constructing the network from the genotype, a fitness function to measure the performance of the network, and genetic operators for creating the new generation of genotypes from the old generation [11].  Genetic algorithms can be used to optimize the number of neurons, to optimize the topology of the network, and/or to set the weights on some subset of the network's connections.

 

Genetic neural networks can learn on two levels.  On the individual level each network learns by the weight update rule used (neural network learning).  On the population level the networks learn by modifying their topologies over time to decrease the average error in network output across the population.  These two processes are known as phenotype and genotype learning, respectively.  Genotype learning has been demonstrated to progress at a faster rate when the phenotypes have an ability to learn [9].  This should not be confused with Lamarckian evolutionary theory: only the genotypes are used in the genetic algorithm; what a phenotype learns is not passed on.

 

In order to use genetic algorithms with neural networks, some method of encoding the network topology into a form usable as a genotype must be chosen.  Neural networks are often denoted as a binary matrix whose dimension equals the number of neurons in the network.  A zero in the matrix denotes the lack of a connection between the corresponding neurons; a one denotes a connection.  By adding a real component to the matrix, the weight information for each connection can also be stored.  Another method is to use a string of integer pairs, each pair representing one connection between two neurons.  The pair (1 5) represents a connection from neuron 1 to neuron 5.  Here also a third, real component can be added to store the weight information of the connection.

 

Genetic algorithms have demonstrated success both when used to entirely determine network architecture, including weights [5], and when combined with learning rules [9].

 


 

2.4.1          Efficiency of Coding Methods

 

Experiments have shown that direct encoding methods do not scale up well; that is, the number of generations required to achieve maximum performance increases exponentially as the number of neurons encoded by the genotype increases [10].  Graph L-systems, which generate graphs by rules rather than explicit encodings, have been shown to have superior scaling ability, as well as providing a speed-up in convergence of the genetic algorithm [10].

 

The efficiency of an encoding method is often measured in O (big O) notation.  With linear encoding methods such as integer pairs, the genotypes grow at O(m), where m is the number of connections in the ANN.  Encoding methods based on connectivity matrices are O(n²), where n is the number of neurons in the ANN, since the number of cells in the matrix is n².  L-system based encoding methods can achieve growth rates in genotypes lower than O(n).  The following is an informal proof that L-system encoding methods can achieve genotype growth as low as O(log₂n).  The proof uses a simplified version of the L-system based encoding method (encoding method three) implemented in the experiment described in chapter 3 of this paper.

 


 

2.4.2          Proof of O(log₂n) Growth of L-system Genotypes

 

For a given L-system l, three terms related to L-systems are introduced :

 

      S(l) : the number of symbols in the rules of the L-system.

      T(l) : the number of symbols (terminals) in the full expansion of the
                            L-system productions.

      N(l) : the maximum number of neurons in the neural network that
                            can be encoded by the L-system.

 

At this point it should be noted that the standard encoding for a neural network of x neurons uses x² symbols (a standard connectivity matrix of ones and zeros).  Thus an L-system that expands to T terminal symbols, where the terminal symbols are 0 and 1, encodes a network with sqrt(T) neurons, meaning that N(l) = sqrt(T(l)).

 

To prove that L-system encoding methods for neural networks can be O(log₂N(l)), a series of L-systems, labeled LS[i], i = 0.., will be introduced satisfying :

 

                       a. S(LS[i + 1]) = S(LS[i]) + c,  for some constant c.

                       b. T(LS[i + 1]) = 4T(LS[i]).

 

To simplify the notation the following definitions hold :

      Si = S(LS[i])

      Ti = T(LS[i])

      Ni = N(LS[i])

      c  = the constant from condition a

 

Proof

 

Given

      Si+1 = Si + c

      Ti+1 = 4Ti

      Ni = sqrt(Ti)

 

then

      Ni+1 = sqrt(Ti+1)

            = sqrt(4Ti)

            = 2 sqrt(Ti)

            = 2Ni

and

      Si = Si-1 + c

          = Si-2 + c + c

          = Si-3 + c + c + c

      .

      .

      .

          = S0 + i*c

 

Since Ni doubles with each step, Ni = (2^i)N0, so i = log₂(Ni / N0).  Substituting into Si = S0 + i*c gives Si = S0 + c log₂(Ni / N0).  Therefore, for a series of L-systems l meeting conditions a and b above, the growth of S(l) in relation to N(l) is O(log₂N(l)).

 


 

The L-system

 

The L-systems LS[0..] are defined as follows :

 

    Each L-system LS[i] has i+1 levels, numbered 0..i.  A level consists of a set of symbols and productions.  The productions in each level map the symbols of that level to a set of four symbols in the next level; the last level maps its symbols to the terminal symbols (0, 1).

 

In addition to notation already presented, notation for the L-system in the proof will use the following terms :

 

       t  : terminal symbol variable, with possible values one or zero.

       Li : nonterminal symbol of level i.  A specific L-system symbol of level i
             is written Li,j (the jth symbol of level i); Li without a second
             subscript is a symbol variable whose possible values are the symbols
             occurring on the left side of the productions in level i.

 

  LS[0] : L0,1 -> t t t t

 

  LS[1] : L0,1 -> L1 L1 L1 L1

              L1,1 -> t t t t,     L1,2 -> t t t t,     L1,3 -> t t t t,     L1,4 -> t t t t

 

  LS[2] : L0,1 -> L1 L1 L1 L1

              L1,1 -> L2 L2 L2 L2,      L1,2 -> L2 L2 L2 L2,

              L1,3 -> L2 L2 L2 L2,      L1,4 -> L2 L2 L2 L2

              L2,1 -> t t t t,    L2,2 -> t t t t,     L2,3 -> t t t t,     L2,4 -> t t t t

 

  LS[3] : L0,1 -> L1 L1 L1 L1

              L1,1 -> L2 L2 L2 L2,      L1,2 -> L2 L2 L2 L2,

              L1,3 -> L2 L2 L2 L2,      L1,4 -> L2 L2 L2 L2

              L2,1 -> L3 L3 L3 L3,      L2,2 -> L3 L3 L3 L3,

              L2,3 -> L3 L3 L3 L3,      L2,4 -> L3 L3 L3 L3

              L3,1 -> t t t t,    L3,2 -> t t t t,     L3,3 -> t t t t,     L3,4 -> t t t t

  .
  .
  .

  LS[i] :  L0,1 -> L1 L1 L1 L1

              L1,1 -> L2 L2 L2 L2,     L1,2 -> L2 L2 L2 L2,

              L1,3 -> L2 L2 L2 L2,      L1,4 -> L2 L2 L2 L2

              .
              .
              .

              Li,1 -> t t t t,     Li,2 -> t t t t,     Li,3 -> t t t t,    Li,4 -> t t t t

 

Statistics for the L-systems :

 

   LS[0] :  S0 = 5.

                T0 = 4.

                N0 = 2.

 

   LS[1] : S1 = 5 + 20 = 25.

               T1 = 16.

               N1 = 4.

 

   LS[2] : S2 = 5 + 20 + 20 = 45.

               T2 = 64.

               N2 = 8.

 

   LS[3] : S3 = 5 + 20 + 20 + 20 = 65.

               T3 = 256.

               N3 = 16.

 

The value of S0 is five.  Each additional level of symbols consists of four rules of five symbols each, or 20 symbols.  Thus

 

                Si = S0 + i * 20.

 

Since 20 is a constant, condition a has been met.

 

Each nonterminal symbol is expanded to exactly four symbols from the following level, therefore

 

                Ti+1 = 4Ti,

 

and condition b has been met.
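
A small numeric check of these statistics can be run in Python; it confirms that S grows linearly with the number of levels while N doubles, i.e. S = O(log₂N).

    import math

    for i in range(6):
        S = 5 + 20 * i           # the top rule plus 20 symbols per added level
        T = 4 ** (i + 1)         # each level quadruples the expansion
        N = math.isqrt(T)        # an N-neuron net needs N^2 matrix entries
        print(f"LS[{i}]: S = {S:3d}   T = {T:5d}   N = {N:3d}")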


 

 

 

Chapter 3

 

The Experiment

 

To perform the experiment a population of animats is constructed.  Each animat consists of a Hebbian/genetic ANN, input sensors, and an output register.  Based on the contents of the output register, the animat has the ability to turn, move and collect food.  The animats are introduced, one at a time, into an artificial environment and allowed to exist for a predetermined life span or until starving (not collecting food for a predetermined amount of time).  The environment consists of open terrain with scattered barriers.  Food is distributed randomly in the environment before each animat is introduced.  Animat performance is measured by the amount of food collected in the animat's life span.  The ability to avoid barriers is important to an animat's ability to collect food: animats that become stuck on a barrier or continuously bump into barriers do not have the opportunity to collect food.

 

At the end of a generation (after all animats have lived in the environment) GAs are used to produce the next generation of animats from the current generation.  The fitness function used for the GA is the performance of the animat in the artificial environment, measured by the amount of food collected.

 

The purpose of this experiment is twofold.  The first is to determine whether genetic algorithms can optimize Hebbian neural network topologies for a given problem.  Optimization will be deemed successful if there is significant improvement in network performance over the first generation.  The second purpose is to experiment with varied encoding methods to determine whether the method used to encode network topologies into a genotype has an effect on performance.  Three encoding methods will be used on the same fitness function, with all parameters except those related to the encoding set the same; thus any difference in performance should be due to the encoding method being used.  Let m be the number of connections in the ANN and let n be the number of neurons in the ANN.  The first method uses an O(m) encoding, the second an O(n²) encoding.  The third, an L-system based encoding method, is O(log₂n).  The expected result is that the third method will outperform methods one and two.  L-system based encoding methods have been shown to provide superior performance over other encoding methods [10].  The genotypes used in method three are less than 1/50 the size of genotypes in methods one and two, making the search space smaller.


 

3.1                   Design

 

 

3.1.1          The Environment

 

An artificial environment is used to test each animat's neural network performance.  The environment consists of a 110 by 90 grid of cells.  Each cell contains information about its terrain, food content, and occupant.  There are two types of terrain, open and barrier.  An animat cannot occupy a barrier cell.  The outside ring of cells, five deep, consists of barrier cells, locking the animats into their environment.  There are also five large barriers consisting of from 55 to 225 barrier cells each (figure 4).  Open cells, of which there are 7320, can contain one food unit.  Food units are distributed at random throughout the open cells with a seventy percent probability at the beginning of each animat's life span.  The expected amount of food distributed is therefore 0.7 * 7320 = 5124 units.  Food is not added to the environment during the life span of an animat.

 

Time in the environment is measured in "clicks".  In each click the state of the environment in front of the animat is encoded and fed into the animat's neural network as input; the output of the net is decoded, and the animat's location and orientation, as well as the new state of the environment, are updated.

 


 

Figure 4: Artificial environment, with food and animat, as it would appear at the beginning of an animat's life span.  The box structures are barriers, the dots are food, and the small square with two dots for eyes is the animat.

 

 

3.1.2          Animats

 

Animats consist of an input sensor, an output register, a genotype, and a Hebbian neural network whose topology is determined by the animat's genotype.  At the beginning of the animat's life span the genotype is read, the corresponding Hebbian network is built, and the animat is placed at random in the environment.  Life spans are set at 8000 clicks, although this could be lowered if the animats become sufficiently fit to collect all of the food in the environment before the end of their life span.  Animats that do not collect any food for 256 consecutive clicks are considered to have starved to death and the animat's life span is terminated.  Animats interface with their environment through the input sensor and output register.  The input sensor encodes the state of the 13 cells directly in front of the animat into a binary array (figure 5).  This array is given to the animat's neural network as input, and the resulting output from the neural net is put into the animat's output register.  The output register is then decoded to determine the animat's intended movement and orientation (figure 6).  If the movement is legal, i.e. not into a barrier cell, the animat's location is updated.  In any event the animat's orientation is updated.  An animat collects food by moving onto a cell which contains food, after which the cell is updated to reflect that its food content is zero.  An animat's fitness is the amount of food collected by the animat in its life span, higher food counts meaning higher fitness.

 

 

                           

 

Resulting input: 10110010111000001110110000

 

Figure 5: Encoding of animat input data
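
The precise bit layout belongs to figure 5.  Purely as an illustration, the following hypothetical Python sketch assumes two bits per sensed cell (barrier, food), which yields a 26-bit string for the 13 cells, the same length as the example above; the cell ordering and bit meanings are assumptions, not necessarily the layout actually used.

    def encode(cells):
        """cells: 13 (is_barrier, has_food) pairs -> 26-bit input string.
        Hypothetical layout; the real one is defined by figure 5."""
        return ''.join(f"{int(barrier)}{int(food)}" for barrier, food in cells)

    example = [(1, 0), (1, 1), (0, 0), (1, 0), (1, 1), (1, 0), (0, 0),
               (0, 0), (1, 1), (1, 0), (1, 1), (0, 0), (0, 0)]
    print(encode(example))  # 10110010111000001110110000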


 

 

                          Output                                                           Output

                        for turning                                                    for movement

Output for moving straight ahead: 0 0 1 0 1 0

Figure 6: Decoding of animat output data

 

 

3.2                   Genetics

Since part of the experiment is to test whether the method used to encode the network topology into a genotype affects the performance of the genetic algorithm, the genetic operators (selection, crossover, etc.) are kept as constant as possible across all encoding methods.  Therefore any difference in performance should be due to the encoding method used.

 

The experiment uses three different encoding methods, labeled method one, method two and method three.  The first two are explicit encodings of the neural net topology while the third uses genotypes constructed of L-system rules to generate the topologies.

 

The job of the genetic algorithm is to optimize the topology of the Hebbian networks to improve performance on a particular problem.  Therefore the animat's genotype determines the topology of its neural network.  A genotype specifies the connections between the neurons as well as whether each connection is a soft (learning) or hard (fixed weight) connection.  If a connection is hard the genotype also specifies the weight on the connection.  Genotypes are generated randomly at the beginning of a run; however, the initial number of connections between the neurons and the percentage of hard connections are controlled.

 

 

3.2.1          Genetic Operators

 

Elitism will be considered a genetic operator.  Counting elitism, four genetic operators are employed in the experiment: selection, crossover, mutation, and elitism.  Crossover and elitism are performed in the standard manner; however, because of the genetic alphabets chosen, the mutation operator has to be customized for each encoding method.  The selection mechanism is designed to discourage premature convergence early in a run and stalling late in a run, when fitnesses tend to vary less.  At the end of a generation, animats are ordered by their fitness.  The genotype of the fittest animat is copied into the next generation as child one; the genotype is then mutated and copied into the next generation as child two.  The same procedure is performed for the second most fit animat, producing children three and four.  After this, each animat in the fittest one fourth of the population is mated four times with an animat picked at random from the fittest one third of the population.  Half of these children are mutated.  Mutating only half of each population allows the mutation probability to be set rather high, 0.01, without causing mutation stall.
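
A Python sketch of this selection scheme follows.  The helpers mate and mutate stand in for the per-encoding crossover and mutation operators, and the exact size bookkeeping of the new generation is left illustrative.

    import random

    def next_generation(pop, fitness, mate, mutate):
        ranked = sorted(pop, key=fitness, reverse=True)
        children = []
        for elite in ranked[:2]:               # the two fittest animats
            children.append(elite)             # copied unchanged (elitism)
            children.append(mutate(elite))     # and copied with mutation
        quarter = ranked[:len(ranked) // 4]    # fittest one fourth
        third = ranked[:len(ranked) // 3]      # fittest one third
        for parent in quarter:                 # each is mated four times
            for _ in range(4):
                child = mate(parent, random.choice(third))
                if random.random() < 0.5:      # half the children mutated
                    child = mutate(child)
                children.append(child)
        return children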

 

3.2.2          Parameter Settings

 

In all genetic programs, values must be chosen for parameters such as the probability of mutation, the probability of crossover and the size of the population.  These parameters can be chosen by educated guess or by experimentation.  The experiment is very computationally expensive and therefore only limited experimentation was possible.  The probability of mutation for all runs is 0.01, but only half of the new genotypes created are subjected to the mutation operator.  The probability of crossover for all runs is 1 and a multipoint crossover operator is used.  Each encoding method is run twice, with population sizes 20 and 60.  Other parameters important to this experiment include the learning coefficient for the Hebbian learning rule, the connection probability, and the hard connection probability.  These parameters are kept constant throughout all runs.  The learning coefficient is set at 0.0035; however, weights are updated only during the infancy state of an animat.  The infancy state consists of the first 400 clicks in the life span of an animat.  The connection probability and hard connection probability are used during the initial random generation of the population at the beginning of a run, and during mutation.  The connection probability determines the number of connections between neurons, and the hard connection probability determines the number of hard connections.  The connection probability and the hard connection probability for all runs is 0.3.  During mutation the connection probability and hard connection probability are used to scale mutation; this keeps mutation from driving the number of connections and hard connections between neurons upward or downward.


 

 

3.2.3          Encoding Method One

 

Encoding method one is based upon connection triplets.  Each triplet represents a connection between two neurons.  A triplet consists of three integers.  The first two have range 1..n, where n is the number of neurons in the neural network.  The third integer is a member of the set [-10, -5, 0, 5, 10], with 0 meaning a soft connection.  If the third integer is non-zero it is divided by 10 to obtain the weight on the corresponding connection.  A genotype consists of a fixed number of connection triplets, each one representing a connection in the resulting neural network.

 

Let m be the number of connections in the ANN generated by the genotype.  The growth in the size of the genotypes is then O(m), but the fixed genotype length places an upper limit on the number of connections in the resulting neural networks.  The total number of connection triplets used is 1460, which is 20 per neuron.  The choice of 20 connections per neuron is an educated guess and open to experimentation.


 

A formal definition of encoding method one is as follows:

 

 

Terminology

 

      n = number of  neurons

 

      N = integer value with range 1..n

 

      W = integer value from the set [-10, -5, 0, 5, 10], with 0

              meaning a soft connection.  If W <> 0 then

              the associated connection is hard wired with

              weight W/10.

 

      nc = total number of connection triplets in

              genotype.

 

      connection triplet = N1  N2  W, which is expressed in the neural
                                    net (phenotype) as neuron N1 having
                                    an input connection from neuron N2,
                                    with associated weight value W.

 

Genotype definition:

 

      n

      N1      N1     W1

      N2      N2     W2

      N3      N3     W3

      .           .        .

      .           .        .

      .           .        .

      Nnc    Nnc    Wnc
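
A minimal Python sketch of generating and decoding a method one genotype is given below.  The uniform random choices are illustrative and ignore the connection and hard connection probabilities of section 3.2.2.

    import random

    N_NEURONS, N_TRIPLETS = 73, 1460     # 20 triplets per neuron

    def random_genotype():
        return [(random.randint(1, N_NEURONS),
                 random.randint(1, N_NEURONS),
                 random.choice([-10, -5, 0, 5, 10]))
                for _ in range(N_TRIPLETS)]

    def decode(genotype):
        """(N1, N2, W): neuron N1 receives an input connection from N2;
        W = 0 marks a soft (learning) connection, else hard weight W/10."""
        net = {}
        for n1, n2, w in genotype:
            net[(n1, n2)] = ('soft', None) if w == 0 else ('hard', w / 10)
        return net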


3.2.4          Encoding Method Two

 

Encoding method two is based upon the standard neural network connectivity matrix, with a second component, the weight value of the connection, added to each cell in the matrix.  The number of neurons used is 73, therefore the resulting matrix has 73², or 5329, cells.  Each cell contains both a boolean value, zero or one, and one member of the set [-10, -5, 0, 5, 10].  The boolean values specify the connections between the neurons.  A zero in the integer value specifies a soft connection; if the integer is non-zero it is divided by 10 to obtain the weight on the corresponding connection.

 

Let n be the number of neurons in the ANN generated by the genotype.  Like neural network connectivity matrices, the size of genotypes under method two grows at O(n²).  Encoding method two has the advantage, however, of being able to specify every possible network topology.

 

A formal definition of Encoding method two is as follows :

 

Terminology

 

      n = number of  neurons.

 

      C = 0 or 1.

 

      W = integer value from the set [-10, -5, 0, 5, 10], with 0

              meaning a soft connection.  If W <> 0 then

              the associated connection is hard wired with

              weight W/10.

 

      connection pair = C  W, which is expressed

                                    in the neural net (phenotype)

                                    as a neural network

                                    connectivity matrix.

 

 

Genotype definition:

 

      n

      C1  W1

      C2  W2

      C3  W3

      .      .

      .      .

      .      .

      Cn²  Wn²
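
A matching Python sketch for method two; the row-major ordering of the cells is an assumption made for illustration.

    import random

    N = 73                                # neurons; the matrix has N * N cells

    def random_genotype():
        return [(random.randint(0, 1), random.choice([-10, -5, 0, 5, 10]))
                for _ in range(N * N)]

    def connection(genotype, i, j):
        """Connection into neuron i from neuron j (0-based, row-major)."""
        c, w = genotype[i * N + j]
        if c == 0:
            return None                   # no connection between i and j
        return ('soft', None) if w == 0 else ('hard', w / 10)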

 

 

3.2.5          Encoding Method Three

 

Genotypes in encoding method three are sets of L-system productions.  Two sets of productions per genotype are used.  The first generates a standard neural network connectivity matrix when expanded; the second generates the weight component.  Together they are expanded to form a connectivity/weight matrix similar to the one used in encoding method two.  In order to keep the number of neurons constant a constrained L-system model is used.  L-symbols are classified by level, with each L-symbol expanding to exactly four L-symbols from the level beneath it.  This results in expansions of constant size.  Both sets of productions use the same symbols until level six, which contains the terminal symbols.  All symbols in a level are unique to that level.  Level zero is implicit and consists of one symbol, the start symbol S.  The other levels, with the exception of the bottom two, consist of four L-symbols each.  Level one consists of the set of symbols [A, B, C, D], level two of [E, F, G, H], level three of [I, J, K, L], and level four of [M, N, O, P].  The fifth level in both sets of rules consists of sixteen L-symbols, the set [a..p].  In the first set of rules these sixteen symbols are hard coded to expand to the sixteen possible binary strings of length four.  In the second set, level six consists of the terminal L-symbols -10, -5, 0, 5 and 10.  Zero signifies a soft connection; a non-zero number is divided by ten to obtain the weight on the corresponding connection.

 

Genotypes consist of multiple strings of L-symbols from each level.  Each string is the right side of an L-system production.  The left sides of the L-system productions are implicit in the ordering of the L-symbols.  The right sides of the level six productions are hard coded in the first set of rules and thus are not included in the genotypes.

 

 

The varied nature of the genotypes in encoding method three requires more complicated mutation operators.  It has, however, properties that may make it more suitable for a genetic search than the previous two encoding methods.  The length of the genotypes is less than 1/50th the length of the genotypes in methods one and two, resulting in a smaller search space for the GA to search.  The size of the search space for a 64 neuron genotype under encoding method one is (2^2560)(5^1280); under encoding method two it is (2^4096)(5^4096); under encoding method three it is (2^272)(5^64).  Another advantage of encoding method three is that the growth in the length of the genotypes in relation to the number of neurons, n, is O(log₂n).


 

A formal definition of Encoding method three is as follows :

 

Terminology

 

      Xi = L-symbol variable of level i.

 

      n = number of neurons.

 

      W = integer value, -10, -5, 0, 5, or 10, with 0
              meaning a soft connection.  If W <> 0 then
              the associated connection is hard wired with
              weight W/10.

 

 

L-symbols

 

      A,B,C,D = level one L-symbols.

      E,F,G,H = level two L-symbols.

      I,J,K,L = level three L-symbols.

      M,N,O,P = level four L-symbols.

      a,b,c,...,p = level five L-symbols.

 

 

Each L-symbol from level i is expanded to a string of four L-symbols from level i+1, for i < 5.

 

Each genotype contains two sets of rules.

 

For the first set

 

      a is expanded to 0 0 0 0

      b is expanded to 0 0 0 1

      c is expanded to 0 0 1 0

      .                     .

      .                     .

      p is expanded to 1 1 1 1

 

For the second set

 

      a is expanded to W W W W

      b is expanded to W W W W

      c is expanded to W W W W

      .                    .

      .                    .

      p is expanded to W W W W

 

 

 

genotype definition:

 

      X1,1  X1,2 ... X1,4

      X2,1  X2,2 ... X2,16

      X3,1  X3,2 ... X3,16

      X4,1  X4,2 ... X4,16

      X5,1  X5,2 ... X5,16

 

This is expanded to create the connectivity matrix.

 

 

 

      X1,1  X1,2 ... X1,4

      X2,1  X2,2 ... X2,16

      X3,1  X3,2 ... X3,16

      X4,1  X4,2 ... X4,16

      X5,1  X5,2 ... X5,16

      W1  W2 ... W64

 

This is expanded to create the weight component of the connectivity matrix.
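
To make the expansion concrete, the following minimal Python sketch expands the first rule set of a method three style genotype into a 64 by 64 binary connectivity matrix.  Representing symbols by indices (0..3 for levels one through four, 0..15 for the level five symbols a..p) and choosing them at random are illustrative assumptions.

    import random

    LENGTHS = [4, 16, 16, 16, 16]         # lengths of the strings X1..X5

    def random_genotype():
        g = [[random.randrange(4) for _ in range(n)] for n in LENGTHS[:4]]
        g.append([random.randrange(16) for _ in range(16)])   # a..p entries
        return g

    def expand(g):
        word = g[0]                       # S -> the four level one symbols
        for rules in g[1:]:               # rewrite down through level five
            # symbol k rewrites to the k-th group of four in the next string
            word = [s for sym in word for s in rules[4 * sym: 4 * sym + 4]]
        # hard coded level six rules: symbol k of a..p expands to k in binary
        bits = [int(b) for sym in word for b in format(sym, '04b')]
        return [bits[r * 64:(r + 1) * 64] for r in range(64)]  # 64 rows of 64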

 

 

3.3                   Results of Experiment

 

Fitness of the animats (F) is measured by the number of food units collected in the animat's life span.  Performance of each run is measured by both the maximum and the average amount of food collected by the animats in a generation,

Max(Fi), 1 <= i <= popsize

(Σ Fi) / popsize, 1 <= i <= popsize

as well as the number of generations required to achieve the result.  At the beginning of a run performance is very poor, as would be expected.  Most animats perform tight circles, or crash into barriers and stick.  Behavior rapidly improves in the first generations: the animats learn to move forward and avoid barriers.  Actively seeking food is usually the last behavior to be learned.  Max(Fi) generally levels out at around 4000 units of food collected, under all three encoding methods.  Perfect performance, which would be an animat collecting all the food in its environment, is not seen.

 

Figure 7 shows the performance of an animat guided by random movement.  Random movement animats perform better than early generation neural animats because they do not become stuck or get caught in loops, events that occur frequently to early generation neural animats.  Given enough generations, however, the neural animats learn to avoid most traps.  By the end of a run the maximum performance of neural animats is approximately twice that of random movement animats, for all encoding methods (figures 8-13).

 

While all three encoding methods produced animats with similar final performance, the number of generations required to reach that performance using encoding method three was approximately one third of the number required using encoding methods one and two (figures 8-13).

 

Comparing the runs with population size twenty to those with population size sixty shows that increasing the size of the population speeds up the rate of increase in network performance.  Genetic algorithms perform better when given a large gene pool to work with; the tradeoff comes in computing time.  However, while the population was increased by a factor of three, the number of generations required to achieve maximum performance decreased by a factor of three or more in each case, so the larger populations did not increase the total computation required.


 

Figure 7: Performance of random movement animat.

 

Figure 8: Performance of encoding method one, with population size 20.

 

Figure 9: Performance of encoding method two, with population size 20.

 

Figure 10: Performance of encoding method three, with population size 20.

 

Figure 11: Performance of encoding method one, with population size 60.

 

Figure 12: Performance of encoding method two, with population size 60.

 

Figure 13: Performance of encoding method three, with population size 60.

 


 

 

 

Chapter 4

 

Conclusion

 

4.1                   Summary

 

A method for optimizing Hebbian neural networks with genetic algorithms was introduced, along with three different methods for encoding the network topologies.  An artificial environment was used to test network performance.  All runs of the experiment resulted in greatly increased network performance in the artificial environment after sufficient generations.  By combining Hebbian ANNs and GAs, networks were produced that could successfully navigate an artificial environment while searching for food.  The ANNs were never given teaching data; they evolved an instinctive solution to fitness in their environment.

 

Three methods for encoding Hebbian network topologies were presented.  The direct encoding methods (methods one and two) showed roughly similar performance.  The third, L-system based encoding method showed improved performance over the direct encoding methods, achieving similar performance in half or less the number of generations required by the other two encoding methods.

 

Increasing the size of the population from twenty to sixty, a factor of three, resulted in a decrease in the number of generations required to achieve maximum performance by a factor of three or more.  Runs of the experiment with populations of size sixty needed fewer total computations than runs with populations of size twenty to achieve similar performance.  Even larger populations may be more efficient than populations of size sixty, but at some point the increase in efficiency will stop and reverse.

 

4.2                   Conclusions

 

The experiment demonstrates the feasibility of combining Hebbian ANNs and GAs to achieve solutions to problems without supplying training data.  Among the three encoding methods employed, the L-system based encoding proved the most efficient at producing maximum performance with a minimum of computation.  The resulting conclusion is that L-system based encoding methods can be superior to direct encoding methods.  Increasing the size of the population resulted in a reduction in the number of computations necessary to achieve maximum performance; larger populations can be computationally more efficient than smaller populations.

 

4.3                   Areas for Further Research

 

Questions that remain to be answered include: How much of the learning is due to genotype learning and how much to phenotype learning?  What is the most efficient population size?  Sixty is more efficient than twenty, but the exact point where increasing the population size no longer increases efficiency could be experimentally determined.  Another question is: Why does performance level out in all cases?  Genetic algorithms normally top out at a level slightly below the best possible performance, so the bottleneck to further learning may be the learning potential of the ANN.  Increasing the complexity of the GA could be studied to see whether this raises the level at which performance levels out.  Methods to increase the complexity of the GA could include encoding into the genotype, for each neuron, the firing function of the neuron, the percentage of the neural network's life span during which the neuron learns, and the weight update rule.


 

 

References

 

 

[1]  J.A. Anderson and E. Rosenfeld, eds.  Neurocomputing: Foundations
       of Research.  Cambridge: MIT Press, 1988.

 

[2]  A.K. Dewdney.  Exploring the Field of Genetic Algorithms in a Primordial
      Computer Sea Full of Flibs.  Scientific American.

 

[3]  Marco Dorigo and Uwe Schnepf.  Genetics-Based Machine Learning and
       Behavior-Based Robotics: A New Synthesis.  IEEE Transactions on
       Systems, Man, and Cybernetics, Vol. 23, No. 1, January/February 1993.

 

[4]  Gerald D. Fischbach.  Mind and Brain.  Scientific American, pages 24-33,
       September 1992.

 

[5]  D.B. Fogel, L.J. Fogel, and V.W. Porto.  Evolving Neural Networks.
       Biological Cybernetics, 1990, pages 487-493.

 

[6]  John Hertz, Anders Krogh, and Richard G. Palmer.  Introduction to the
      Theory of Neural Computation.  Addison-Wesley Publishing Company, 1991.

 

[7]  D.O. Hebb.  The Organization of Behavior.  New York: Wiley, 1949.  Partially
       reprinted in Anderson and Rosenfeld, 1988.

 

[8]  David E. Goldberg.  Genetic Algorithms in Search, Optimization, and
       Machine Learning.  Addison-Wesley Publishing Company, Inc., 1989.

 

[9]  Geoffrey E. Hinton and Steven J. Nowlan.  How Learning Can Guide
       Evolution.  Complex Systems 1, 1987, pages 495-502.

 

[10]  Hiroaki Kitano.  Designing Neural Networks Using Genetic Algorithms with
        Graph Generation System.  Complex Systems 4, 1990, pages 461-476.

 

[11]  H. Muhlenbein and J. Kindermann.  The Dynamics of Evolution and
        Learning - Towards Genetic Neural Networks.  Connectionism in
        Perspective, Elsevier Science Publishers B.V. (North-Holland), 1989,
        pages 174-197.

 

[12]  S. Nolfi and D. Parisi.  Auto-Teaching: Networks that Develop their own
        Teaching Input.

 

[13]  Elaine J. Pettit and Michael J. Pettit.  Analysis of the Performance of a
        Genetic Algorithm Based System for Message Classification in Noisy
        Environments.  Int. J. Man-Machine Studies, 1987, pages 205-220.

 

[14]  P. Prusinkiewicz and A. Lindenmayer.  The Algorithmic Beauty of Plants.
        Springer-Verlag, New York, 1990.

 

[15]  Grzegorz Rozenberg.  L-systems.  Springer-Verlag, New York, 1974.

 

 


 

 

 

Biographical Sketch

 

Alexander Stuy was born June 17, 1961 in Eindhoven, The Netherlands.  After moving to America at age four, he attended elementary, middle and high school in Tallahassee, Florida.  After high school he attended Tallahassee Community College, earning an AA degree.  Upon graduation from TCC he enrolled at Florida State University, earning BS and MS degrees in Computer Science.  Currently he is employed by the Biology Department at FSU as a Systems Analyst.