Global Optimization Techniques:

Simulated Annealing (SA) and Genetic Algorithms (GA)

See the Simulation Link Page for applet examples,
or below for the Traveling Salesman applet and a GA worked example (non-applet).

There are many techniques (and improvements to these methods) for global optimization (i.e., finding the global minimum or maximum of some complex functional). We discuss only two: SA and GA. These two techniques are similar, naturally motivated, general-purpose procedures for optimization. SA and GAs work well on a variety of problems, require little problem-specific information, do not need gradient information, and both generate new points in the search space probabilistically.

** Note that when we speak of minimization, the case of finding a maximum can also be treated, either by taking the reciprocal of the function of interest or by taking its negative, whichever is more reasonable.

Global minimization problems can be classified as NP-complete, that is, non-deterministic polynomial-time complete: no method has been found whose computer time is bounded by a power of N, where N is the number of degrees of freedom. There are two main types of strategies:

The iterative-improvement method is like an extremely rapid quench in statistical mechanics (e.g., steepest descent): it strands the system in a nearby local minimum. Annealing, by contrast, lowers the temperature slowly enough that the system can escape local minima; this is the strategy developed below.


 

Minimization using Simulated Annealing

Simulated Annealing allows us to apply ideas developed in the statistical mechanics of liquids (i.e., the Metropolis algorithm for obtaining the Boltzmann probability distribution) to more general optimization problems. The general approach for classical systems based on these ideas was introduced in "Optimization by Simulated Annealing" by S. Kirkpatrick, C.D. Gelatt, Jr., and M.P. Vecchi, Science 220, 671-680 (1983), which discussed the thermodynamic connection as well as the optimization of electronic circuitry on chips. To see how to reach the optimum (the ground state), think of how nature achieves ground-state structures: by annealing slowly.

Hence, for use with systems other than thermodynamic ones we require:

  1. A description of the possible system configurations.
  2. A generator of random changes in the configurations, i.e., the new solution options presented to the system.
  3. An objective function (or, in genetics, a fitness function), the analog of the Energy, whose minimization is our goal.
  4. A control parameter T (the analogue of temperature) and an annealing schedule, i.e., the rate dT at which T is reduced.

Both Molecular Dynamics and Monte Carlo Methods may be used to perform simulated annealing.

Molecular Dynamics: integrate the equations of motion while the temperature (kinetic energy) is slowly reduced, e.g., by rescaling velocities, so that the system settles into low-lying configurations.

Monte Carlo:

The Metropolis algorithm allows us to explore phase space and find the thermodynamic average of the "Energy" at a given temperature. For optimization, what we actually want is the T = 0 K global-minimum (ground) state, obtained by slowly annealing the "system". Importantly, the Metropolis algorithm allows uphill moves along with downhill moves, accepting a change in energy ΔE with probability p(ΔE) = min[1, exp(-ΔE/kT)]. Clearly, then, if T is very low, the Monte Carlo version of the algorithm is analogous to the steepest-descent method (no uphill moves are allowed because they are exponentially damped out); however, the underlying trajectory is a random walk, which can tunnel through barriers.
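To make ingredients 1-4 above concrete, here is a minimal sketch (Python) of an SA loop built around the Metropolis acceptance rule p = min[1, exp(-ΔE/T)] (units with k = 1). The exponential annealing schedule, sweep count, and toy objective function are illustrative assumptions, not a prescription:

import math
import random

def simulated_anneal(energy, propose, x0, T0=1.0, Tmin=1e-3, alpha=0.95, sweeps=100):
    """Minimal SA: Metropolis acceptance p = min[1, exp(-dE/T)] with an
    (assumed) exponential annealing schedule T -> alpha*T."""
    x, E = x0, energy(x0)
    best_x, best_E = x, E
    T = T0
    while T > Tmin:
        for _ in range(sweeps):              # explore phase space at this T
            x_new = propose(x)               # random change in configuration
            dE = energy(x_new) - E
            # uphill moves allowed with Boltzmann probability exp(-dE/T)
            if dE <= 0 or random.random() < math.exp(-dE / T):
                x, E = x_new, E + dE
                if E < best_E:
                    best_x, best_E = x, E
        T *= alpha                           # annealing schedule
    return best_x, best_E

# Toy example: a 1-D function with many local minima
f = lambda x: x * x + 10.0 * math.sin(3.0 * x)
step = lambda x: x + random.uniform(-0.5, 0.5)
print(simulated_anneal(f, step, x0=5.0))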

Both methods require a simulation time to locate the global minimum that scales like exp(E_max/E_min), where E_max is the largest of ALL the minimum energy barriers separating basins on the E-surface, and E_min is the difference between the global-minimum E and the lowest local-minimum E. Typically E_max/E_min >> 1. (See, e.g., Amara, Ma, and Straub, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 23 (1996), and Straub in New Developments in Theoretical Studies of Proteins, ed. Elber (1996).)

 

Usefulness of the Thermodynamic Analogy

How fast should one anneal? The answer is, of course, dependent upon the optimization problem. However, because Simulated Annealing has its origin in Statistical Mechanics, we know that as one nears a phase transition the specific heat, C_V = dE/dT, shows (possibly sharp) structure as the system cools. When the equivalent of C_V in the optimization problem begins to grow, this is a signal (as it is in standard thermodynamic simulations) that we should anneal more slowly so as to sample the phase space more properly, for we are approaching a possible change in the system's trajectory (analogous to a phase transition). There can be critical slowing down and hysteresis.

Hence, by monitoring the thermodynamically equivalent quantity that is important in the optimization problem (possibly something other than C_V, such as a susceptibility, diffusion, etc.), we may alter the annealing rate to make the search more efficient. This important technique helps avoid settling on a false "optimal" point. You will note below for the Traveling Salesman Problem that a fixed annealing schedule is maintained (and used only after "thermalizing"), so that no alteration of the annealing schedule is required. This is mainly because that optimization problem is combinatorial and, after exploring annealing rates for such a problem, one can determine a reasonable rate once and for all. The use of the analogy to thermodynamic simulation was discussed in the original Science article, ibid.
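As a hedged sketch of this monitoring idea: the fluctuation relation C_V = (&lt;E^2&gt; - &lt;E&gt;^2)/T^2 (again with k = 1) lets one estimate the "specific heat" from the Metropolis samples collected at each temperature, and the cooling factor can be relaxed when C_V grows. The thresholds below are arbitrary, illustrative choices:

def heat_capacity(energies, T):
    """Fluctuation estimate C_V = (<E^2> - <E>^2) / T^2 (units with k = 1)."""
    n = len(energies)
    mean = sum(energies) / n
    var = sum(e * e for e in energies) / n - mean * mean
    return var / (T * T)

# Inside an annealing loop one might do (illustrative thresholds):
#     cv = heat_capacity(samples_at_T, T)
#     alpha = 0.99 if cv > 1.5 * cv_prev else 0.95  # cool more slowly near a "transition"
#     cv_prev = cv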

 

SA Optimization with Respect to Continuous Parameters

While discrete parameter sets, such as in the Ising model, combinatorial problems, etc., occur frequently, equally important is minimization with respect to Continuous Parameters, as required in function fitting. For example, suppose one desires to fit data with a sum of exponentials or Gaussians; the choice of decay constants is then an optimization problem. (Linear or quadratic problems have much more efficient methods.) Simulated Annealing applied to continuous global optimization problems in this context was addressed by Vanderbilt and Louie, J. Comput. Phys. 56, 259 (1984). The main difficulty introduced in going from discrete to continuous applications of SA-Monte Carlo is that choosing random steps becomes more subtle. It is helpful to build some "intelligence" into the search, and Vanderbilt and Louie used excursions of the random walk to explore the local topology and, thereby, choose the step size.
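Below is a minimal sketch of SA over continuous parameters. Be aware that the step-size adaptation here (tuning an isotropic Gaussian step toward roughly 50% acceptance) is a crude stand-in for, not a reproduction of, the Vanderbilt-Louie scheme, which shapes the step distribution from the walk's own excursions:

import math, random

def anneal_continuous(f, x, T=1.0, Tmin=1e-3, alpha=0.95, sweeps=200):
    """SA with continuous parameters x (a list). The Gaussian step size
    adapts toward ~50% acceptance, a simplification of Vanderbilt & Louie's
    covariance-based step selection."""
    step, E = 1.0, f(x)
    while T > Tmin:
        accepted = 0
        for _ in range(sweeps):
            x_new = [xi + random.gauss(0.0, step) for xi in x]
            dE = f(x_new) - E
            if dE <= 0 or random.random() < math.exp(-dE / T):
                x, E = x_new, E + dE
                accepted += 1
        step *= 1.1 if accepted / sweeps > 0.5 else 0.9   # probe the local topology
        T *= alpha
    return x, E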

 

The Traveling Salesman Problem: A Combinatorial Minimization Problem

Problem: A salesman travels to N cities with 2-D coordinates (x_i, y_i), i = 1, ..., N, returning finally to the city of origin. Each city is to be visited once and only once, and the route taken is to be as short as possible.

Question: What is the shortest route?

This belongs to the class of optimization problems that can be addressed efficiently with simulated annealing; the size of the search space scales like exp(aN), where a is a constant.

See the Traveling Salesman applet from Carleton University for an example simulation.
How does the annealing rate change the result (for better or worse)? We discuss the solution using a Genetic Algorithm below. Can you see an inversion take place?

Monte Carlo - Simulated Annealing Approach

E = L = Σ_{i=1..N} sqrt[ (x_i - x_{i+1})^2 + (y_i - y_{i+1})^2 ],

such that city(N+1) = city(1).

Finally, as discussed in class, the objective function can be chosen to suit the problem, and the code does not have to be altered. For example, as discussed in Numerical Recipes, one can make up other energy penalties that modify the problem, such as making river crossings cost more. The objective function becomes

E = L = Σ_{i=1..N} { sqrt[ (x_i - x_{i+1})^2 + (y_i - y_{i+1})^2 ] + K (s_i - s_{i+1})^2 },

where s_i = ±1 for the left or right side of the river, so that an extra penalty of 4K is incurred each time the salesman crosses the river (for K > 0).
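In code this objective is a few lines, and (as claimed above) only the energy function changes when the river penalty is added; the constant K, the coordinates, and the side labels s_i would be supplied by the problem. The segment-reversal move at the end is the "inversion" referred to above, and the pair (tour_energy, propose) can be dropped straight into the SA loop sketched earlier:

import math, random

def tour_length(route, xy):
    """E = L = sum_i sqrt((x_i - x_{i+1})^2 + (y_i - y_{i+1})^2), closed tour."""
    N = len(route)
    return sum(math.dist(xy[route[i]], xy[route[(i + 1) % N]]) for i in range(N))

def tour_energy(route, xy, side=None, K=0.0):
    """Optionally add +K*(s_i - s_{i+1})^2, i.e. 4K per river crossing
    since s_i = +/-1. (K and the data are problem inputs, assumed here.)"""
    E = tour_length(route, xy)
    if side is not None:
        N = len(route)
        E += sum(K * (side[route[i]] - side[route[(i + 1) % N]]) ** 2
                 for i in range(N))
    return E

def propose(route):
    """Reverse a random segment of the route (an 'inversion' move)."""
    i, j = sorted(random.sample(range(len(route)), 2))
    return route[:i] + route[i:j + 1][::-1] + route[j + 1:]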

Other Problems. You can now imagine how to adapt this to other problems. Many problems map onto the lattice Ising model that we studied previously, for example, the circuit optimization described in class and in the Science article. This technique has been used to address a wide range of engineering, math, physical science, computer science, design, and gaming problems that fall into the class of optimization problems.

Circuit Optimization (The Ising Model, again!) Consider N circuits to be partitioned onto 2 chips. Propagating a signal across a chip boundary is slow, hence the placement of elements on the two chips can be optimized. (While putting all N circuits on 1 chip eliminates the problem, it leaves no room for additional circuits, or there is not enough room for the N needed.) Define the Connectivity a_ij as the number of signals passing between circuits i and j, and let μ_i = ±1 indicate which chip circuit i resides on (e.g., left or right). The Signal cost is then

S = Σ_{i>j} a_ij (μ_i - μ_j)^2 / 4.

Note that Σ_i μ_i is the difference in the number of circuits on the 2 chips. Hence, a penalty can be added for any imbalance in the number of circuits via λ (Σ_i μ_i)^2. If z is the average number of circuits connected to a typical circuit, then λ ~ z/2 gives equal weight to balancing the circuits. With such a penalty, the "cost" function is

E = Σ_{i>j} (λ - a_ij/2) μ_i μ_j + constant,

which is (again) the Lenz-Ising Model.
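As an illustration, the Ising-form cost is cheap to evaluate; the random connectivity and the choice λ = z/2 below are toy assumptions. A configuration mu, together with a single "spin-flip" move (switch one circuit to the other chip), can be fed to the SA loop sketched earlier:

import random

def partition_cost(mu, a, lam):
    """Ising-form cost E = sum_{i>j} (lam - a_ij/2) * mu_i * mu_j."""
    N = len(mu)
    return sum((lam - a[i][j] / 2.0) * mu[i] * mu[j]
               for i in range(N) for j in range(i))

# Toy data (assumed): symmetric random connectivity for N = 8 circuits
N = 8
a = [[0] * N for _ in range(N)]
for i in range(N):
    for j in range(i):
        a[i][j] = a[j][i] = random.randint(0, 3)
z = sum(map(sum, a)) / N                          # average connectivity per circuit
lam = z / 2.0                                     # equal weight to balancing the chips
mu = [random.choice((-1, 1)) for _ in range(N)]   # chip assignment, +/-1
print(partition_cost(mu, a, lam))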


 

Optimization using Genetic Algorithms (GAs)

GAs were introduced by J.H. Holland (1962) (see, more recently, his Adaptation in Natural and Artificial Systems, Univ. of Michigan Press (1975)). A nice introduction to the concepts is D.E. Goldberg's text Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, 1989), although the coding is in PASCAL. David Goldberg, a Professor in General Engineering at UIUC, initiated engineering applications by optimizing a gas-pipeline distribution system using GA techniques. There are some tutorials available, such as Shaffer and Small, Analytical Chemistry 69, 236A (1997), or L. Davis, Handbook of Genetic Algorithms, ch. 4 (Van Nostrand, 1991).

The main GA terminology comes from analogy with biological reproduction: Chromosomes are gene sequences, with Genes being composed of alleles, a_i. Symbolically, a sequence of alleles defines the gene: G = a1 a2 ... aN. Different allele sequences define different genes, and a population is a collection of genes (a subset of the allowed genes, which themselves define the search space). For use below we note that a binary representation of a gene (here we use only 5 bits) can be evaluated as x = a1·16 + a2·8 + a3·4 + a4·2 + a5·1, where 16 = 2^4, 8 = 2^3, 4 = 2^2, 2 = 2^1, and 1 = 2^0. So, for an N-bit gene the values run up to 2^N - 1, and any gene with all bits zero has x = 0.
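In code this decoding is just a base-2 read-off; the asserted values match the genes that appear in the worked example below:

def decode(gene):
    """x = a1*2^(N-1) + ... + aN*2^0 for a bit-string gene."""
    return int(gene, 2)

assert decode("01101") == 13 and decode("11000") == 24
assert decode("01000") == 8 and decode("10011") == 19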

Genetic Algorithms are so named because of their similarity to concepts in genetic reproduction. They require a Population Size, Recombination Methods, and a Mutation Rate (even various types of mutations are allowed). Importantly, most operations in GAs are probabilistically controlled, i.e., they are randomized search methods. As Goldberg notes, "randomized search does not necessarily imply directionless search".

GAs differ from standard optimization and search procedures in four ways (Goldberg's list):

  1. GAs work with a coding of the parameter set, not the parameters themselves.
  2. GAs search from a population of points, not a single point.
  3. GAs use objective-function (payoff) information, not derivatives or other auxiliary knowledge.
  4. GAs use probabilistic transition rules, not deterministic ones.

N.B. Point-by-point searches can be dangerous, since they can locate false peaks in multimodal search spaces; working with large populations prevents this. Note also that the Population Evolution in GAs requires a defined Process (i.e., Reproduction, Mutation, and Selection) besides, of course, the objective function. The detailed elements of the Evolutionary Process are best determined by the problem at hand, but all fall into the three general categories above.

 

Applications for GA Optimization.

An example optimization is found below. You may do a web search or browse the cited texts for applications in a multitude of fields. Just to name some areas: biology (of course!); game playing (the hexapawn board, a 3x3 chess board with only pawns -- more difficult than you might imagine at first); the prisoner's dilemma (should prisoners cooperate, rat one another out, or remain silent, depending on the reward for each); pattern recognition; mechanical engineering (bridge-component optimization); electrical engineering (circuit optimization); mathematical function optimization; social science (hunter-gatherer models); physical science (fitting potential surfaces, as in pseudo-potential calculations, or reaction-rate calculations). For your amusement: before GAs, the best algorithm for games of conflict was the so-called tit-for-tat algorithm, in which you do whatever your opponent did previously. It beat most very sophisticated approaches, that is, until GAs.

 

Whether to use a GA optimization approach is best decided by considering the problem at hand.

GA Advantages

  1. No gradient (derivative) information is needed; only objective-function values are used.
  2. Little problem-specific information is required beyond an encoding and an objective function.
  3. Searching from a population, rather than point by point, protects against false peaks in multimodal search spaces.
  4. The same machinery applies across a wide range of problems (see the application areas above).

GA Disadvantages

  1. Success hinges on choosing a good encoding and objective function, which must be done anew for each problem.
  2. Sharply peaked objective functions are difficult, and exact answers are hard to obtain (see the discussion below).
  3. A poor initial population can make convergence very slow unless mutation rescues the search.

 

STEPS to perform a GA Optimization.

  1. Define an objective function (i.e., a payoff, energy, or fitness function).
  2. Encode the variables (recall that GAs work with a coding of the parameter set, not the parameters themselves!).
  3. Initialize the population at random.
  4. Calculate the fitness of each gene in the population.
  5. Reproduction produces the next generation, or offspring (this copying is weighted by the objective function, so that Darwinian evolution, i.e., survival of the "fittest", is built in). This means reproduction is random but not blindly random!
  6. Crossover, the exchange of genetic material, occurs at randomly selected sites along two randomly selected offspring.
  7. Mutation is performed bit by bit according to some preselected mutation probability, P_mut. For example, with P_mut = 0.001 and only 20 bits of information, the expected number of mutations is 20 × 0.001 = 0.02, i.e., essentially zero; but the number grows as more bits are generated.
  8. Finally, once the Evolutionary Process is complete and the new population has been generated, steps 4-7 are repeated until acceptable optimization has occurred (a sketch of these steps in code follows).
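Here is a compact sketch of steps 1-8 in Python, written for the 5-bit switch problem worked out below; the population size, crossover probability, mutation rate, and generation count are illustrative choices, not tuned values:

import random

def run_ga(fitness, n_bits=5, pop_size=4, p_cross=1.0, p_mut=0.001, generations=20):
    """Steps 1-8: random initial population, fitness-weighted (roulette-wheel)
    reproduction, single-point crossover, and bit-by-bit mutation."""
    pop = ["".join(random.choice("01") for _ in range(n_bits))
           for _ in range(pop_size)]                     # step 3
    for _ in range(generations):
        f = [fitness(int(g, 2)) for g in pop]            # step 4
        # step 5: roulette-wheel selection with weight f_i / Sum f_j
        # (tiny epsilon keeps choices() valid if every fitness is zero)
        pool = random.choices(pop, weights=[fi + 1e-12 for fi in f], k=pop_size)
        nxt = []
        for mom, dad in zip(pool[::2], pool[1::2]):      # step 6: crossover
            if random.random() < p_cross:
                site = random.randrange(1, n_bits)
                mom, dad = mom[:site] + dad[site:], dad[:site] + mom[site:]
            nxt += [mom, dad]
        # step 7: mutate bit by bit with probability p_mut
        pop = ["".join("10"[int(b)] if random.random() < p_mut else b for b in g)
               for g in nxt]                             # step 8: repeat
    return max(pop, key=lambda g: fitness(int(g, 2)))

print(run_ga(lambda x: x * x))   # should approach 11111, i.e. x = 31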

 

Concrete but Simple GA Optimization Example.

This is an example discussed at length in David Goldberg's text Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, 1989). It utilizes all the classic GA techniques mentioned above and shows in detail the STEPS given above. Obviously it is trivial to solve: we know the answer, and we could write down the entire population space. But what better example to learn from?

Optimization Problem: We have an electrical switch box with a sequence of 5 switches, each of which may be off or on. We are trying to maximize the signal output, denoted f(s).

Let us follow our steps from above.

 
We must go through REPRODUCTION. Below we generate an "initial population" of switch settings, encoded as genes, by random coin flips. Here we have used 4 genes, requiring 20 coin flips, or 20 bits of information. The "unsigned integer" value x of each gene is then calculated, and the "objective function" value is calculated from x. In order to perform a probabilistic but weighted search, we calculate each gene's "selection probability" from the population (these must sum to 1.0). Before performing a randomly seeded update of the population, we may estimate the "expected occurrence" of each "old" gene in the new gene pool. We then form a roulette wheel with regions weighted according to the "selection probability" column, spin the wheel, and obtain the "actual occurrences" given by chance. It is important to realize that this weighted probabilistic search allows a smart but blind search of gene (phase) space. Note that the last two columns, the expected and actual occurrences of the genes, conform to our expectations: the best genes get copied more, the average genes stay even, and the worst genes die out! Survival of the fittest.

 

 i (G-string) | initial G population | unsigned   | objective  | selection prob. | expected occ. | actual occ. from
              | (random coin flips)  | integer x  | f(x) = x^2 | f_i/(Sum f_i)   | f_i/f_avg     | chance (roulette)
--------------+----------------------+------------+------------+-----------------+---------------+------------------
      1       | 01101                | 13         |  169       | 0.14            | 0.58          | 1
      2       | 11000                | 24         |  576       | 0.49            | 1.97          | 2
      3       | 01000                |  8         |   64       | 0.05            | 0.22          | 0
      4       | 10011                | 19         |  361       | 0.31            | 1.23          | 1
--------------+----------------------+------------+------------+-----------------+---------------+------------------
 Sum          |                      |            | 1170       | 1.00            | 4.00          | 4.00
 Avg          |                      |            |  293       | 0.25            | 1.00          | 1.00
 Max          |                      |            |  576       | 0.49            | 1.97          | 2.00
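As a quick check of the bookkeeping in this table, a few lines of Python reproduce the fitness, selection-probability, and expected-occurrence columns:

pop = ["01101", "11000", "01000", "10011"]    # the initial population above
f = [int(g, 2) ** 2 for g in pop]             # objective f(x) = x^2
total, avg = sum(f), sum(f) / len(f)
for g, fi in zip(pop, f):
    print(g, fi, round(fi / total, 2), round(fi / avg, 2))
# -> f = 169, 576, 64, 361 (sum 1170); selection probabilities
#    0.14, 0.49, 0.05, 0.31; expected occurrences 0.58, 1.97, 0.22, 1.23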

Next we must go through the MATING PROCESS. Notice below that old gene 1 occurs once, old gene 2 occurs twice, old gene 3 does not occur, and old gene 4 occurs once; this was produced by the spins of the roulette wheel above (chance). To read this table, note that once two genes are mated at random, the crossover point along the gene (also chosen at random) must be the same for each mate. In the table, the || indicates the crossover point, which appears in column 4. As a crossover example, after randomly choosing a crossover point along the gene (the 4th position below), we exchange genetic material:

            CROSSOVER EXAMPLE

MATED OLD GENES           NEW GENES

0110||1         --------> 0110||0
          |
          | SWAP BITS
          | after crossover point
          |
1100||0         --------> 1100||1

Here we have just performed crossover with the tails of the genes, past the crossover point. However, you could certainly define other types (depending on the needs of the problem), such as sometimes swapping left with left, right with right, left with right, or right with left, making sure to conserve the length of the gene (at least in this case).
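In code, this tail-swap crossover is a one-liner; the assertions reproduce the first two matings in the table below:

def crossover(g1, g2, site):
    """Swap the tails of two genes past the crossover point."""
    return g1[:site] + g2[site:], g2[:site] + g1[site:]

assert crossover("01101", "11000", 4) == ("01100", "11001")
assert crossover("11000", "10011", 2) == ("11011", "10000")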

 new G-string | mating pool after | mate     | crossover site | new G      | value | f(x) = x^2
 number       | reproduction      | (random) | (random)       | population | of x  |
--------------+-------------------+----------+----------------+------------+-------+-----------
      1       | 0110||1           | 2        | 4              | 01100      | 12    |  144
      2       | 1100||0           | 1        | 4              | 11001      | 25    |  625
      3       | 11||000           | 4        | 2              | 11011      | 27    |  729
      4       | 10||011           | 3        | 2              | 10000      | 16    |  256
--------------+-------------------+----------+----------------+------------+-------+-----------
 Sum          |                   |          |                |            |       | 1754
 Avg          |                   |          |                |            |       |  439
 Max          |                   |          |                |            |       |  729

We must now perform MUTATION. With a mutation probability of P_mut = 0.001 and only 20 bits, the expected number of mutations in the first generation is 20 × 0.001 = 0.02, essentially zero, so no mutation was included here. But mutation certainly can be very effective during the course of the algorithm. Also, be aware that Mutation by Inversion may have been effective: for example, if you have G = 10011 and you randomly choose the 2nd position as the break point and invert all remaining bits, you get G = 10110, which is much closer to the optimum than 10011.
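In code, bit-by-bit mutation and the tail-inversion move look like this (the assertion reproduces the 10011 -> 10110 example):

import random

def mutate(gene, p_mut=0.001):
    """Flip each bit independently with probability p_mut."""
    return "".join("10"[int(b)] if random.random() < p_mut else b for b in gene)

def invert_tail(gene, site):
    """Mutation by inversion: reverse the bits after the break point."""
    return gene[:site] + gene[site:][::-1]

assert invert_tail("10011", 2) == "10110"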

Notice that in one generation the average fitness went from 293 to 439, and the maximum fitness went from 576 to 729, where we know the exact answer is 961. Although random processes drive this improvement, one can see that it is no fluke: the best gene, G = 11000, gets copied twice, and when mated at random with the next-highest string (10011) and crossed, a more optimal gene is produced. The probabilistic rules allow a blind search, but the fitness weighting adds intelligence to the search. Imagine how effective this may become with a really large search space and a large gene pool.

Notice that if we had started out with a really lousy initial population, this algorithm might have taken a very long time to get anywhere; consider several G = 00000 entries, for example. In such cases, you can see that bit mutation could really help speed things up!

Finally, for this particular simple problem, it is important to realize how the objective function and encoding were chosen for this approach to be successful (as must be done for each problem). Because the electrical switch connections are in series, the signal will always be 0 unless switch 1 is ON. Therefore switch 1 requires the larger weighting, and this is why it is the first allele in the gene. The arbitrary f(x) = x^2 was chosen to favor this sequencing of ON switches from 1 to 5. Basically, we know a lot of information about this problem, and so it can be set up well.


 

More Complicated Optimization Problems.

Actually, making f(x) = x^N with N > 10, instead of 2, makes this a much more difficult problem, as the function becomes a spike with a small probability of randomly generating the exact genetic solution. Clearly, a multimodal function can be searched very efficiently this way; however, if the exact answer is required, another approach is needed. Recently, hybrid schemes have been concocted which use both simulated annealing and genetic algorithms. Also, genetic algorithms have been developed which can produce a Boltzmann distribution in phase space, and spatially as well! See, for example, "Parallel Recombinative Simulated Annealing: A Genetic Algorithm", Mahfoud and Goldberg, Parallel Computing 21, 1 (1995); or "A Note on Boltzmann Tournament Selection for Genetic Algorithms and Population-Oriented Simulated Annealing", Goldberg, Complex Systems 4, 445 (1990).

For other problems, the trick for GAs is the encoding of the population and determining a good objective function. You may, of course, add other penalties to the objective function, as was done in the traveling salesman problem, where a phobia, that of river crossings, was given a penalty. For more details, see the discussion on the web link for this particular problem, or look in the literature.

October 29, 1998 by Duane Johnson.
Updated, November 8, 1999
Updated, November 26, 2001