There are many techniques (and improvements to them) for global optimization, i.e., finding the global minimum or maximum of some complex functional. We discuss only two: SA and GA. These two techniques are similar, naturally motivated, general-purpose procedures for optimization. SA and GAs work well on a variety of problems, require little problem-specific information, do not need gradient information, and generate new points in the search space probabilistically.
It should be clear that when we speak of minimization, the case of finding a maximum can also be treated, by taking either the reciprocal or the negative of the function of interest, whichever is more reasonable.
Global minimization problems can be classified as NP-complete (non-deterministic polynomial-time complete): no method is known that solves them in computer time bounded by a power of N, where N is the number of degrees of freedom. There are two main types of strategies:
The iterative-improvement method, such as steepest descent, which is like an extremely rapid quench in statistical mechanics.
Simulated Annealing allows us to apply ideas developed in the statistical mechanics of liquids (i.e., the Metropolis algorithm for sampling the Boltzmann probability distribution) to more general optimization problems. The seminal application of these ideas to classical systems was "Optimization by Simulated Annealing" by S. Kirkpatrick, C.D. Gelatt, Jr., and M.P. Vecchi, Science 220, 671-680 (1983), which discussed the thermodynamic connection as well as the optimization of electronic circuitry on chips. To reach the optimal (or ground) state, think of how nature achieves ground-state structures: by annealing slowly.
Hence, for use with systems other than thermodynamic ones, we require:
Both Molecular Dynamics and Monte Carlo Methods may be used to perform simulated annealing.
Molecular Dynamics:
Monte Carlo:
The Metropolis algorithm allows us to explore phase space and find the thermodynamic average of the "energy" at a given temperature. For optimization, what we actually want is the T = 0 global minimum (ground state), obtained by slowly annealing the "system". Importantly, the Metropolis algorithm allows uphill moves along with downhill moves: a change in energy $\Delta E$ is accepted with probability $p(\Delta E) = \min[1, e^{-\Delta E/kT}]$. Clearly, then, if T is very low, the Monte Carlo version of the Metropolis algorithm is analogous to the steepest-descent method (no uphill moves are allowed because they are exponentially damped out); however, the underlying trajectory is a random walk, which can tunnel through barriers.
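As a minimal sketch (in Python, with the Boltzmann constant set to 1; the function name is ours), the acceptance rule reads:

```python
import math
import random

def metropolis_accept(dE, T):
    """Metropolis rule: accept with probability min[1, exp(-dE/kT)], k = 1.

    Downhill moves (dE <= 0) are always accepted; uphill moves are
    accepted with a probability that is exponentially damped as T -> 0.
    """
    return dE <= 0 or random.random() < math.exp(-dE / T)
```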
Both methods have a simulation time to locate the global minimum that scales like $e^{E_{max}/E_{min}}$, where $E_{max}$ is the maximum value in the set of ALL minimum energy barriers separating basins on the E-surface, and $E_{min}$ is the difference between the global minimum energy and the lowest local minimum energy. Typically, $E_{max}/E_{min} \gg 1$. (See, e.g., Amara, Ma and Straub, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 23 (1996), and Straub in New Developments in Theoretical Studies of Proteins, ed. by Elber (1996).)
How fast should one anneal? The answer is, of course, dependent upon the optimization problem. However, because Simulated Annealing has its origin in statistical mechanics, we know that as one nears a phase transition the specific heat, $C_V = dE/dT$, shows (possibly sharp) structure as the system cools. When the equivalent of $C_V$ in the optimization problem begins to grow, this is a signal (as it is in standard thermodynamic simulations) that we should anneal more slowly so as to sample phase space more properly, for we are approaching a possible change in the system trajectory (analogous to a phase transition). There can be critical slowing down and hysteresis. Hence, by monitoring the thermodynamically equivalent quantity that is important in the optimization problem (possibly something other than $C_V$, such as a susceptibility, diffusion, etc.), we may alter the annealing rate to make the search more efficient. This important technique helps avoid settling on a false "optimal" point. You will note below for the Traveling Salesman Problem that a fixed annealing schedule is maintained (and used only after "thermalizing"), so that no alteration to the annealing schedule is required. This is mainly because that optimization problem is combinatorial and, after exploring annealing rates for such a problem, one can determine a reasonable rate once and for all. The use of the analogy to thermodynamic simulation was discussed in the original Science article, ibid.
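As an illustration of this monitoring idea, here is a hedged sketch: $C_V$ is estimated from energy fluctuations, $C_V = (\langle E^2 \rangle - \langle E \rangle^2)/T^2$ with $k = 1$, and the cooling is slowed when $C_V$ grows. The threshold and cooling factors are arbitrary illustrative choices, not prescriptions.

```python
import statistics

def specific_heat(energies, T):
    """C_V = (<E^2> - <E>^2) / T^2 (k = 1), from energies sampled at fixed T."""
    mean_E = statistics.fmean(energies)
    mean_E2 = statistics.fmean(e * e for e in energies)
    return (mean_E2 - mean_E ** 2) / T ** 2

def next_temperature(T, energies, cv_threshold=10.0):
    """Cool slowly (factor 0.99) when C_V exceeds an (illustrative)
    threshold, signalling an approaching 'phase transition'; cool
    quickly (factor 0.95) otherwise."""
    fast, slow = 0.95, 0.99
    return T * (slow if specific_heat(energies, T) > cv_threshold else fast)
```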
While discrete parameter sets (the Ising model, combinatorial problems, etc.) occur frequently, equally important is minimization with respect to continuous parameters, as required in function fitting. For example, suppose one desires to fit a function to a sum of exponentials or Gaussians; the choice of decay constants is then an optimization problem. (Linear or quadratic problems have much more efficient methods.) Simulated Annealing applied to continuous global optimization problems in this context was addressed by Vanderbilt and Louie, J. Comput. Phys. 56, 259 (1984). The main difficulty in going from discrete to continuous applications of SA-Monte Carlo is that choosing random steps becomes more subtle. It is helpful to build some "intelligence" into the search; Vanderbilt and Louie used excursions of the random walk to explore the local topology and, thereby, choose the step size.
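A sketch of the continuous case follows. Note that it does not reproduce Vanderbilt and Louie's covariance-based step selection; instead it uses a simpler, commonly used stand-in, rescaling an isotropic step size to hold the acceptance ratio near 50%. All names and parameter values are illustrative.

```python
import math
import random

def anneal_continuous(f, x0, T0=1.0, cool=0.99, n_temps=200, trials=50):
    """SA for a 1-D continuous objective f: uniform random steps, with
    the step size rescaled at each temperature to keep the acceptance
    ratio near 1/2 (a crude stand-in for covariance-adapted steps)."""
    x, fx, T, step = x0, f(x0), T0, 1.0
    for _ in range(n_temps):
        accepted = 0
        for _ in range(trials):
            xt = x + step * random.uniform(-1.0, 1.0)
            dE = f(xt) - fx
            if dE <= 0 or random.random() < math.exp(-dE / T):
                x, fx, accepted = xt, fx + dE, accepted + 1
        step *= 1.1 if accepted > trials // 2 else 0.9  # steer toward ~50%
        T *= cool
    return x, fx
```

For example, `anneal_continuous(lambda x: (x * x - 1) ** 2 + 0.1 * x, 3.0)` should settle near the lower of the two wells of this double-well test function.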
Problem: A salesman travels to N cities with 2-D coordinates $(x_i, y_i)$, $i = 1, \ldots, N$, returning finally to the city of origin. Each city is to be visited once and only once, and the route taken is to be as short as possible.
Question: What is the shortest route?
This belongs to the class of optimization problems able to be addressed efficiently with simulated annealing and scales like $e^{aN}$, where $a$ is a constant.
See the Traveling Salesman Applet from Carleton University for an example simulation.
How does the annealing rate change the result (for better or worse)? We discuss the solution using a Genetic Algorithm below. Can you see an inversion take place?
Monte Carlo - Simulated Annealing Approach
The objective function (the "energy") is the total length of the closed tour,

$$E = \sum_{i=1}^{N} \sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2},$$

with the periodic condition city(N+1) = city(1) closing the route.
Finally, as discussed in class, this objective function can be chosen to suit the problem, and the code does not have to be altered. For example, as discussed in Numerical Recipes, one can add other energy penalties that modify the problem, such as making river crossings cost more. The objective function becomes

$$E = \sum_{i=1}^{N} \left[ \sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2} + K (s_{i+1} - s_i)^2 \right],$$

where $s_i = \pm 1$ for the left or right side of the river, and an extra penalty of 4K is incurred each time the salesman crosses the river.
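A minimal sketch of the Monte Carlo-SA approach for this objective (in Python; the segment-reversal move and fixed geometric cooling schedule are standard choices, and all parameter values here are illustrative):

```python
import math
import random

def tour_length(cities, tour):
    """Closed-tour length; the modulo index enforces city(N+1) = city(1)."""
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def anneal_tsp(cities, T0=10.0, cool=0.999, steps=100_000):
    """Metropolis SA on tours: propose reversing a random segment,
    accept with probability min[1, exp(-dE/T)]."""
    tour = list(range(len(cities)))
    random.shuffle(tour)
    E, T = tour_length(cities, tour), T0
    for _ in range(steps):
        i, j = sorted(random.sample(range(len(tour)), 2))
        trial = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]  # reverse segment
        dE = tour_length(cities, trial) - E
        if dE <= 0 or random.random() < math.exp(-dE / T):
            tour, E = trial, E + dE
        T *= cool
    return tour, E
```

Here `cities` is a list of (x, y) pairs; the river-crossing penalty could be added to `tour_length` without touching the annealing loop, which is the point made above.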
Other Problems: You can now imagine how to adapt this to other problems. Many problems map onto the lattice Ising model that we studied previously, for example, the circuit optimization described in class and in the Science article. This technique has been used to address a wide range of engineering, mathematics, physical science, computer science, design, and gaming problems that fall into the class of optimization problems.
Circuit Optimization (The Ising Model, again!) Consider N circuits to be partitioned onto 2 chips. Propagating a signal across the chip boundary is slow; hence, the placement of elements on the two chips can be optimized. While putting all N circuits on 1 chip eliminates the problem, it leaves no room for additional circuits, or there is not enough room for the N needed. Define the connectivity $a_{ij}$ as the number of signals passing between circuits i and j, and let $\mu_i = \pm 1$ indicate which chip circuit i resides on (e.g., left or right). The number of signals crossing the boundary is then

$$N_c = \sum_{i>j} a_{ij}\,\frac{1 - \mu_i \mu_j}{2}.$$

Note that $\sum_i \mu_i$ is the difference in the number of circuits on the 2 chips. Hence, a penalty can be added for any imbalance in the number of circuits by $\lambda \left( \sum_i \mu_i \right)^2$. If z is the average number of circuits connected to a typical circuit, then $\lambda \sim z/2$ (equal weight to balancing the circuits). With such a penalty, the "cost" function is

$$f = \sum_{i>j} a_{ij}\,\frac{1 - \mu_i \mu_j}{2} + \lambda \left( \sum_i \mu_i \right)^2,$$

which is (again) the Lenz-Ising model.
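A short sketch of this cost function (function and variable names are ours):

```python
from itertools import combinations

def partition_cost(a, mu, lam):
    """f = sum_{i>j} a_ij (1 - mu_i mu_j)/2 + lam * (sum_i mu_i)^2.

    a[i][j]: number of signals between circuits i and j (symmetric);
    mu[i]:   +1 or -1, the chip on which circuit i resides;
    lam:     imbalance penalty weight (~ z/2)."""
    crossing = sum(a[i][j] * (1 - mu[i] * mu[j]) / 2
                   for i, j in combinations(range(len(mu)), 2))
    return crossing + lam * sum(mu) ** 2
```

This cost can be dropped directly into the Metropolis acceptance step above, with a "move" being a single spin flip $\mu_i \to -\mu_i$.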
GAs were introduced by J.H. Holland (1962) (see, more recently, Adaptation in Natural and Artificial Systems, Univ. of Michigan Press (1975)). A nice introduction to the concepts is contained in D.E. Goldberg's text Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, 1989), although the coding is in Pascal. David Goldberg, a Professor of General Engineering at UIUC, initiated engineering applications by optimizing a gas pipeline distribution system with GA techniques. Some tutorials are available, such as Shaffer and Small, Analytical Chemistry 69, 236A (1997), or L. Davis, Handbook of Genetic Algorithms, chap. 4 (Van Nostrand, 1991).
The main GA terminology comes from the analogy with biological reproduction: Chromosomes are gene sequences, with genes being composed of alleles, $a_i$. Symbolically, a sequence of alleles defines the gene: $G = a_1 a_2 \cdots a_N$. Different allele sequences define different genes, and a population is defined as a collection of genes (a subset of the allowed genes, which themselves define the search space). For use below, we note that a binary representation of a gene (here only 5 bits) can be evaluated as $x = a_1 2^4 + a_2 2^3 + a_3 2^2 + a_4 2^1 + a_5 2^0$, i.e., with weights 16, 8, 4, 2, 1. So, for an N-bit gene, the sum is at most $2^N - 1$, and any gene with all bits zero has x = 0.
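In code, this evaluation is a one-liner; here is a small Python helper (the name `decode` is ours) matching the formula above:

```python
def decode(gene):
    """Unsigned-integer value of a bit-string gene, most significant
    allele first, e.g. '01101' -> 13."""
    return sum(int(a) * 2 ** k for k, a in enumerate(reversed(gene)))
```

Equivalently, Python's built-in `int(gene, 2)` gives the same value.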
Genetic Algorithms are named for their similarity to concepts in genetic reproduction. They require a population size, recombination methods, and a mutation rate (various types of mutation are allowed). Importantly, most operations in GAs are probabilistically controlled, i.e., they are randomized search methods. As Goldberg notes, "randomized search does not necessarily imply directionless search".
GAs differ from standard optimization and search procedures in that they:
1. work with a coding of the parameter set, not the parameters themselves;
2. search from a population of points, not a single point;
3. use objective function (payoff) information, not derivatives or other auxiliary knowledge;
4. use probabilistic transition rules, not deterministic ones.
N.B. Point-by-point searches can be dangerous, as they can locate false peaks in multimodal search spaces; working with large populations prevents this. Note also that population evolution in GAs requires a defined process (i.e., reproduction, mutation, and selection) besides, of course, the objective function. The detailed elements of the evolutionary process are best determined by the problem at hand, but all fall into the three general categories above.
An example optimization is found below. You may do a web search or browse the cited texts for applications in a multitude of fields. Just to name some areas: biology (of course!); game playing (hexapawn, a 3x3 chess board with only pawns, more difficult than you might imagine at first); the prisoner's dilemma (should prisoners cooperate, rat out one another, or remain silent, depending on the reward for each); pattern recognition; mechanical engineering (bridge component optimization); electrical engineering (circuit optimization); mathematical function optimization; social science (hunter-gatherer models); physical science (fitting potential surfaces, as in pseudo-potential calculations, or reaction-rate calculations). For your amusement: before GAs, the best algorithm for games of conflict was the so-called tit-for-tat algorithm, in which you do whatever your opponent did previously. It beat most very sophisticated approaches, that is, until GAs.
GA Advantages
GA Disadvantages
This is an example discussed at length in David Goldberg's text Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, 1989): maximize $f(x) = x^2$, where x is the unsigned-integer value of a 5-bit gene (a string of five ON/OFF switches), so the optimum is x = 31 with f = 961. It utilizes all the classic GA techniques mentioned above and shows in detail the STEPS given above. Obviously, it is trivial to solve: we know the answer and could write down the entire population space. But what better example to learn from?
Let us follow our steps from above.
We must first go through REPRODUCTION. Below we generate an "initial population" of switch settings, encoded as genes, by random coin flips. Here we use 4 genes and require 20 coin flips, or 20 bits of information. The unsigned-integer value x of each gene is then calculated, and the "objective function" value is calculated from x. In order to perform a probabilistic but weighted search, we calculate each gene's "selection probability" from the population (these must sum to 1.0). Before performing a randomly seeded update of the population, we may estimate the "expected occurrence" of each "old" gene in the new gene pool. We then form a roulette wheel whose regions are weighted according to the "selection probability" column, spin the wheel, and obtain the "actual occurrences" given by chance (see the selection sketch after the table). It is important to realize that this weighted probabilistic search allows a smart but blind search of the gene (phase) space. Note that the last two columns, the expected and actual occurrences of the genes, conform to our expectations: the best genes get copied more, the average genes stay even, and the worst genes die out! Survival of the fittest.
| G-string number, i | Initial G population (random coin flips) | Unsigned integer value, x | Objective function, f(x) = x² | Selection probability | Expected occurrence | Actual occurrence from chance |
|---|---|---|---|---|---|---|
| 1 | 01101 | 13 | 169 | 0.14 | 0.58 | 1 |
| 2 | 11000 | 24 | 576 | 0.49 | 1.97 | 2 |
| 3 | 01000 | 8 | 64 | 0.05 | 0.22 | 0 |
| 4 | 10011 | 19 | 361 | 0.31 | 1.23 | 1 |
| Sum over column | | | 1170 | 1.00 | 4.00 | 4.00 |
| Avg of column | | | 293 | 0.25 | 1.00 | 1.00 |
| Max of column | | | 576 | 0.49 | 1.97 | 2.00 |
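A hedged sketch of the roulette-wheel (fitness-proportionate) selection that produces the "actual occurrence" column (the function name is ours):

```python
import random

def roulette_select(population, fitness, n):
    """Draw n genes with replacement, each weighted by its share of the
    total fitness -- the 'selection probability' column above."""
    return random.choices(population, weights=fitness, k=n)
```

For example, `roulette_select(['01101', '11000', '01000', '10011'], [169, 576, 64, 361], 4)` typically copies the best gene most often, in line with the expected-occurrence column.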
Next we must go through the MATING PROCESS. Notice below that old gene 1 occurs once, old gene 2 occurs twice, old gene 3 does not occur, and old gene 4 occurs once; this was produced by the spin of the roulette wheel above (chance). To read the table, note that once genes are mated at random, the crossover point along the gene (also chosen at random) must be the same for each mate. In the table, the ‖ indicates the crossover point, which is chosen randomly and appears in column 4. As a crossover example, after randomly choosing a crossover point along the gene (the 4th position below), we exchange genetic material:
CROSSOVER EXAMPLE

MATED OLD GENES          NEW GENES
  0110‖1    -------->    0110‖0
                |
                |  swap bits after
                |  the crossover point
                |
  1100‖0    -------->    1100‖1
Here we have just performed crossover with the tails of the genes, past the crossover point (a sketch in code follows the table below). However, you could certainly define other types (depending on the needs of the problem), such as sometimes swapping left with left, right with right, left with right, or right with left, making sure to conserve the length of the gene (at least in this case).
| New G-string, i | Mating pool after reproduction | Randomly selected mate | Crossover site (randomly selected) | New G population | Value of x | f(x) = x² |
|---|---|---|---|---|---|---|
| 1 | 0110‖1 | 2 | 4 | 01100 | 12 | 144 |
| 2 | 1100‖0 | 1 | 4 | 11001 | 25 | 625 |
| 3 | 11‖000 | 4 | 2 | 11011 | 27 | 729 |
| 4 | 10‖011 | 3 | 2 | 10000 | 16 | 256 |
| Sum over column | | | | | | 1754 |
| Avg over column | | | | | | 439 |
| Max over column | | | | | | 729 |
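A sketch of the single-point crossover used in the table (the function name is ours):

```python
import random

def crossover(g1, g2):
    """Swap the tails past a random cut: '0110|1' x '1100|0' -> '01100', '11001'."""
    cut = random.randrange(1, len(g1))  # cut strictly inside the string
    return g1[:cut] + g2[cut:], g2[:cut] + g1[cut:]
```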
We must now perform MUTATION. With a mutation probability of p = 0.001 per bit, the first generation does not get mutated: with only 20 bits, the expected number of mutations is 20 × 0.001 = 0.02, so none was included here. But mutation can certainly be very effective during the course of the algorithm. Also, be aware that mutation by inversion may have been effective. For example, if you have G = 10011 and you randomly choose the 2nd position as the break to start the inversion, reversing all remaining bits, you get G = 10110, which is much closer to the optimum than 10011.
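Sketches of the two mutation operators just described, bit-flip mutation and inversion (the names and the independent per-bit model are ours):

```python
import random

def mutate(gene, p=0.001):
    """Flip each bit independently with probability p; for 20 bits the
    expected number of flips is 20 * p = 0.02 per generation."""
    return ''.join('10'[int(b)] if random.random() < p else b for b in gene)

def invert(gene, start):
    """Mutation by inversion: reverse the tail from 0-based position
    `start` onward, e.g. invert('10011', 2) -> '10110'."""
    return gene[:start] + gene[start:][::-1]
```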
Notice that in one generation the average fitness went from 293 to 439, and the maximum fitness went from 576 to 729, where we know the exact answer is 961. Although random processes drive this improvement, one can see that it is no fluke: the best gene, G = 11000, gets copied twice, and when combined at random with the next-highest string (10011) and crossed, a more optimal gene is produced. The probabilistic rules allow a blind search, but the weighting adds intelligence to it. Imagine how effective this may become with a really large search space and a large gene pool.
Notice that if we had started out with a really lousy initial population, this algorithm might have taken a very long time to go anywhere; consider several G = 00000 entries, for example. In such cases, you can see that bit mutation could really help speed things up!
Finally, for this particular simple problem, it is important to realize how the objective function and encoding were chosen for this approach to be successful (as must be done for each problem). Because the electrical switch connections are in series, the signal will always be 0 unless switch 1 is ON. Therefore, switch 1 requires the largest weighting, and this is why it is the first allele in the gene. The arbitrary f(x) = x² was chosen to favor this sequencing of ON from switch 1 to 5. Basically, we know a lot about this problem, and it can be set up well.
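Putting the pieces together, here is a hedged end-to-end sketch of this GA, reusing the `decode`, `roulette_select`, `crossover`, and `mutate` helpers sketched above; the population size, generation count, and the tiny fitness floor are our illustrative choices:

```python
import random

def run_ga(n_bits=5, pop_size=4, generations=20):
    """Maximize f(x) = x^2 over n_bits-bit genes using reproduction,
    single-point crossover, and bit-flip mutation."""
    pop = [''.join(random.choice('01') for _ in range(n_bits))
           for _ in range(pop_size)]
    for _ in range(generations):
        # Tiny floor keeps the roulette wheel defined for an all-zero pool.
        fit = [decode(g) ** 2 + 1e-9 for g in pop]
        pool = roulette_select(pop, fit, pop_size)   # reproduction
        random.shuffle(pool)                         # random mating
        pop = [child for g1, g2 in zip(pool[::2], pool[1::2])
               for child in crossover(g1, g2)]
        pop = [mutate(g) for g in pop]
    return max(pop, key=decode)
```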
Actually, making f(x) = x^N with N > 10, instead of 2, makes this a much more difficult problem, as the function becomes a spike with a small probability of randomly generating the exact genetic solution. Clearly, a multimodal function can be searched very efficiently this way; however, if the exact answer is required, another approach is needed. Recently, hybrid schemes have been concocted which use both simulated annealing and genetic algorithms. Also, genetic algorithms have been developed which can produce a Boltzmann distribution in phase space, and spatially as well! See, for example, "Parallel recombinative simulated annealing: a genetic algorithm", Mahfoud and Goldberg, Parallel Computing 21, 1 (1995); or "Notes on Boltzmann Tournament Selection for Genetic Algorithms and Population-Oriented Simulated Annealing", Goldberg, Complex Systems 4, 445 (1990).
For other problems, the trick with GAs is the encoding of the population and determining a good objective function. You may, of course, add other penalties to the objective function, as was done in the traveling salesman problem where a phobia, that of river crossings, was given a penalty. For more details, see the discussion on the web link for this particular problem, or look in the literature.