Code Optimization

On the Role of Stop Codons in the Genetic Code

John Cole

Is the wild type genetic code optimal?

The ultimate goal of this type of project would be to develop a cost function that, when optimized, returns the wild type genetic code, in essence discerning precisely what the code evolved to do. In reality, the wild type code may be a minimum of a space with many more parameters than the five considered here, or it may not be a minimum at all. These questions are beyond the scope of this project.

Instead, we consider the question: which, if any, of our measures, H_f / M, H_f / N, M, and N, are factors that in part give rise to the wild type genetic code? Moreover, is it possible to parameterize a cost function, made up of a weighted sum of ΔP and one of H_f / M, H_f / N, M, and N, such that minimizing it recovers something like the wild type genetic code?

These questions immediately lead us to ask what it means to be like the wild type code. Because we are chiefly concerned with the role of stop codons as a possible vehicle for molecular cost minimization, a natural measure is the similarity between the interconnections of stop codons and the various amino acids in an optimized code and those in the wild type code. We define D and D² as the sum and the sum of squares, respectively, of the absolute values of the differences between the probability of a transition under SNS from a stop to each amino acid in an optimized code and in the wild type code. D² is essentially the square of the distance, in 21-dimensional “amino acid and stop” space, between the wild type code's “stop connectedness” vector (defined here as a vector whose i^th element is the probability of a stop codon becoming the i^th amino acid, or another stop, under SNS) and that of a code calculated to minimize (or nearly minimize) one of our measures.

D = Σ_i | P(stop → AA_i)_opt – P(stop → AA_i)_wt |

D² = Σ_i | P(stop → AA_i)_opt – P(stop → AA_i)_wt |²
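As a minimal sketch of the two definitions above, the following Python function computes D and D² from a pair of stop connectedness vectors. The function name `stop_distance` and the representation of each vector as a plain sequence of 21 probabilities are assumptions made for illustration; they are not taken from the project's own code.

```python
from typing import Sequence, Tuple


def stop_distance(p_opt: Sequence[float], p_wt: Sequence[float]) -> Tuple[float, float]:
    """Return (D, D2) for two stop connectedness vectors.

    p_opt[i] and p_wt[i] are P(stop -> AA_i) for the optimized and the
    wild type code. Both vectors must use the same ordering of the
    amino acids and stop (an assumption of this sketch).
    """
    if len(p_opt) != len(p_wt):
        raise ValueError("vectors must have the same length")
    # Absolute difference for each amino acid (or stop) entry.
    diffs = [abs(a - b) for a, b in zip(p_opt, p_wt)]
    d = sum(diffs)                    # D: sum of absolute differences
    d2 = sum(x * x for x in diffs)    # D2: sum of squared differences
    return d, d2
```

Note that D² as defined here is the squared Euclidean distance between the two vectors, not the square of D.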

A simulated annealing algorithm was developed to take as arguments relative weights for each of ΔP, H_f / M, H_f / N, M, and N, optimize the cost function, C, based on these weights,

C = (w₁*ΔP) + (w₂*H_f / M) + (w₃*H_f / N) + (w₄*M) + (w₅*N)

and return D, D², and three output files containing data on the optimization run itself; the values of P(stop → AA_i)_opt and P(stop → AA_i)_wt; and the optimized genetic code with expectation values for ΔP, H_f / M, H_f / N, M, and N.
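The weighted cost C and a generic annealing loop can be sketched as follows. This is an illustration only: the text does not say how the project's algorithm proposes new codes or lowers the temperature, so the Metropolis acceptance rule, the geometric cooling schedule, the parameter defaults, and all names (`MEASURES`, `weighted_cost`, `anneal`, and the dictionary keys) are assumptions. The caller supplies `cost` and `neighbor`, for example a function that evaluates the five measures for a candidate code and one that reassigns a single codon.

```python
import math
import random
from typing import Callable, Dict, Tuple, TypeVar

S = TypeVar("S")

# Hypothetical keys for the five measures: dP, H_f/M, H_f/N, M, N.
MEASURES = ("dP", "Hf_over_M", "Hf_over_N", "M", "N")


def weighted_cost(measures: Dict[str, float], weights: Dict[str, float]) -> float:
    """C = w1*dP + w2*(H_f/M) + w3*(H_f/N) + w4*M + w5*N."""
    return sum(weights[k] * measures[k] for k in MEASURES)


def anneal(
    initial: S,
    cost: Callable[[S], float],
    neighbor: Callable[[S, random.Random], S],
    t_start: float = 1.0,
    t_end: float = 1e-3,
    cooling: float = 0.95,
    steps_per_temp: int = 100,
    seed: int = 0,
) -> Tuple[S, float]:
    """Generic simulated annealing; returns the best state seen and its cost."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            candidate = neighbor(current, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Always accept improvements; accept a worse state with
            # probability exp(-delta / t) (Metropolis rule).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= cooling  # geometric cooling (assumed schedule)
    return best, best_cost
```

Setting all but one or two weights to zero isolates the corresponding measures, which matches the question posed earlier of pairing ΔP with a single other measure.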