gen8.py

On the Role of Stop Codons in the Genetic Code

John Cole

gen8.py:

gen8.py is the real nuts and bolts of the simulated annealing program. It utilizes a Metropolis type monte carlo scheme that switches individual codon assignments rather than inversions and cross-overs inspired by genetic-type optimization algorithms. An earlier version (one of 7 earlier versions) did utilize inversions and cross-overs but this scheme was abandoned as it proved to slow in converging than its individual codon-switching counterpart. The irony of this, considering the nature of this project, did not go unnoticed. The script takes nine numerical values and a string as arguments. The numerical values include the integer number of times to shuffle the wild type code before beginning the optimization routine, the five relative weight parameters for the cost function, the starting temperature, the annealing factor, and the number of monte-carlo passes over the entire genetic code at each temperature step. The string argument is a unique name appended to the end of the three output files, so that the code can easily be called several times without fear of overwriting or leading to confusion.

The arrays NRG, M, and NHA contain the standard gas phase heats of formation, mollecular mass, and number of heavy atoms, respectively. Values for the standard enthalpy of formation of each amino acid were taken from E. V. Sagadeev, et. al., 2009.

A random code is generated, the relation table is tabulated, I', and v are generated, and initial values for each of the observables being considered are calculated. These are stored for the time being.

The simulated annealing proceeds inside a while loop, which loops as long as the temperature of the system is greater than 5e-3. This value was chosen to be both reasonably small, while still preventing overflow errors from the exponent. After each iteration of this loop the temperature of the system is multiplied by the annealing factor. Nested inside this while loop are two for loops, one iterating over the number of monte-carlo passes to be made at each temperature step, the other looping over all codons within the genetic code. It is within this loop that the Metropolis algorithm takes place. The switch script is used to suggest a “move”, values are calculated for the various cost measures, and the move is either accepted or rejected in the usual Metropolis manner.

It is worth noting that five numerical parameters appear in the exponent aside from the relative weights w₁ through w₅. These parameters serve a unique role in equalizing the strengths of the various cost measures. It is clear from the preliminary calculations presented earlier that the magnitudes of ΔP, Hf / M, Hf / N, M, and N are wildly different, as are their physical dimentions and the ranges over which they tend to vary. The values appearing in the exponent serve to scale these magnitudes such that the relative weight parameters, w₁ through w₅, can realistically reflect the relative importance of each observable in the cost function. The values were determined by generating 1000 pairs of a random genetic code, and a counterpart having two random codons interchanged. The absolute value of the difference of each considered cost measure for each of these code pairs was calculated, and the average of these values is chosen to be the parameter appearing in the code. This effectively normalizes the average change in each observable under a single MC move to one, so that in the absence of relative weights w₁ through w₅, all observables are equally important in the determination of the acceptance or rejection of a proposed MC step.

In the event that a proposed move is accepted, the saved values for each of the observables are replaced, and the cycle continues. Gen8.py outputs 3 files, the first of which contains the optimized genetic code, another containing data about the run including values of each considered molecular cost measure and the acceptance ratio at each temperature step, and one containing values of side-by-side optimized and wild type stop codon connectedness data, as well as values of D and D². The code also returns these D and D² values.

gen8.py is given below:

The Master code

import pot3

import relationTable2

import randoCode

import switch

import eigEx2

import random

import math

import numpy

import eigEx3

def gen(n,wPol,wHm,wHn,wM,wNHA,Tmax,Tfact,NperBlock,filenameN):

WT=[[0,"stp"],[1,"phe"],[2,"leu"],[3,"iso"],[4,"met"],[5,"val"],[6,"ser"],[7,"pro"],[8,"thr"],[9,"ala"],[10,"tyr"],[11,"his"],[12,"gln"],[13,"asn"],[14,"lys"],[15,"asp"],[16,"glu"],[17,"cys"],[18,"trp"],[19,"arg"],[20,"gly"]]

file=open("GenCode_"+str(filenameN)+".out",'w')

file2=open("NRGdata_"+str(filenameN)+".out",'w')

file3=open("compWT_"+str(filenameN)+".out",'w')

count=0.0

# Polar=numpy.array([5.0,4.9,4.9,5.3,5.6,7.5,6.6,6.6,7.0,5.4,8.4,8.6,10.0,10.1,13.0,12.5,4.8,5.2,9.1,7.9])

NRG=numpy.array([-302.0,-486.8,-520.6,-412.1,-466.1,-567.8,-373.3,-603.7,-415.9,-481.9,-221.8,-611.2,-590.5,-443.4,-786.7,-807.4,-378.1,-215.0,-390.1,-390.0])

M=numpy.array([165.19,131.18,131.18,149.21,117.15,105.09,115.13,119.12,89.09,181.19,155.16,146.15,132.12,146.19,133.11,147.13,121.16,204.23,174.20,75.07])

NHA=numpy.array([12,9,9,9,8,7,8,8,6,13,11,10,9,10,9,10,7,15,12,5])

relNRGm=(NRG[:]/M[:])

relNRGm=relNRGm[:]-min(relNRGm)

relNRGn=(NRG[:]/NHA[:])

relNRGn=relNRGn[:]-min(relNRGn)

rel=relationTable2.relationTable()

code=randoCode.randoCode(n)

v=numpy.zeros([20,1])

v[0][0]=1e-15

I=numpy.eye(20)

for a in range(20):

I[a][a]+=random.random()*1e-15

wt=randoCode.randoCode(0)

wtstp=eigEx3.eigEx(wt,rel)

vect=numpy.copy(eigEx2.eigEx(code,rel,I,v))

oldHm=numpy.copy(numpy.dot(vect,relNRGm))

oldHn=numpy.copy(numpy.dot(vect,relNRGn))

oldNHA=numpy.copy(numpy.dot(vect,NHA))

oldM=numpy.copy(numpy.dot(vect,M))

oldU=numpy.copy(pot3.pot3(code,rel))

while Tmax>5e-3:

for i in range(NperBlock):

for j in range(64):

new=switch.switch(code,j)

vect=eigEx2.eigEx(new,rel,I,v)

newHm=numpy.dot(vect,relNRGm)

newHn=numpy.dot(vect,relNRGn)

newNHA=numpy.dot(vect,NHA)

newM=numpy.dot(vect,M)

newU=pot3.pot3(new,rel)

x=random.random()

if x<math.exp(((oldU-newU)*wPol/7.51+(oldHm-newHm)*wHm/2.390e-3+(oldHn-newHn)*wHn/3.618e-2+(oldNHA-newNHA)*wNHA/4.242e-3+(oldM-newM)*wM/5.648e-2)/Tmax):

code=numpy.copy(new)

oldHm=numpy.copy(newHm)

oldHn=numpy.copy(newHn)

oldNHA=numpy.copy(newNHA)

oldM=numpy.copy(newM)

oldU=numpy.copy(newU)

count+=1.0

file2.write(str(Tmax)+"\t"+str(count/(NperBlock*64.0))+"\t"+str(oldU)+"\t"+str(oldHm)+"\t"+str(oldHn)+"\t"+str(oldNHA)+"\t"+str(oldM)+"\n")

Tmax=Tmax*Tfact

count=0.0

codons=numpy.zeros([16,4])

c=0

for a in range(4):

for b in range(16):

codons[b][a]=code[c]

c+=1

file.write("Final deltaPolarity: "+str(oldU)+"\n"+"Final Hm: "+str(oldHm)+"\n")

file.write("Final Hn: "+str(oldHn)+"\n"+"Final M: "+str(oldM)+"\n"+"Final NHA: "+str(oldNHA)+"\n")

for a in range(16):

file.write(WT[int(codons[a][0])][1]+"\t"+WT[int(codons[a][1])][1]+"\t"+WT[int(codons[a][2])][1]+"\t"+WT[int(codons[a][3])][1]+"\n")

file.close()

file2.close()

stp=eigEx3.eigEx(code,rel)

sum=stp.sum()

stp[:]=numpy.copy(stp[:]/sum)

sum=wtstp.sum()

wtstp[:]=numpy.copy(wtstp[:]/sum)

diff=wtstp-stp

sumdiff=0.0

for i in range(21):

sumdiff+=abs(diff[0][i])

file3.write(str(stp[0][i])+"\t"+str(wtstp[0][i])+"\t"+str(diff[0][i])+"\n")

d2=numpy.dot(diff[0],diff[0])

file3.write("absD = "+str(sumdiff)+"\n"+"D2 = "+str(d2)+"\n")

file3.close()

return (d2,sumdiff)