Statistical Errors


A & T, pp. 191-198

Today I am going to review a basic idea that is central to the course: how to estimate the errors of a simulation.

First let me define a simulation. A simulation has some "state" variables, which we denote by S. In classical mechanics these are the positions and velocities of the particles; in an Ising model they are the spins of the particles. We start the simulation in some initial state S_0. We have some process which modifies the state to produce a new state, so that we can write S_{n+1} = T(S_n). We will call the iteration index n "time". It is in quotes because it may not refer to real time but to a fictitious time or pseudo-time, sometimes called Monte Carlo time.
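
As a toy illustration (a sketch added here for concreteness, not one of the course codes), this whole structure fits in a few lines of Python; the update rule T below is an assumed example with a random element:

    import random

    def T(s, rng):
        # Assumed toy update: pull the state toward zero and add
        # Gaussian noise. Any deterministic or stochastic rule with
        # the same signature would do in its place.
        return 0.9 * s + rng.gauss(0.0, 1.0)

    rng = random.Random(42)    # fixed seed so the run is reproducible
    s = 10.0                   # initial state S_0, deliberately far from equilibrium
    trace = []
    for n in range(1000):      # n is the Monte Carlo "time"
        s = T(s, rng)
        trace.append(s)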

The iteration function T(S) can either be deterministic (as with Newton's equations of motion, in which case the simulation is called Molecular Dynamics) or have a random element to it (in which case it is called Monte Carlo). In the random case, we say that the new state is sampled from the distribution T(S). What we discuss in this lecture, and in much of the course, is the situation where the "dynamics" is ergodic, which we define roughly to mean that after a certain time one loses memory of the initial state S_0, except possibly for certain conserved quantities such as energy and momentum. This implies there is a correlation or renewal time. For time differences much greater than the correlation time, all properties of the system not explicitly conserved are unpredictable, as if they were randomly sampled from some distribution. Ergodicity is often easy to prove for a random transition but usually difficult for a deterministic simulation. We will discuss ergodicity further in the next lecture.

Assuming the system is ergodic, one can describe the state of the system by a probability distribution F_n(S). (It is the distribution of outcomes that you get if you start from a distribution of initial states.) At large times this approaches a unique distribution, the equilibrium state F*(S). In classical statistical mechanics this is the Boltzmann distribution.

The main goal of most simulations is the calculation of a set of properties in equilibrium. The most common example is the internal energy E = <e(S)>, where the average is over F*(S). In a simulation we sample e(S_n) for n < T, where T is the total length of the run. The curve of e versus n we call the trace. Calculating an estimate of E is easy: just take an average over the run. The only "trick" is to be sure to throw away the part of the trace that is too strongly influenced by the initial conditions. This is called the transient, the equilibration portion, or the warm-up. (Question: if you have a big spike at the beginning of your trace and you neglect to throw it out of the average, how long does it take before the influence of the spike is less than the statistical error? That is, how does the estimated mean depend on the properties of the warm-up portion?)
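
Continuing the toy example above, the estimate is a plain average once an assumed warm-up (here a guess of 200 steps, which one would check by plotting the trace) has been discarded:

    n_warmup = 200                 # assumed equilibration length; check it against the trace
    data = trace[n_warmup:]        # keep only the equilibrated portion
    a = sum(data) / len(data)      # estimate of E = <e(S)>: the average over the run
    print("estimated mean:", a)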

The next order of business is to figure out how accurate this estimate of the exact value is: namely, the estimated error in the estimated mean, commonly called the error bar. Simulation results without error bars are only suggestive; without error bars one has no idea of their significance. All respectable papers in the literature should contain error estimates for any properties that are important to the conclusions of the paper. All homework exercises must include error estimates. Many mistakes are made in estimating errors. You should fully understand the equations and also get an intuitive feel for how to estimate errors (the "eyeball" method described below). Although we have prepared a code to estimate errors automatically, you should know the formulas and what they mean; on the exam you will have to do it without DataSpork.

Often one hears the complaint that one cannot estimate errors because the quantity being measured is a very complicated function of the state. However, it is always possible to get a rough idea of the error bar, which is often all that is needed, by simply rerunning the entire code a few times with different initial conditions and looking at the spread of the results. This is called the "eyeball" method. While there are many ways for errors to be underestimated, the eyeball method will not seriously overestimate them.
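
A sketch of the eyeball method, reusing the toy update T from above; the number of reruns, the run length, and the warm-up cut are all arbitrary choices made for illustration:

    run_means = []
    for seed in range(5):                    # a few independent reruns
        rng = random.Random(seed)            # different random stream each run
        s = rng.uniform(-10.0, 10.0)         # and a different initial condition
        run = []
        for n in range(1000):
            s = T(s, rng)
            run.append(s)
        equil = run[200:]                    # same warm-up cut as before
        run_means.append(sum(equil) / len(equil))
    # The spread of these independent run averages is a rough,
    # honest error bar for the estimated mean.
    print("run averages:", run_means)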

The fundamental theorem on which error estimates are based is the central limit theorem. It says that the average of almost any statistical process will be "normally" distributed, i.e., have a Gaussian distribution. Further, the error bar equals the square root of the variance of the distribution divided by the number of uncorrelated steps: error = sqrt(v/N). Later we will discuss in detail the conditions under which this result holds. The correlation time of a sequence is, roughly, the number of steps it takes for a fluctuation to disappear.
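
Since successive states are correlated, the N in error = sqrt(v/N) is not simply the number of samples. One standard trick (a blocking sketch added here; other methods estimate the correlation time directly) is to average the trace in blocks longer than the correlation time, so the block averages are effectively uncorrelated and the naive formula applies to them. This reuses the equilibrated list data from above:

    def error_bar(data, block_size):
        # Average the trace in blocks; once block_size exceeds the
        # correlation time the block averages are nearly uncorrelated,
        # and sqrt(var_of_blocks / nblocks) is a valid error bar.
        nblocks = len(data) // block_size
        blocks = [sum(data[b*block_size:(b+1)*block_size]) / block_size
                  for b in range(nblocks)]
        mean = sum(blocks) / nblocks
        var = sum((x - mean) ** 2 for x in blocks) / (nblocks - 1)
        return (var / nblocks) ** 0.5

    # Increase the block size until the estimate stops growing;
    # the plateau value is the error bar.
    for bs in (1, 2, 5, 10, 20, 50):
        print(bs, error_bar(data, bs))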

There is a standard terminology used to describe the process of estimating <A> and estimating the error of that estimate. You need to learn what all these terms mean so you can communicate with your fellow scientists about statistical distributions. We have developed a Java analysis tool, DataSpork, which performs the computations discussed here for you. In the first homework you have to learn how to use it and what the results mean.

The symbols and formulas are given on the PDF formula page.

    1. Mean of A (a). The estimate of the mean value is simply the average over the equilibrated data.
    2. Variance of A (v). The mean squared deviation of A(t) about the estimated mean. (Both estimators are written out in the short sketch below.)
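
Written out in code for the equilibrated samples from the toy example (the PDF formula page is the authoritative reference for the conventions), these two estimators are:

    N = len(data)
    a = sum(data) / N                           # 1. mean of A
    v = sum((x - a) ** 2 for x in data) / N     # 2. variance of A about the mean
    # Some texts divide by N - 1 in the variance; for the long runs
    # typical of simulations the difference is negligible.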

See also the detailed PDF formula page.

