
Condition Numbers



Condition Number Definition

The condition number of a square nonsingular matrix $A$ is defined by $\text{cond}(A) = \kappa(A) = \|A\| \, \|A^{-1}\|$, which is also the condition number associated with solving the linear system $A\boldsymbol{x} = \boldsymbol{b}$. A matrix with a large condition number is said to be ill-conditioned.

The condition number can be measured in any $p$-norm, so to be precise we typically specify the norm being used, e.g. $\text{cond}_2$, $\text{cond}_1$, $\text{cond}_\infty$.

If $A$ is singular, we can define $\text{cond}(A) = \infty$ by convention.
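
As a quick illustration (not part of the notes above, and assuming NumPy is available), the condition number in several norms can be computed with `np.linalg.cond`; the matrix below is an arbitrary example:

```python
import numpy as np

# Arbitrary 2x2 example matrix, used only to illustrate the definition.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# np.linalg.cond(A, p) computes ||A||_p * ||A^{-1}||_p for the requested norm.
print(np.linalg.cond(A, 2))       # cond_2
print(np.linalg.cond(A, 1))       # cond_1
print(np.linalg.cond(A, np.inf))  # cond_inf

# Equivalent (for small matrices) to forming the product of norms explicitly.
print(np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2))
```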

Perturbed Matrix Problem and Error Bound

Let $\boldsymbol{x}$ be the solution of $A\boldsymbol{x} = \boldsymbol{b}$ and $\hat{\boldsymbol{x}}$ be the solution of the perturbed problem $A\hat{\boldsymbol{x}} = \boldsymbol{b} + \Delta \boldsymbol{b}$. Let $\Delta \boldsymbol{x} = \hat{\boldsymbol{x}} - \boldsymbol{x}$ be the absolute error in output. Then we have $A\boldsymbol{x} + A \Delta \boldsymbol{x} = \boldsymbol{b} + \Delta \boldsymbol{b}$, so $A \Delta \boldsymbol{x} = \Delta \boldsymbol{b}$. Now we want to see how the relative error in output ($\|\Delta \boldsymbol{x}\| / \|\boldsymbol{x}\|$) is related to the relative error in input ($\|\Delta \boldsymbol{b}\| / \|\boldsymbol{b}\|$):

$$\frac{\|\Delta \boldsymbol{x}\| / \|\boldsymbol{x}\|}{\|\Delta \boldsymbol{b}\| / \|\boldsymbol{b}\|} = \frac{\|\Delta \boldsymbol{x}\| \, \|\boldsymbol{b}\|}{\|\boldsymbol{x}\| \, \|\Delta \boldsymbol{b}\|} = \frac{\|A^{-1} \Delta \boldsymbol{b}\| \, \|A \boldsymbol{x}\|}{\|\boldsymbol{x}\| \, \|\Delta \boldsymbol{b}\|} \le \frac{\|A^{-1}\| \, \|\Delta \boldsymbol{b}\| \, \|A\| \, \|\boldsymbol{x}\|}{\|\boldsymbol{x}\| \, \|\Delta \boldsymbol{b}\|} = \|A^{-1}\| \, \|A\| = \text{cond}(A)$$

where we used $\|A\boldsymbol{x}\| \le \|A\| \, \|\boldsymbol{x}\|, \; \forall \boldsymbol{x}$ (applied here to both $A$ and $A^{-1}$).

Then

$$\frac{\|\Delta \boldsymbol{x}\|}{\|\boldsymbol{x}\|} \le \text{cond}(A) \, \frac{\|\Delta \boldsymbol{b}\|}{\|\boldsymbol{b}\|} \qquad (1)$$

Therefore, if we know the relative error in the input, we can use the condition number of the system to obtain an upper bound on the relative error of the computed solution (output).
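
As a sanity check (a sketch using an arbitrary random matrix and perturbation, not taken from the notes), bound (1) can be observed numerically:

```python
import numpy as np

# Random system and a small perturbation of b, chosen only for illustration.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

x = np.linalg.solve(A, b)               # solution of the original system
db = 1e-8 * rng.standard_normal(n)      # perturbation Delta b
x_hat = np.linalg.solve(A, b + db)      # solution of the perturbed system
dx = x_hat - x

rel_err_out = np.linalg.norm(dx) / np.linalg.norm(x)
rel_err_in = np.linalg.norm(db) / np.linalg.norm(b)

# Bound (1): the output relative error is at most cond(A) times the input one.
print(rel_err_out)
print(np.linalg.cond(A, 2) * rel_err_in)
```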

Residual vs Error

The residual vector $\boldsymbol{r}$ of the approximate solution $\hat{\boldsymbol{x}}$ for the linear system $A\boldsymbol{x} = \boldsymbol{b}$ is defined as $\boldsymbol{r} = \boldsymbol{b} - A\hat{\boldsymbol{x}}$. In the perturbed matrix problem described above, we have

$$\boldsymbol{r} = \boldsymbol{b} - (\boldsymbol{b} + \Delta \boldsymbol{b}) = -\Delta \boldsymbol{b}$$

Therefore, since $\|\boldsymbol{r}\| = \|\Delta \boldsymbol{b}\|$, equation (1) can also be written as

$$\frac{\|\Delta \boldsymbol{x}\|}{\|\boldsymbol{x}\|} \le \text{cond}(A) \, \frac{\|\boldsymbol{r}\|}{\|\boldsymbol{b}\|}$$

If we define the relative residual as $\frac{\|\boldsymbol{r}\|}{\|\boldsymbol{b}\|}$, we can see that a small relative residual implies a small relative error in the approximate solution only if $A$ is well-conditioned ($\text{cond}(A)$ is small).
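
The Hilbert matrix makes this concrete (a hedged sketch, not an example from the notes): the relative residual stays near machine precision while the relative error is many orders of magnitude larger.

```python
import numpy as np

# The n x n Hilbert matrix H[i, j] = 1 / (i + j + 1) is famously ill-conditioned.
n = 12
H = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1.0)

x_true = np.ones(n)
b = H @ x_true

x_hat = np.linalg.solve(H, b)   # LAPACK LU with partial pivoting under the hood

rel_residual = np.linalg.norm(b - H @ x_hat) / np.linalg.norm(b)
rel_error = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)

print(np.linalg.cond(H))   # roughly 1e16: very ill-conditioned
print(rel_residual)        # tiny, near machine epsilon
print(rel_error)           # many orders of magnitude larger than the residual
```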

Gaussian Elimination (with Partial Pivoting) is Guaranteed to Produce a Small Residual

When we use Gaussian elimination with partial pivoting to compute the solution of the linear system $A\boldsymbol{x} = \boldsymbol{b}$ and obtain an approximate solution $\hat{\boldsymbol{x}}$, the residual vector $\boldsymbol{r}$ satisfies:

$$\frac{\|\boldsymbol{r}\|}{\|A\| \, \|\hat{\boldsymbol{x}}\|} \le \frac{\|E\|}{\|A\|} \le c \, \epsilon_{\text{mach}}$$

where $E$ is the backward error in $A$ (which is defined by $(A + E)\hat{\boldsymbol{x}} = \boldsymbol{b}$), $c$ is a coefficient related to $A$, and $\epsilon_{\text{mach}}$ is machine epsilon.

Typically $c$ is small with partial pivoting, but it can be arbitrarily large without pivoting.

Therefore, Gaussian elimination with partial pivoting yields a small relative residual regardless of the conditioning of the system.

For more details, see Gaussian Elimination & Roundoff Error.
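
The bound above can be spot-checked numerically (a sketch under the assumption that `np.linalg.solve`, which calls LAPACK's LU factorization with partial pivoting, stands in for the elimination described here): even for a nearly singular matrix, the normwise relative residual stays within a small multiple of $\epsilon_{\text{mach}}$.

```python
import numpy as np

# Build a nearly singular (hence ill-conditioned) matrix for illustration.
rng = np.random.default_rng(1)
n = 200
A = rng.standard_normal((n, n))
A[:, -1] = A[:, 0] + 1e-12 * rng.standard_normal(n)   # last column ~ first column
b = rng.standard_normal(n)

x_hat = np.linalg.solve(A, b)   # LU with partial pivoting
r = b - A @ x_hat

print(np.linalg.cond(A))                           # very large
print(np.linalg.norm(r) /                          # ||r|| / (||A|| ||x_hat||)
      (np.linalg.norm(A, 2) * np.linalg.norm(x_hat)))
print(np.finfo(float).eps)                         # about 2.2e-16 for comparison
```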

Accuracy Rule of Thumb and Example

Suppose we apply Gaussian elimination with partial pivoting and back substitution to the linear system $A\boldsymbol{x} = \boldsymbol{b}$ and obtain a computed solution $\hat{\boldsymbol{x}}$. If the entries in $A$ and $\boldsymbol{b}$ are accurate to $s$ decimal digits, and $\text{cond}(A) \approx 10^{t}$, then the elements of the solution vector $\hat{\boldsymbol{x}}$ will be accurate to about $s - t$ decimal digits.

For a proof of this rule of thumb, please see Fundamentals of Matrix Computations by David S. Watkins.

Example: How many accurate decimal digits in the solution can we expect to obtain if we solve a linear system $A\boldsymbol{x} = \boldsymbol{b}$ where $\text{cond}(A) = 10^{10}$ using Gaussian elimination with partial pivoting, assuming we are using IEEE double precision and the inputs are accurate to machine precision?

In IEEE double precision, $\epsilon_{\text{mach}} \approx 2.2 \times 10^{-16}$, which means the entries in $A$ and $\boldsymbol{b}$ are accurate to $|\log_{10}(2.2 \times 10^{-16})| \approx 16$ decimal digits.

Then, using the rule of thumb, we know the entries in $\hat{\boldsymbol{x}}$ will be accurate to about $16 - 10 = 6$ decimal digits.
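
A numerical check of this example (a sketch: the matrix is constructed with prescribed singular values so that $\text{cond}_2(A) = 10^{10}$, which is an assumption of the illustration, not part of the notes):

```python
import numpy as np

# Build A = U * diag(sigma) * V^T with singular values from 1 down to 1e-10,
# so that cond_2(A) = 1e10.
rng = np.random.default_rng(2)
n = 50
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = np.logspace(0, -10, n)
A = U @ np.diag(sigma) @ V.T

x_true = rng.standard_normal(n)
b = A @ x_true

x_hat = np.linalg.solve(A, b)   # Gaussian elimination with partial pivoting (LAPACK)
rel_error = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)

print(np.linalg.cond(A))   # about 1e10
print(rel_error)           # roughly 1e-6, i.e. about 16 - 10 = 6 accurate digits
```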

Review Questions

  1. What is the definition of a condition number?
  2. What is the condition number of solving $A\boldsymbol{x} = \boldsymbol{b}$?
  3. What is the condition number of matrix-vector multiplication?
  4. Calculate the $p$-norm condition number of a matrix for a given $p$.
  5. Do you want a small condition number or a large condition number?
  6. What is the condition number of an orthogonal matrix?
  7. If you have $p$ accurate digits in $A$ and $\boldsymbol{b}$, how many accurate digits do you have in the solution of $A\boldsymbol{x} = \boldsymbol{b}$ if the condition number of $A$ is $\kappa$?
  8. When solving a linear system $A\boldsymbol{x} = \boldsymbol{b}$, does a small residual guarantee an accurate result?
  9. Consider solving a linear system $A\boldsymbol{x} = \boldsymbol{b}$. When does Gaussian elimination with partial pivoting produce a small residual?
  10. How does the condition number of a matrix $A$ relate to the condition number of $A^{-1}$?
