
Matrix Calculation

Direct Methods for Solving Linear Systems

We focus on solving the linear system

\[ A\vec{x} = \vec{b} \]

Gaussian Elimination

Reduce \(A\) to an upper-triangular matrix, then solve for the unknowns by a backward-substitution process.
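
A minimal sketch of this procedure in Python/NumPy (the function name `gauss_solve` and the example system are ours, and no pivoting is performed, so every pivot is assumed to be nonzero):

```python
import numpy as np

def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination + backward substitution.
    Assumes every pivot is nonzero (no pivoting here)."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    # Forward elimination: reduce A to upper-triangular form.
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]          # multiplier m_{i,k}
            A[i, k:] -= m * A[k, k:]       # E_i <- E_i - m * E_k
            b[i] -= m * b[k]
    # Backward substitution.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

A = np.array([[ 2.0,  1.0, -1.0],
              [-3.0, -1.0,  2.0],
              [-2.0,  1.0,  2.0]])
b = np.array([8.0, -11.0, -3.0])
print(gauss_solve(A, b))   # expect approximately [2., 3., -1.]
```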

Pivoting Strategies

Pivoting is used to reduce the error caused by rounding/truncation.

One can show that the choice of the pivot element is of great significance.

Partial pivoting (rows are interchanged, while the columns are left unchanged).

Determine the smallest \(p\geq k\) (in the same column as \(a^{(k)}_{kk}\)) such that

\[ |a_{pk}^{(k)}| = \max_{k\leq i \leq n}{|a_{ik}^{(k)}|} \]

and perform \((E_k) \leftrightarrow (E_p)\).

Scaled partial pivoting. For each row \(i\), let

\[ s_i = \max_{1\leq j\leq n}{|a_{ij}|} \]

(If \(s_i = 0\) for some \(i\), the system has no unique solution, so we assume \(s_i > 0\) for all \(i\).)

Before performing the eliminations \(E_k \leftarrow E_k - m_{k,i}E_i\) for \(k=i+1, \cdots, n\), where \(m_{k, i} = a_{ki}/a_{ii}\), let

\[ p = \arg \max_{i\leq k \leq n}{\frac{|a_{ki}|}{s_k}} \]

and perform \((E_i)\leftrightarrow(E_p)\).
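
As a small illustration, the scaled-pivot selection rule can be sketched in Python/NumPy as follows (the helper name `scaled_pivot_row` and the example matrix are ours):

```python
import numpy as np

def scaled_pivot_row(A, s, i):
    """Return the row index p (>= i) chosen by scaled partial pivoting
    at elimination step i, i.e. p = argmax_{i<=k<n} |a[k, i]| / s[k]."""
    ratios = np.abs(A[i:, i]) / s[i:]
    return i + int(np.argmax(ratios))

A = np.array([[30.0,   591400.0],
              [ 5.291,     -6.13]])
s = np.abs(A).max(axis=1)          # scale factors s_i = max_j |a_ij|
p = scaled_pivot_row(A, s, 0)
print(p)  # row 1 is chosen: |5.291|/6.13 > |30|/591400
```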

Complete pivoting: incorporate the interchange of both rows and columns.

Time Cost

The time cost of Gaussian elimination is

\[ O(n^3) \]

LU Matrix Factorization

The idea is motivated by Gaussian elimination. A matrix \(A\) can be transformed into an upper-triangular matrix \(U\) by elementary row operations:

\[ \begin{equation} U = M_{n-1}\cdots M_2M_1A \label{eq: LU} \end{equation} \]

where \(M_k (k=1,2,\cdots n-1)\) denotes a series of row operations. There are two perspectives.

The first perspective: \(M_k (k=1, \cdots, n-1)\) can be interpreted as making row \(k+1\) eliminate its entries in columns \(1\) to \(k\). That is,

\[ E_{k+1} \leftarrow E_{k+1} - \sum_{j=1}^{k} m_{k+1, j}E_j \]

The second perspective: \(M_k (k=1,\cdots, n-1)\) can be defined as the operation that eliminates the entries below the diagonal in column \(k\):

\[ E_j \leftarrow E_j - m_{j,k}E_k \quad \text{for } j=k+1, \cdots, n \]

which is also a lower-triangular matrix (the proof is left to the reader).

The matrices \(M_k\) formed through the two perspectives above are the same.

If we denote \(L_k = M_k^{-1}\) and left-multiply both sides of equation \(\ref{eq: LU}\) by \(L = L_1L_2\cdots L_{n-1}\), then

\[ LU = L_1L_2\cdots L_{n-1} \cdot M_{n-1}\cdots M_2M_1A = A \]

The matrices \(L_k\) and \(M_k\) are lower-triangular (this follows from their definitions; the proof is left to the reader), so the product \(L\) is also a lower-triangular matrix.

With the triangular factors, the system can be solved much more quickly. Observe that

\[ \begin{align*} A\vec{x} &= \vec{b} \\ LU\vec{x} &= \vec{b} \end{align*} \]

First solve \(L \vec{y} = \vec{b}\), then solve \(U \vec{x} = \vec{y}\).
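
A minimal sketch of this two-stage solve in Python/NumPy (the names `lu_factor`, `forward_sub`, and `backward_sub` are ours; no pivoting is performed, so all pivots are assumed nonzero):

```python
import numpy as np

def lu_factor(A):
    """Doolittle LU factorization without pivoting: A = L U,
    with L unit lower-triangular and U upper-triangular."""
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]     # store the multiplier m_{i,k}
            U[i, k:] -= L[i, k] * U[k, k:]  # eliminate entry (i, k)
    return L, U

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = np.zeros_like(b, dtype=float)
    for i in range(len(b)):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def backward_sub(U, y):
    """Solve U x = y for upper-triangular U."""
    n = len(y)
    x = np.zeros_like(y, dtype=float)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

A = np.array([[4.0, 3.0], [6.0, 3.0]])
b = np.array([10.0, 12.0])
L, U = lu_factor(A)
x = backward_sub(U, forward_sub(L, b))   # solve L y = b, then U x = y
print(x)   # expect [1., 2.]
```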

Time Cost

Eliminating the \(0.5n^2\) subdiagonal elements (the factorization step) takes time \(O(0.5n^3)\).

Solving for \(\vec{y}\) and \(\vec{x}\) (the two triangular solves) takes time \(O(2n^2)\).

Iterative Techniques in Matrix Algebra

In this section, we introduce the idea of fixed-point iteration (不动点法) to solve a linear system.

We aim to find an iterative equation of the form \(\pmb{x}^{k} = f(\pmb{x}^{k-1})\), more specifically a linear iteration of the form

\[ \pmb{x}^{k} = T \pmb{x}^{k-1} + \pmb{c} \]

and the corresponding relation at convergence (the fixed point) is

\[ \pmb{x}= T \pmb{x} + \pmb{c} \]

Preliminary Knowledge: Norms of Vectors and Matrices

Definition of the Norm of Vectors

A vector norm on \(\mathbb{R}^n\) is a function, denoted \(\Vert \cdot \Vert\), mapping \(\mathbb{R}^n\) into \(\mathbb{R}\), with the following properties for all \(\vec{x}, \vec{y} \in\mathbb{R}^n\) and \(\alpha \in \mathbb{C}\):

  • 正性 | positive
\[ \Vert \vec{x} \Vert\geq 0 \]
  • 定性 | definite
\[ \Vert \vec{x}\Vert = 0 \Leftrightarrow \vec{x}=\vec{0} \]
  • 齐性 | homogeneous
\[ \Vert \alpha\vec{x} \Vert = |\alpha|\Vert \vec{x} \Vert \]
  • 三角不等式 | triangle inequality
\[ \Vert \vec{x}+\vec{y} \Vert \leq \Vert \vec{x} \Vert+\Vert \vec{y} \Vert \]

We usually use the \(p\)-norm

\[ \Vert \vec{x} \Vert_p = \left(\sum_{i=1}^n|x_i|^p\right)^{1/p} \]

with its common forms:

\[ \Vert \vec{x} \Vert_1 = \sum_{i=1}^{n}|x_i|, \quad \Vert \vec{x} \Vert_2 = \sqrt{\sum_{i=1}^{n}|x_i|^2}, \quad \Vert \vec{x} \Vert_\infty = \max_{1\leq i\leq n}|x_i| \]
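
These three norms are easy to check numerically; a small Python/NumPy sketch (the example vector is ours):

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])
print(np.abs(x).sum())         # 1-norm:        7.0
print(np.sqrt((x**2).sum()))   # 2-norm:        5.0
print(np.abs(x).max())         # infinity-norm: 4.0
# The built-in equivalents:
print(np.linalg.norm(x, 1), np.linalg.norm(x, 2), np.linalg.norm(x, np.inf))
```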

Definition of the Norm of Matrices

A matrix norm on the set of all matrices in \(\mathbb{R}^{n\times n}\) is a real-valued function, denoted \(\Vert \cdot \Vert\), defined on this set and satisfying, for all \(A, B \in \mathbb{R}^{n\times n}\) and all \(\alpha \in \mathbb{C}\):

  • 正性 | positive
\[ \Vert \mathbfit{A} \Vert\geq 0 \]
  • 定性 | definite
\[ \Vert \mathbfit{A} \Vert = 0 \Leftrightarrow \mathbfit{A}=\mathbfit{0} \]
  • 齐性 | homogeneous
\[ \Vert \alpha\mathbfit{A} \Vert = |\alpha|\Vert \mathbfit{A} \Vert \]
  • 三角不等式 | triangle inequality
\[ \Vert \mathbfit{A}+\mathbfit{B} \Vert \leq \Vert \mathbfit{A} \Vert+\Vert \mathbfit{B} \Vert \]
  • 一致性 | consistent
\[ \Vert \mathbfit{A}\mathbfit{B} \Vert\leq \Vert \mathbfit{A} \Vert \cdot \Vert \mathbfit{B} \Vert \]

Usually we use the natural (operator-induced) norm:

\[ \Vert \mathbfit{A} \Vert_p = \max_{\vec{x}\neq \vec{0}}\frac{\Vert \mathbfit{A}\vec{x} \Vert_p}{\Vert \vec{x} \Vert_p} = \max_{\Vert \vec{x} \Vert_p =1}\Vert \mathbfit{A}\vec{x} \Vert_p \]

with its common forms:

\[ \begin{align*} \Vert \mathbfit{A} \Vert_1 &= \max_{1\leq j\leq n}\sum_{i=1}^{n}|a_{ij}| \quad\text{the maximum absolute column sum}\\ \Vert \mathbfit{A} \Vert_2 &= \sqrt{\rho(A^T A)} \quad\text{the square root of the spectral radius of } A^TA\\ \Vert \mathbfit{A} \Vert_\infty &= \max_{1\leq i\leq n}\sum_{j=1}^{n}|a_{ij}| \quad\text{the maximum absolute row sum}\\ \end{align*} \]
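
A quick numerical check of these formulas in Python/NumPy (the example matrix is ours; `np.linalg.norm` follows the same column-sum/row-sum conventions):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])
norm_1   = np.abs(A).sum(axis=0).max()                     # max column sum = 6
norm_inf = np.abs(A).sum(axis=1).max()                     # max row sum    = 7
norm_2   = np.sqrt(np.linalg.eigvals(A.T @ A).real.max())  # sqrt(rho(A^T A))
print(norm_1, norm_inf, norm_2)
# The built-in equivalents agree:
print(np.linalg.norm(A, 1), np.linalg.norm(A, np.inf), np.linalg.norm(A, 2))
```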

For the matrix \(2\)-norm, there are simpler expressions for special matrices:

Special Expressions for the Matrix 2-Norm

If matrix \(A\) is symmetric, then

\[ \Vert \mathbfit{A} \Vert_2 = \rho(A). \]

If matrix \(A\) is orthogonal (e.g., a rotation), then

\[ \Vert \mathbfit{A} \Vert_2 = 1. \]

Jacobi's Method

Here we denote by \(D\) the diagonal part of \(A\), and by \(L\) and \(U\) the negatives of the strictly lower- and strictly upper-triangular parts of \(A\), respectively (different from the \(L\), \(U\) of the \(LU\) factorization!). That is,

\[ A = D - L -U \]

Then

\[ \begin{align*} A\pmb{x} &= \pmb{b} \\ (D-L-U)\pmb{x} &= \pmb{b} \\ D\pmb{x} &= (L+U)\pmb{x} + \pmb{b} \\ \pmb{x} &= D^{-1}(L+U)\pmb{x} + D^{-1}\pmb{b} \end{align*} \]

which gives the matrix form of the Jacobi iterative technique:

\[ \pmb{x}^{k} = D^{-1}(L+U)\pmb{x}^{k-1} + D^{-1}\pmb{b} \]
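
A minimal sketch of this matrix form in Python/NumPy (the function name `jacobi`, the stopping rule, and the example system are ours; the example matrix is strictly diagonally dominant, so the iteration converges):

```python
import numpy as np

def jacobi(A, b, x0=None, tol=1e-10, max_iter=500):
    """Jacobi iteration x^{k} = D^{-1}(L+U) x^{k-1} + D^{-1} b,
    where A = D - L - U and D is the diagonal part of A."""
    n = len(b)
    D = np.diag(np.diag(A))
    T = np.linalg.solve(D, D - A)   # D^{-1}(L+U), since L + U = D - A
    c = np.linalg.solve(D, b)       # D^{-1} b
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = T @ x + c
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x

A = np.array([[10.0, -1.0,  2.0],
              [-1.0, 11.0, -1.0],
              [ 2.0, -1.0, 10.0]])
b = np.array([6.0, 22.0, -10.0])
print(jacobi(A, b))   # converges to approximately [1., 2., -1.]
```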

The Gauss-Seidel Method

This method addresses a slight inefficiency in Jacobi's method: within each iteration (\(\pmb{x}^{k} \leftarrow \pmb{x}^{k-1}\)), as soon as a component \(\pmb{x}^{k}_{i}\) has been computed, it replaces the corresponding component of \(\pmb{x}^{k-1}\) when the remaining components are computed.

In matrix form, we have

\[ D\pmb{x}^{k} = U\pmb{x}^{k-1} + L\pmb{x}^{k}+ \pmb{b} \]

(the derivation is left to the reader)

then

\[ \pmb{x}^{k} = (D-L)^{-1}U\pmb{x}^{k-1} + (D-L)^{-1}\pmb{b} \]
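
A corresponding sketch of the Gauss-Seidel iteration in componentwise form, which is equivalent to the matrix form above (the function name `gauss_seidel` and the example system are ours):

```python
import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=500):
    """Gauss-Seidel: each freshly computed component x_i^{k} is used
    immediately when computing the remaining components of x^{k}."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # already-updated x[:i], not-yet-updated x_old[i+1:]
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x_old[i+1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old, np.inf) < tol:
            break
    return x

A = np.array([[10.0, -1.0,  2.0],
              [-1.0, 11.0, -1.0],
              [ 2.0, -1.0, 10.0]])
b = np.array([6.0, 22.0, -10.0])
print(gauss_seidel(A, b))   # converges to approximately [1., 2., -1.]
```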

Approximating Eigenvalues

The Power Method

Assume that \(A\) has eigenvalues with \(|\lambda_1|>|\lambda_2|\geq |\lambda_3|\geq \cdots \geq |\lambda_n|\geq 0\). We can use the following method to make the dominant eigenvalue \(\lambda_1\) stand out.

幂法 | The Power Method

Randomly initialize \(\vec{x}\), which can be represented in terms of \(n\) linearly independent eigenvectors \(\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n\) with coefficients \(\beta_1, \beta_2, \cdots, \beta_n\), such that

\[ \vec{x} = \sum_{i=1}^{n}\beta_i\vec{v_i} \]

Multiplying both sides by \(A\) and using \(A\vec{v_i}=\lambda_i \vec{v_i}\), we get

\[ A\vec{x} = \sum_{i=1}^{n}\beta_i \lambda_i \vec{v_i} \]

Repeating this process \(n\) times (and assuming \(\beta_1 \neq 0\)), we get

\[ A^n\vec{x} = \sum_{i=1}^{n}\beta_i \lambda_i^n \vec{v_i} = \lambda_1^n\sum_{i=1}^{n}\beta_i (\frac{\lambda_i}{\lambda_1})^n \vec{v_i}\rightarrow \lambda_1^n \beta_1 \vec{v_1} \quad (n\rightarrow \infty) \]

That is, by repeatedly multiplying by \(A\), the components associated with eigenvalues smaller in magnitude than \(\lambda_1\) become negligible, and the dominant one is "extracted".

To avoid overflow (or underflow) when \(|\lambda_1|\neq 1\), we normalize \(\vec{x}^{k} = A\vec{x}^{k-1}\) after each multiplication. Usually we choose the \(\Vert\cdot\Vert_\infty\) norm.

To recover the dominant eigenvalue, we can use

\[ \frac{\Vert\vec{x}^{k}\Vert}{\Vert\vec{x}^{k-1}\Vert} \approx |\lambda_1| \quad (k\rightarrow \infty) \]

as an estimate of \(\lambda_1\) (up to its sign).

A natural next question is: how fast does the iteration converge? Luckily, the question is easy to answer:

It depends on the ratio \(\left|\frac{\lambda_2}{\lambda_1}\right|\): the smaller this ratio, the faster the convergence.
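
A minimal sketch of the normalized power iteration in Python/NumPy (the function name `power_method`, the scaling rule, and the example matrix are ours):

```python
import numpy as np

def power_method(A, num_iters=100, tol=1e-10):
    """Normalized power iteration: estimates the dominant eigenvalue
    lambda_1 and an associated eigenvector."""
    n = A.shape[0]
    x = np.random.default_rng(0).random(n)
    x /= np.abs(x).max()                    # normalize with the infinity norm
    lam = 0.0
    for _ in range(num_iters):
        y = A @ x
        lam_new = y[np.argmax(np.abs(y))]   # component of largest magnitude
        x = y / lam_new                     # renormalize, keeping the sign
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return lam_new, x

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, v = power_method(A)
print(lam)   # approximately 3.0 (the eigenvalues of A are 3 and 1)
```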

Inverse Power Method

This is a trick derived from the power method. It comes from a question: what if we want to compute the smallest eigenvalue of a matrix \(A\)?

The answer: make use of the matrix inverse.

反幂法 | Inverse Power Method

Suppose matrix \(A\) has eigenvalues with \(0 < |\lambda_1| < |\lambda_2| \leq \cdots \leq |\lambda_n|\); then matrix \(A^{-1}\) has eigenvalues

\[ \left|\frac{1}{\lambda_1}\right| > \left|\frac{1}{\lambda_2}\right| \geq \cdots \geq \left|\frac{1}{\lambda_n}\right| \]

Applying the power method to \(A^{-1}\) then extracts \(\frac{1}{\lambda_1}\), from which \(\lambda_1\) follows.

In practice, this method is more often used when we already know that some eigenvalue \(\lambda_1\) of \(A\) (not necessarily the largest or smallest) is close to a constant \(q\). In that case we form the shifted matrix

\[ (A-qI) \]

which has eigenvalues

\[ |\lambda_1 - q| < |\lambda_2 - q| \leq \cdots \leq \left|\lambda_n - q\right| \]

Then we apply the inverse power method to \((A-qI)\) to extract \(\lambda_1 - q\), and hence \(\lambda_1\).
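
A sketch of the shifted inverse power method under these assumptions (Python/NumPy; the names and the example are ours, and in practice one solves \((A-qI)\vec{y}=\vec{x}\) rather than forming the inverse explicitly):

```python
import numpy as np

def inverse_power_method(A, q, num_iters=100, tol=1e-10):
    """Shifted inverse power iteration: estimates the eigenvalue of A
    closest to the shift q by applying the power method to (A - q I)^{-1}."""
    n = A.shape[0]
    x = np.random.default_rng(0).random(n)
    x /= np.abs(x).max()
    mu = 0.0
    for _ in range(num_iters):
        y = np.linalg.solve(A - q * np.eye(n), x)   # y = (A - qI)^{-1} x
        mu_new = y[np.argmax(np.abs(y))]            # dominant eigenvalue of (A-qI)^{-1}
        x = y / mu_new
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return q + 1.0 / mu_new    # recover lambda_1 from 1/(lambda_1 - q)

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(inverse_power_method(A, q=0.9))   # approximately 1.0 (the eigenvalue closest to 0.9)
```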