Numerical Analysis¶
Basic Ideas for Solving Equations¶
To find the solution of an equation, we hope to have an iteration method that takes good advantage of computer resources, such as \(\pmb{x}^{k} = g(\pmb{x}^{k-1})\) for \(k = 1, 2, \cdots, n\). We use \(\pmb{x}\) instead of \(x\) because the same iteration idea also applies to solving linear systems.
Hopefully the iteration converges, that is, for \(k \rightarrow \infty\), \(\pmb{x}^{k-1}\rightarrow \pmb{x}^*\) and \(\pmb{x}^{k}\rightarrow \pmb{x}^*\), and the relation becomes \(\pmb{x}^* = g(\pmb{x}^*)\), where \(\pmb{x}^*\) is the solution of the equation to be solved.
If this idea works, we can then consider the speed of convergence of the iterative process, which matters a great deal in practical applications. That is, for a given notion of distance, we want \(d(\pmb{x}^{k}, \pmb{x}^*)\) to be as small as possible for each \(k\).
Preliminary: Errors¶
If a real number \(x\) is written as \(0.d_1d_2d_3\cdots \times 10^{n}\), then

- Truncation (截断) error is induced when we chop the expansion after \(k\) digits, keeping \(0.d_1d_2\cdots d_k \times 10^{n}\) for some finite \(k<\infty\).
- Roundoff (舍入) error is induced when we round the expansion to \(k\) digits, keeping \(0.d_1d_2\cdots \delta_k \times 10^{n}\) for some finite \(k<\infty\), where \(\delta_k = d_k + 1\) if \(d_{k+1}\geq 5\) and \(\delta_k = d_k\) otherwise.
t significant digits¶
The number \(p^*\) is said to approximate \(p\) to \(t\) significant digits (or figures) if \(t\) is the largest nonnegative integer for which the relative error satisfies \(\frac{|p-p^*|}{|p|} \leq 5\times 10^{-t}\), where \(p^*\) is the approximation of the exact number \(p\).
- for chopping: \(\left|\frac{x - fl(x)}{x}\right| \leq 10^{-k+1}\)
- for rounding: \(\left|\frac{x - fl(x)}{x}\right| \leq \frac{1}{2}\times 10^{-k+1}\)

Here \(fl(x)\) denotes the \(k\)-digit representation of \(x\) described above.
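As a small illustration, here is a minimal Python sketch (the helper `fl`, its parameters, and the example value are our own, not from the text) that chops or rounds \(\pi\) to \(k=5\) mantissa digits and counts the significant digits of the result:

```python
import math

def fl(x, k, mode="round"):
    """Represent x as 0.d1 d2 ... dk * 10^n with k mantissa digits,
    either chopping or rounding the remaining digits."""
    n = math.floor(math.log10(abs(x))) + 1      # exponent so that 0.1 <= mantissa < 1
    mantissa = x / 10**n
    scaled = mantissa * 10**k
    kept = round(scaled) if mode == "round" else math.trunc(scaled)
    return kept / 10**k * 10**n

p = math.pi                                     # 3.14159265...
for mode in ("chop", "round"):
    p_star = fl(p, 5, mode)
    rel_err = abs(p - p_star) / abs(p)
    # t significant digits: largest t with rel_err <= 5 * 10^{-t}
    t = max(t for t in range(0, 16) if rel_err <= 5 * 10**(-t))
    print(mode, p_star, rel_err, t)
```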
Solutions of Equations in One Variable¶
The Bisection Method(二分法)¶
This method is quite intuitive. Choosing two end points \(a, b\) with \(f(a)f(b) < 0\), we take the midpoint \(p = \frac{a+b}{2}\).
Then we update \(a\) or \(b\) according to the sign of \(f(p)\) to narrow down the interval.
What is interesting is the stopping criterion. Readers can consider the following question if interested.
When we compute the new point \(p\) from \(a, b\), we need to judge whether \(p\) is an acceptable answer. Apart from \(f(p)=0\), which condition do you think is the best?
- \((b-a)/{|\min{(a, b)}|}<\epsilon\)
- \(|p-p_{prev}|=(b-a)/2 < \epsilon\)
- \(|f(p)|<\epsilon\)
Choose 1: it is essentially a relative-error criterion and is generally the best.
For 2: consider \(p_n=\sum\limits_{i=1}^{n}\frac{1}{i}\), which satisfies \(|p_n - p_{n-1}| \rightarrow 0\) yet diverges.
For 3: \(|f(p)|\) can be small even when \(p\) is still far from the root (think of a very flat \(f\)).
The speed of convergence can be described as follows: \(|p_n - p| \leq \frac{b-a}{2^n}\), where \(p\) is the root and \(p_n\) is the \(n\)th midpoint.
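A minimal sketch of the bisection method in Python, using a relative-width stopping test in the spirit of option 1 (the function name, tolerances, and the sample equation \(x^3 + 4x^2 - 10 = 0\) are our own choices):

```python
def bisect(f, a, b, eps=1e-10, max_iter=100):
    """Bisection: requires f(a) * f(b) < 0."""
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        p = a + (b - a) / 2                 # midpoint, written to avoid overflow
        if f(p) == 0 or (b - a) / 2 < eps * max(abs(a), abs(b), 1.0):
            return p
        if f(a) * f(p) > 0:                 # root lies in [p, b]
            a = p
        else:                               # root lies in [a, p]
            b = p
    return p

# Example: root of x^3 + 4x^2 - 10 on [1, 2] (approximately 1.3652)
print(bisect(lambda x: x**3 + 4 * x**2 - 10, 1.0, 2.0))
```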
Fixed-Point Iteration(不动点法)¶
As we said previously in Basic Ideas for Solving Equations, we hope to find an iterative relation whose point of convergence \(x^*\) is exactly what we want, which in this case means that \(x^* = g(x^*)\).
So intuitively, we ask: can we derive, from the equation \(f(x) = 0\), a relation of the form \(x = g(x)\) to be used as an iterative method?
The answer is, of course, YES!
One of the easiest transformations is adding \(x\) to both sides of the equation \(f(x) = 0\), but in most cases this does not work, because we rely on \(f(x)\) itself for the convergence!
Thus, it is necessary to find a condition for \(x = g(x)\) to converge. The following theorem gives a sufficient condition.
不动点存在定理 | Fixed-Point Theorem
Let \(g \in C[a, b]\) be such that \(g(x) \in [a, b]\), for all \(x\) in \([a, b]\). Suppose, in addition, that \(g'\) exists on \((a, b)\) and that a constant \(0 < k < 1\) exists with \(|g'(x)| \leq k\) for all \(x \in (a, b)\).
Then, for any initial number \(p_0 \in [a, b]\), the sequence \(\{p_n\}_{n=0}^{\infty}\) defined by \(p_n = g(p_{n-1})\), \(n \geq 1\),
converges to the unique fixed point \(p \in [a, b]\).
The proof is easy.
- Use the Mean Value Theorem.
\(\forall n \geq 1\), \(\exists \zeta_n\) between \(p_{n-1}\) and \(p\), hence in \((a, b)\), such that \(|p_n - p| = |g(p_{n-1}) - g(p)| = |g'(\zeta_n)|\,|p_{n-1} - p| \leq k\,|p_{n-1} - p|\).
By induction, we have \(|p_n - p| \leq k^n\,|p_0 - p|\).
Let \(n \rightarrow \infty\); since \(0 < k < 1\), \(|p_n-p| \rightarrow 0\), that is, \(p_n\) converges to \(p\).
The estimate used in the proof will also help us identify the speed of the convergence process.
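A minimal sketch of fixed-point iteration (the names and the example \(g(x) = \cos x\), which satisfies the theorem's hypotheses on \([0, 1]\), are our own):

```python
import math

def fixed_point(g, p0, eps=1e-12, max_iter=200):
    """Iterate p_n = g(p_{n-1}) until successive iterates are close."""
    p_prev = p0
    for _ in range(max_iter):
        p = g(p_prev)
        if abs(p - p_prev) < eps:
            return p
        p_prev = p
    raise RuntimeError("fixed-point iteration did not converge")

# g(x) = cos(x) maps [0, 1] into itself and |g'(x)| = |sin x| <= sin(1) < 1,
# so the iteration converges to the unique fixed point (about 0.7390851).
print(fixed_point(math.cos, 0.5))
```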
Newton's Method(牛顿法)¶
This method is also a fixed-point method. There are two perspectives from which to derive it.
- Version 1: shrink the derivative of the iterative function \(g(x)\).
- Version 2: use Taylor's expansion.
We know that, for a given function \(f(x)\), the iteration \(x^{k} = f(x^{k-1}) + x^{k-1}\) may not converge to a point \(x^*\) on a given interval. So the question is: can we formulate a function \(g(x)\) such that \(x^{k} = g(x^{k-1})\) is convergent?
The answer is, again, YES!
The following content tells us we can formulate \(g(x)= x - f(x)/f'(x)\) so that \(|g'(x)| < 1\) in a suitable interval.
Readers can easily see that \(g'(x) = \frac{f(x)\,f''(x)}{[f'(x)]^2}\),
so if we add some restrictions, it is easy to make \(|g'(x)|<1\).
Here we make use of the Taylor expansion (i.e., the derivatives) of the goal function.
Suppose that \(f \in C^2[a, b]\). Let \(x_0 \in [a, b]\) be an approximation to \(x^*\) such that \(f'(x_0) \neq 0\) and \(|x_0-x^*|\) is "small". Consider the first Taylor polynomial for \(f(x)\) expanded about \(x_0\): \(f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(\xi(x))}{2}(x - x_0)^2\).
If we let \(x = x^*\), then since \(f(x^*)=0\), we get \(0 = f(x_0) + f'(x_0)(x^* - x_0) + \frac{f''(\xi)}{2}(x^* - x_0)^2\).
Neglecting the squared term, we get \(0 \approx f(x_0) + f'(x_0)(x^* - x_0)\).
Solving for \(x^*\), we get \(x^* \approx x_0 - \frac{f(x_0)}{f'(x_0)}\).
Then we can define the iterative relation as \(p_n = p_{n-1} - \frac{f(p_{n-1})}{f'(p_{n-1})}\), \(n \geq 1\).
The following statement guarantees the convergence of the above iterative method.
牛顿法收敛条件 | Conditions for convergence of Newton's method
Let \(f \in C^2[a, b]\). If \(p \in (a, b)\) is such that \(f (p) = 0\) and \(f'( p) \neq 0\), then there exists a \(\delta > 0\) such that Newton’s method generates a sequence \(\{p_n\}_{n=1}^{\infty}\) converging to \(p\) for any initial approximation \(p_0 \in [p − \delta, p + \delta]\).
Prove it.
- make use of the conditions \(f(p) = 0\) and \(f'(p)\neq 0\)
For \(x \in (a, b)\), we aim to find a narrower interval \([x^*-\delta, x^*+\delta]\) on which \(g(x) = x - \frac{f(x)}{f'(x)}\) maps the interval into itself, that is, \(g\bigl([x^*-\delta, x^*+\delta]\bigr) \subseteq [x^*-\delta, x^*+\delta]\).
Firstly, \(f'(x^*)\neq 0\) and the continuity of \(f'\) imply that \(\exists \delta_1 >0\) such that \(f'(x)\neq 0\), \(\forall x \in [x^* - \delta_1, x^*+\delta_1]\subset [a, b]\).
Thus \(g(x) = x - \frac{f(x)}{f'(x)}\) is well defined on this interval, since we only ever divide by non-zero numbers.
Since \(f\in C^2[a,b]\), we have \(g \in C^1[x^*-\delta_1, x^*+\delta_1]\), with \(g'(x) = \frac{f(x)\,f''(x)}{[f'(x)]^2}\).
For the exact solution \(x^*\), we have \(f(x^*)=0\), so \(g'(x^*) = \frac{f(x^*)\,f''(x^*)}{[f'(x^*)]^2} = 0\),
which implies that \(\exists\, 0<\delta < \delta_1\) such that \(|g'(x)| \leq k < 1\) for all \(x \in [x^*-\delta, x^*+\delta]\).
By the Mean Value Theorem, for \(x \in [x^*-\delta, x^*+\delta]\), \(\exists \zeta\) between \(x\) and \(x^*\) such that \(|g(x) - x^*| = |g(x) - g(x^*)| = |g'(\zeta)|\,|x - x^*| \leq k\,|x - x^*| < \delta\),
which means that \(g\) maps \([x^*-\delta, x^*+\delta]\) into itself. By the Fixed-Point Theorem, the sequence defined by Newton's method converges.
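A minimal sketch of Newton's method with the derivative supplied explicitly (function names, tolerances, and the example are our own):

```python
def newton(f, df, p0, eps=1e-12, max_iter=50):
    """Newton's method: p_n = p_{n-1} - f(p_{n-1}) / f'(p_{n-1})."""
    p = p0
    for _ in range(max_iter):
        dfp = df(p)
        if dfp == 0:
            raise ZeroDivisionError("f'(p) = 0, Newton step undefined")
        p_next = p - f(p) / dfp
        if abs(p_next - p) < eps:
            return p_next
        p = p_next
    raise RuntimeError("Newton's method did not converge")

# Example: the same root of x^3 + 4x^2 - 10, starting from p0 = 1.5.
print(newton(lambda x: x**3 + 4 * x**2 - 10,
             lambda x: 3 * x**2 + 8 * x,
             1.5))
```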
Secant Method¶
It may not be easy to find the derivative of the function \(f\), so we can use a difference quotient instead. That is, we store two adjacent points and approximate \(f'(p_{k}) \approx \frac{f(p_{k}) - f(p_{k-1})}{p_{k} - p_{k-1}}\),
then generate \(p_{k+1}\) by using this approximation in Newton's formula, and iterate.
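A minimal sketch of the secant method (names and the example are our own):

```python
def secant(f, p0, p1, eps=1e-12, max_iter=100):
    """Secant method: replace f'(p_k) by the difference quotient
    (f(p_k) - f(p_{k-1})) / (p_k - p_{k-1}) in Newton's formula."""
    f0, f1 = f(p0), f(p1)
    for _ in range(max_iter):
        denom = f1 - f0
        if denom == 0:
            raise ZeroDivisionError("zero difference quotient")
        p2 = p1 - f1 * (p1 - p0) / denom
        if abs(p2 - p1) < eps:
            return p2
        p0, f0, p1, f1 = p1, f1, p2, f(p2)
    raise RuntimeError("secant method did not converge")

# Example: again x^3 + 4x^2 - 10 with the two starting points 1 and 2.
print(secant(lambda x: x**3 + 4 * x**2 - 10, 1.0, 2.0))
```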
Order of Convergence¶
So how do we quantify the speed of convergence? The following definition gives a glimpse.
Suppose \(\{p_n\}_{n=1}^{\infty}\) is a sequence that converges to \(p\), with \(p_n \neq p\) for all \(n\). If positive constants \(\lambda\) and \(\alpha\) exist with \(\lim\limits_{n\rightarrow\infty}\dfrac{|p_{n+1}-p|}{|p_n-p|^{\alpha}} = \lambda\),
then \(\{p_n\}_{n=0}^{\infty}\) converges to \(p\) of order \(\alpha\), with asymptotic error constant \(\lambda\).
- (i) If \(\alpha=1 (\lambda<1)\), the sequence is linearly convergent.
- (ii) If \(\alpha=2\), the sequence is quadratically convergent.
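To connect the definition with computation, one can estimate \(\alpha\) from consecutive errors, since \(|p_{n+1}-p| \approx \lambda |p_n-p|^{\alpha}\) implies \(\alpha \approx \dfrac{\log(|p_{n+1}-p|/|p_n-p|)}{\log(|p_n-p|/|p_{n-1}-p|)}\). A rough sketch (the helper name and example are our own):

```python
import math

def estimate_order(seq, p):
    """Estimate alpha from consecutive errors of a convergent sequence:
    alpha ~ log(e_{n+1}/e_n) / log(e_n/e_{n-1})."""
    e = [abs(s - p) for s in seq]
    return [math.log(e[n + 1] / e[n]) / math.log(e[n] / e[n - 1])
            for n in range(1, len(e) - 1)]

# Newton iterates for f(x) = x^2 - 2 converging to sqrt(2): expect alpha ~ 2.
p, seq = 1.5, []
for _ in range(3):
    p = p - (p * p - 2) / (2 * p)
    seq.append(p)
print(estimate_order(seq, math.sqrt(2)))   # roughly [2.0]
```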
The following theorem gives a sufficient condition for linear convergence.
线性收敛的充分条件 | sufficient condition of linear convergence
Let \(g \in C[a, b]\) be such that \(g(x) \in [a, b], \forall x \in [a, b]\). Suppose, in addition, that \(g'\) is continuous on \((a, b)\) and a positive constant \(k < 1\) exists with \(|g'(x)| \leq k\), \(\forall x \in (a, b)\).
If \(g'(p) \neq 0\), then for any number \(p_0 \neq p\) in \([a, b]\), the sequence \(p_n = g(p_{n-1})\), \(n \geq 1\),
converges only linearly to the unique fixed point \(p\) in \([a, b]\).
Prove it.
That is, show that there exists \(\lambda > 0\) such that \(\lim\limits_{n\rightarrow \infty}\frac{|p_{n+1}-p^*|}{|p_{n}-p^*|} = \lambda\), which is linear convergence (\(\alpha = 1\)).
multiple roots¶
We see that the speed of convergence is limited by multiple roots: at a root of multiplicity greater than one, \(f'(x^*) = 0\) and Newton's method converges only linearly.
Here we have the modified Newton's method: apply Newton's iteration to \(\mu(x) = \frac{f(x)}{f'(x)}\), which has a simple root at \(x^*\), giving \(p_n = p_{n-1} - \frac{f(p_{n-1})\,f'(p_{n-1})}{[f'(p_{n-1})]^2 - f(p_{n-1})\,f''(p_{n-1})}\).
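A small comparison on the double root of \(f(x) = (x-1)^2\) (a sketch with our own example; for this particular \(f\) the modified step lands on the root exactly, because \(\mu(x)\) is linear):

```python
f   = lambda x: (x - 1) ** 2          # double root at x = 1
df  = lambda x: 2 * (x - 1)
ddf = lambda x: 2.0

x = 2.0
for k in range(5):                     # standard Newton: the error only halves each step
    x = x - f(x) / df(x)
    print("newton   ", k, x)

x = 2.0
for k in range(5):                     # modified Newton (Newton applied to f / f')
    denom = df(x) ** 2 - f(x) * ddf(x)
    if denom == 0:                     # already at the root (up to rounding)
        break
    x = x - f(x) * df(x) / denom
    print("modified ", k, x)
```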
Accelerating convergence¶
Aitken's \(\Delta^2\) Method¶
Suppose \(\{p_n\}_{n=0}^{\infty}\) is a linearly convergent sequence with limit \(p\). To motivate the construction of a sequence \(\{\hat{p}_n\}_{n=1}^{\infty}\) that converges more rapidly to \(p\) than does \(\{p_n\}_{n=0}^{\infty}\), let us first assume that the signs of \(p_n-p\), \(p_{n+1}-p\) and \(p_{n+2}-p\) agree and that \(n\) is sufficiently large that \(\frac{p_{n+1}-p}{p_n-p} \approx \frac{p_{n+2}-p}{p_{n+1}-p}\).
Then solving for \(p\) gives \(p \approx \frac{p_{n+2}\,p_n - p_{n+1}^2}{p_{n+2} - 2p_{n+1} + p_n}\).
Extracting \(p_n\) gives \(\hat{p}_n = p_n - \frac{(p_{n+1}-p_n)^2}{p_{n+2} - 2p_{n+1} + p_n}\).
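A minimal sketch of Aitken's \(\Delta^2\) formula applied to the linearly convergent iterates of \(\cos\) (names and example are our own):

```python
import math

def aitken(p):
    """Aitken's Delta^2 acceleration of a linearly convergent sequence p_0, p_1, ..."""
    return [p[n] - (p[n + 1] - p[n]) ** 2 / (p[n + 2] - 2 * p[n + 1] + p[n])
            for n in range(len(p) - 2)]

# The fixed-point iterates p_n = cos(p_{n-1}) converge linearly to 0.7390851...
p = [0.5]
for _ in range(6):
    p.append(math.cos(p[-1]))
print(p[-1])            # plain iterate
print(aitken(p)[-1])    # accelerated iterate, noticeably closer to the fixed point
```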
Steffensen's Method¶
The idea is that the constructed sequence \(\{\hat{p}_n\}\) is a better approximation to the true fixed point \(p^*\) than \(\{p_n\}\), so we use it to restart the iteration. That is, compute \(p_1 = g(p_0)\) and \(p_2 = g(p_1)\), form \(\hat{p}_0\) by Aitken's \(\Delta^2\) formula, and then update \(p_0 \leftarrow \hat{p}_0\) and repeat.
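A minimal sketch of Steffensen's method (names, tolerances, and the \(\cos\) example are our own):

```python
import math

def steffensen(g, p0, eps=1e-12, max_iter=50):
    """Steffensen's method: two fixed-point steps, one Aitken Delta^2 update,
    then restart from the accelerated value."""
    p = p0
    for _ in range(max_iter):
        p1 = g(p)
        p2 = g(p1)
        denom = p2 - 2 * p1 + p
        if denom == 0:                  # Aitken step undefined; p is (near) the fixed point
            return p
        p_hat = p - (p1 - p) ** 2 / denom
        if abs(p_hat - p) < eps:
            return p_hat
        p = p_hat
    raise RuntimeError("Steffensen's method did not converge")

print(steffensen(math.cos, 0.5))   # converges to 0.7390851332... in a few iterations
```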
Interpolation and Polynomial Approximation¶
Lagrange Interpolating Polynomial¶
Inspired by weighted averaging (加权平均).
There exists one and only one \(n\)th Lagrange interpolating polynomial \(L_n(x) = \sum\limits_{i=0}^{n} y_i\, l_i(x)\), built from the Lagrange basis functions (拉格朗日基函数) \(l_i(x)\),
such that for each pair of given points \((x_i, y_i)\), \(i = 0, 1, 2, \cdots, n\), we have \(y_i = L_n(x_i)\).
Here we can consider \(\{l_i(x)\}\) as a basis of the linear space \(\mathcal{P}_n(x)\), and each \(l_i\) can be expressed in the natural basis \(1, x, x^2, \cdots, x^n\). To be more specific, \(l_i(x) = \prod\limits_{j=0,\, j\neq i}^{n}\frac{x - x_j}{x_i - x_j}\);
readers can prove that the above \(n+1\) polynomials are linearly independent.
In fact, if we write \(P_n(x) = \sum\limits_{i=0}^{n}a_ix^i\) in the natural basis and look for parameters \(\{a_i\}\) such that \(P_n(x_i) = y_i\), we have to solve the linear system \(\sum\limits_{j=0}^{n}a_j x_i^{j} = y_i\), \(i = 0, 1, \cdots, n\) (a Vandermonde system),
which is a little tedious.
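A minimal sketch of Lagrange interpolation written directly from the basis-function formula (names are our own; the nodes for \(f(x)=1/x\) are a textbook-style example):

```python
import numpy as np

def lagrange_basis(x_nodes, i, x):
    """l_i(x) = prod_{j != i} (x - x_j) / (x_i - x_j)."""
    terms = [(x - xj) / (x_nodes[i] - xj)
             for j, xj in enumerate(x_nodes) if j != i]
    return np.prod(terms, axis=0)

def lagrange_interpolate(x_nodes, y_nodes, x):
    """L_n(x) = sum_i y_i * l_i(x)."""
    return sum(yi * lagrange_basis(x_nodes, i, x)
               for i, yi in enumerate(y_nodes))

# Example: interpolate f(x) = 1 / x at the nodes 2, 2.75, 4 and evaluate at x = 3.
x_nodes = np.array([2.0, 2.75, 4.0])
y_nodes = 1.0 / x_nodes
print(lagrange_interpolate(x_nodes, y_nodes, 3.0), 1.0 / 3.0)
```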
Error¶
The following theorem gives the error bound.
拉格朗日基函数的余项 | The remainder of Lagrange interpolating polynomial
Suppose \(x_0, x_1, \cdots , x_n\) are distinct numbers in the interval \([a, b]\) and \(f \in C^{n+1}[a, b]\). Then, for each \(x \in [a, b]\), a number \(\xi(x)\) (generally unknown) between \(x_0, x_1, \cdots , x_n\), and hence in \((a, b)\), exists with \(f(x) = P(x) + \frac{f^{(n+1)}(\xi(x))}{(n+1)!}\prod\limits_{i=0}^{n}(x - x_i)\),
where \(P(x)\) is the Lagrange interpolating polynomial.
Prove it.
- Use the auxiliary function \(g(t) = f(t) - P(t) - [f(x) - P(x)]\prod\limits_{i=0}^{n}\frac{t - x_i}{x - x_i}\),
which is in \(C^{n+1}[a, b]\). Note that \(g(x_k)=0\) for \(k = 0, 1, \cdots, n\) and \(g(x)=0\), so \(g\) has at least \(n+1+1 = n+2\) zero points on \([a, b]\).
- Use the generalized Rolle's Theorem: there exists \(\xi(x) \in (a, b)\) such that \(g^{(n+1)}(\xi(x))=0\).
Differentiating \(g(t)\) \(n+1\) times with respect to \(t\) and substituting, we get the result.
Compare this with the remainder of Taylor's expansion, \(\frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)^{n+1}\).
Because \(\xi(x)\) is usually unknown, we often use a point \(x' \in [a, b]\) at which \(|f^{(n+1)}|\) attains its maximum, which turns the remainder into an upper bound on the error.
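Continuing the \(1/x\) example, a quick numerical check of the bound (a sketch; the interpolant is recomputed with `np.polyfit` for self-containedness, and \(\max|f'''|\) on \([2,4]\) is \(6/2^4\)):

```python
import math
import numpy as np

x_nodes = np.array([2.0, 2.75, 4.0])
y_nodes = 1.0 / x_nodes
x = 3.0

p_at_x = np.polyval(np.polyfit(x_nodes, y_nodes, 2), x)   # degree-2 interpolant at x
actual = abs(1.0 / x - p_at_x)

omega = np.prod(x - x_nodes)                 # (x - x_0)(x - x_1)(x - x_2)
max_f3 = 6.0 / 2.0 ** 4                      # max of |f'''(t)| = 6 / t^4 on [2, 4]
bound = max_f3 / math.factorial(3) * abs(omega)
print(actual, bound, actual <= bound)
```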
Neville's Method¶
An interpolating polynomial on a set of points \(A =\{x_0, x_1, \cdots, x_n \}\) can be expressed through the two polynomials that interpolate \(A \setminus \{ x_i\}\) and \(A \setminus \{ x_j\}\). That is, \(P(x) = \frac{(x-x_j)\,P_{0,1,\cdots,j-1,j+1,\cdots,n}(x) - (x-x_i)\,P_{0,1,\cdots,i-1,i+1,\cdots,n}(x)}{x_i - x_j}\),
where \(P_{0,1,\cdots,i-1,i+1,\cdots,n}(x)\) and \(P_{0,1,\cdots,j-1,j+1,\cdots,n}(x)\) denote the polynomials that interpolate \(A \setminus \{ x_i\}\) and \(A \setminus \{ x_j\}\), respectively.
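A minimal sketch of Neville's method building the table \(Q_{i,j}\) (names are our own; the same \(1/x\) nodes as before):

```python
import numpy as np

def neville(x_nodes, y_nodes, x):
    """Neville's method: Q[i][j] is the value at x of the polynomial
    interpolating x_{i-j}, ..., x_i; the answer is Q[n][n]."""
    n = len(x_nodes)
    q = np.zeros((n, n))
    q[:, 0] = y_nodes
    for i in range(1, n):
        for j in range(1, i + 1):
            q[i, j] = ((x - x_nodes[i - j]) * q[i, j - 1]
                       - (x - x_nodes[i]) * q[i - 1, j - 1]) / (x_nodes[i] - x_nodes[i - j])
    return q[n - 1, n - 1]

x_nodes = np.array([2.0, 2.75, 4.0])
print(neville(x_nodes, 1.0 / x_nodes, 3.0))   # value of the interpolant at x = 3
```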
Divided Differences¶
If we rewrite the \(n\)th Lagrange polynomial \(P_n(x)\) into another form (the Newton form), \(P_n(x) = a_0 + a_1(x-x_0) + a_2(x-x_0)(x-x_1) + \cdots + a_n(x-x_0)(x-x_1)\cdots(x-x_{n-1})\),
then by letting \(x = x_0, x_1, \cdots, x_n\) in turn, we get \(a_0 = f(x_0)\), \(a_1 = \frac{f(x_1)-f(x_0)}{x_1-x_0}\), and so on.
Then we can define \(f[x_i] = f(x_i)\), \(f[x_i, x_{i+1}] = \frac{f[x_{i+1}] - f[x_i]}{x_{i+1} - x_i}\), and, recursively, the higher-order quantities of the same shape.
We can prove by induction that the above quantity \(f[x_0,x_1,\cdots,x_n]\), which is called a divided difference, is equal to \(a_n\).
This recursive method is quite useful in determining the coefficients of the \(n\)th Lagrange (Newton-form) polynomial.
Some properties are the following.
差分与微分均值的关系 | Relationship of DD & Mean Value of Derivatives
Suppose that \(f \in C^n[a, b]\) and \(x_0, x_1, \cdots, x_n\) are distinct numbers in \([a, b]\). Then a number \(\xi\) exists in \((a, b)\) with \(f[x_0, x_1, \cdots, x_n] = \frac{f^{(n)}(\xi)}{n!}\).
It is like the Mean Value Theorem, \(f[x_0, x_1] = f'(\xi)\),
but with \(n!\) added to the denominator.
The following relation gives a slightly quicker way to compute \(f[x_0,x_1,\cdots,x_n]\): \(f[x_0, x_1, \cdots, x_n] = \frac{f[x_1, \cdots, x_n] - f[x_0, \cdots, x_{n-1}]}{x_n - x_0}\).
Another expression for calculating the divided difference is \(f[x_0, x_1, \cdots, x_n] = \sum\limits_{k=0}^{n}\frac{f(x_k)}{\omega'_{n+1}(x_k)}\),
where \(\omega_{n+1}(x) = \prod\limits_{j=0}^{n}(x - x_j)\).
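A minimal sketch computing the divided-difference coefficients and evaluating the Newton form by nested multiplication (names are our own; the same \(1/x\) nodes as before):

```python
import numpy as np

def divided_differences(x, y):
    """Return f[x_0], f[x_0,x_1], ..., f[x_0,...,x_n]
    (the top row of the divided-difference table)."""
    x = np.asarray(x, dtype=float)
    coef = np.asarray(y, dtype=float).copy()
    n = len(x)
    for k in range(1, n):
        # after this step, coef[i] (i >= k) holds f[x_{i-k}, ..., x_i]
        coef[k:] = (coef[k:] - coef[k - 1:-1]) / (x[k:] - x[:-k])
    return coef

def newton_eval(x_nodes, coef, t):
    """Evaluate a_0 + a_1 (t - x_0) + ... by nested multiplication."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (t - x_nodes[k]) + coef[k]
    return result

x_nodes = np.array([2.0, 2.75, 4.0])
coef = divided_differences(x_nodes, 1.0 / x_nodes)
print(coef)                           # [f[x0], f[x0,x1], f[x0,x1,x2]]
print(newton_eval(x_nodes, coef, 3.0))
```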
Hermite Polynomials¶
We want to consider the smoothness of the interpolating polynomial, so we need to consider its derivatives, especially the first-order derivative.
Definition of Osculating Polynomial
The osculating polynomial approximating \(f\) is the polynomial \(P(x)\) of least degree such that \(\frac{d^kP(x_i)}{dx^k} = \frac{d^kf(x_i)}{dx^k}\) for \(k = 0, 1, \cdots, m_i\) and \(i = 0, 1, \cdots, n\),
where \(x_0, \cdots, x_n\) are the sampling points and \(m_i\) is the degree of smoothness required at point \(x_i\).
If \(m_i=1\) for all \(i=0,1,\cdots, n\), then the above polynomial is called Hermite Polynomial.
Composition of Hermite Polynomial
If \(f \in C^1[a, b]\) and \(x_0, \cdots , x_n \in [a, b]\) are distinct, the unique polynomial of least degree agreeing with \(f\) and \(f'\) at \(x_0,\cdots , x_n\) is the Hermite polynomial of degree at most \(2n + 1\) given by \(H_{2n+1}(x) = \sum\limits_{j=0}^{n} f(x_j)H_{n,j}(x) + \sum\limits_{j=0}^{n} f'(x_j)\hat{H}_{n,j}(x)\),
where, for \(L_{n, j}(x)\) denoting the \(j\)th Lagrange coefficient polynomial of degree \(n\), we have \(H_{n,j}(x) = \bigl[1 - 2(x - x_j)L'_{n,j}(x_j)\bigr]L_{n,j}^2(x)\) and \(\hat{H}_{n,j}(x) = (x - x_j)L_{n,j}^2(x)\).
The construction is usually a little tedious.
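A minimal sketch of Hermite interpolation via a divided-difference table with each node repeated twice, which is an equivalent route to the formula above (names and the \(\sin\)/\(\cos\) example are our own):

```python
import numpy as np

def hermite_coeffs(x, y, dy):
    """Newton-form coefficients for the Hermite polynomial, built from a
    divided-difference table with each node doubled (z_{2i} = z_{2i+1} = x_i)."""
    n = len(x)
    z = np.repeat(np.asarray(x, dtype=float), 2)
    q = np.repeat(np.asarray(y, dtype=float), 2)
    first = np.empty(2 * n - 1)
    for i in range(2 * n - 1):
        if z[i + 1] == z[i]:
            first[i] = dy[i // 2]                       # f[z_i, z_i] = f'(x_{i/2})
        else:
            first[i] = (q[i + 1] - q[i]) / (z[i + 1] - z[i])
    coeffs = [q[0], first[0]]
    col = first
    for k in range(2, 2 * n):
        col = (col[1:] - col[:-1]) / (z[k:] - z[:-k])   # next divided-difference column
        coeffs.append(col[0])
    return z, np.array(coeffs)

def hermite_eval(z, coeffs, t):
    """Evaluate the Newton form by nested multiplication."""
    result = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        result = result * (t - z[k]) + coeffs[k]
    return result

# Match sin and cos at three nodes; the degree-5 Hermite polynomial is quite accurate.
x = np.array([0.0, np.pi / 2, np.pi])
z, c = hermite_coeffs(x, np.sin(x), np.cos(x))
print(hermite_eval(z, c, np.pi / 4), np.sin(np.pi / 4))
```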
Cubic Spline Interpolation¶
We replace the conditions \(\frac{d^kP(x_i)}{dx^k} = \frac{d^kf(x_i)}{dx^k}\) with matching conditions between adjacent curve pieces, because we often do not know the derivatives of \(f\) at the sample points.
- Natural Spline
Let \(S''(a) = 0\) and \(S''(b) = 0\).
- Clamped Cubic Spline
Let \(S'(x_0) = f'(x_0)\) and \(S'(x_n) = f'(x_n)\).
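A quick check of both boundary conditions using SciPy's `CubicSpline` (assuming SciPy is available; the sample function \(\sin x\) is our own choice):

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.linspace(0, 2 * np.pi, 8)
y = np.sin(x)

# Natural spline: S''(a) = S''(b) = 0.
natural = CubicSpline(x, y, bc_type="natural")
# Clamped spline: S'(a) = f'(a), S'(b) = f'(b); here f' = cos.
clamped = CubicSpline(x, y, bc_type=((1, np.cos(x[0])), (1, np.cos(x[-1]))))

t = np.linspace(0, 2 * np.pi, 200)
print("max |error|, natural:", np.max(np.abs(natural(t) - np.sin(t))))
print("max |error|, clamped:", np.max(np.abs(clamped(t) - np.sin(t))))
```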
Approximation Theory¶
Discrete Least Squares Approximation¶
Orthogonal Polynomials and Least Squares Approximation¶
If we choose the least-squares error \(E(a_0, \cdots, a_n) = \sum\limits_{i=1}^{m}\bigl(y_i - P_n(x_i)\bigr)^2\) with \(P_n(x)=\sum\limits_{k=0}^{n}a_kx^k\),
we can easily solve for the approximation polynomial by letting \(\frac{\partial E}{\partial a_k}=0\), and get the normal equations \(\sum\limits_{j=0}^{n}a_j\sum\limits_{i=1}^{m}x_i^{j+k} = \sum\limits_{i=1}^{m}y_ix_i^{k}\), \(k = 0, 1, \cdots, n\).
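A minimal sketch of a discrete least-squares fit; `np.linalg.lstsq` solves the same minimization as the normal equations, in a numerically more stable way (the synthetic data are our own):

```python
import numpy as np

# Fit a degree-2 polynomial to noisy samples of a quadratic.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.01 * rng.standard_normal(x.size)

A = np.vander(x, 3, increasing=True)          # columns 1, x, x^2
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coeffs)                                 # close to [1, 2, -3]
```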
Chebyshev Polynomials and Economization of Power series¶
Consider the \(n+1\) extreme points of \(\cos(n\theta)\) on \([0,\pi]\).
We also have that \(\cos(n \theta)=\sum\limits_{k=0}^{n} a_k\cos^k\theta\), i.e., \(\cos(n\theta)\) is a polynomial of degree \(n\) in \(\cos\theta\).
Define \(T_0(x)=1\), \(T_1(x)=x\), and \(T_{n+1}(x)=2xT_n(x)-T_{n-1}(x)\) for \(n=1,2,\cdots\)
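A minimal sketch of the three-term recurrence, checked against the defining property \(T_n(\cos\theta) = \cos(n\theta)\) (names are our own):

```python
import numpy as np

def chebyshev_T(n, x):
    """Evaluate T_n(x) via the recurrence T_{k+1} = 2 x T_k - T_{k-1}."""
    t_prev, t_curr = np.ones_like(x), x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    return t_curr

# Check T_4(cos(theta)) == cos(4 * theta) at a few angles.
theta = np.linspace(0, np.pi, 5)
print(chebyshev_T(4, np.cos(theta)))
print(np.cos(4 * theta))
```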