
Markov Process & Chain

Definition of Markov Process

Assume \(\{X(t): t\in T\}\) is a stochastic process. If for all \(n\), all \(t_0<t_1<\cdots<t_n\) in \(T\), and all states \(x_0,x_1,\cdots,x_{n-1}\in S\), the conditional distribution of \(X(t_n)\) satisfies

\[ \begin{align*} P&(X(t_n)<x_n\mid X(t_0)=x_0,\cdots X(t_{n-1})=x_{n-1})\\&=P(X(t_n)<x_n\mid X(t_{n-1})=x_{n-1}),\quad x_n\in \mathbb{R}, \end{align*} \]

then \(X(t)\) is called a Markov process.

When the state space \(S\) is discrete, we call the above process a Markov chain.

When both the state space \(S\) and the time set \(T\) are discrete, the above process is called a discrete Markov chain.

Transition Probability Distribution

We call the following conditional distribution

\[ F(t_{n-1}, x_{n-1}; t_n, x_n)=P(X(t_n)<x_n\mid X(t_{n-1})=x_{n-1}). \]

the transition probability distribution.

For discrete Markov chain, \(\forall n,m \in T\), \(i,j \in S\), we define transition probability

\[ p_{ij}(m,n)=P(X(n)=j\mid X(m)=i) \]

which is the probability of moving from state \(i\) at time \(m\) to state \(j\) at time \(n\). Clearly, we have \(p_{ij}(m,n)\geq 0\) and

\[ \sum_{j=1}^\infty p_{ij}(m,n)=1. \]

Define the transition matrix of \(n\) steps

\[ P(m,m+n)= [p_{ij}(m, m+n)]_{|S|\times |S|} \]

Specifically, we focus on the one-step transition matrix. If \(P(m,m+1)\) does not depend on \(m\), then we call the Markov chain time-homogeneous, and we denote

\[ p_{ij}=p_{ij}(1)=p_{ij}(m,m+1). \]
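As a quick numerical sketch (the 3-state chain below is an assumed example, not from the text), a one-step transition matrix of a time-homogeneous chain is simply a nonnegative matrix whose rows sum to one:

```python
import numpy as np

# Assumed 3-state time-homogeneous chain: row i holds the one-step
# probabilities p_{ij}.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

# p_{ij} >= 0 and each row is a probability distribution over S.
assert (P >= 0).all()
assert np.allclose(P.sum(axis=1), 1.0)
```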

C-K Equation

Chapman-Kolmogorov Equation

Assume \(\{X_n\}\) is a discrete Markov Chain, and its state space is \(S\). Then \(\forall h,l\in \mathbb{N}^+\), and \(\forall m\in \mathbb{N}\),

\[ p_{ij}(m,m+h+l)=\sum_{r=0}^\infty p_{ir}(m,m+h) \cdot p_{rj}(m+h,m+h+l). \]

which is called Chapman-Kolmogorov Equation.

By definition,

\[ \begin{align*} p_{ij}&(m,m+h+l)\\ &=P(X_{m+h+l}=j\mid X_m=i)\\ &=\frac{P(X_{m+h+l}=j, X_m=i)}{P(X_m=i)}\\ &=\frac{\sum_{r=0}^\infty P(X_{m+h+l}=j, X_{m+h}=r, X_m=i)}{P(X_m=i)}\\ &=\sum_{r=0}^\infty \frac{P(X_{m+h+l}=j, X_{m+h}=r, X_m=i)}{P(X_{m+h}=r, X_m=i)}\frac{P(X_{m+h}=r, X_m=i)}{P(X_m=i)}\\ &=\sum_{r=0}^\infty P(X_{m+h+l}=j\mid X_{m+h}=r, X_m=i)\cdot P(X_{m+h}=r\mid X_m=i)\quad \text{total probability formula}\\ &=\sum_{r=0}^\infty P(X_{m+h+l}=j\mid X_{m+h}=r) P(X_{m+h}=r\mid X_m=i)\quad\text{by Markov property}\\ &=\sum_{r=0}^\infty p_{ir}(m,m+h) \cdot p_{rj}(m+h,m+h+l). \end{align*} \]
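For a time-homogeneous chain \(p_{ij}(m,m+h)=(P^h)_{ij}\), so the C-K equation becomes the matrix identity \(P^{h+l}=P^h P^l\). A numerical check on an assumed 3-state chain:

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])   # assumed example chain
h, l = 3, 4

# C-K for a homogeneous chain: P^{h+l} = P^h P^l.
lhs = np.linalg.matrix_power(P, h + l)
rhs = np.linalg.matrix_power(P, h) @ np.linalg.matrix_power(P, l)
assert np.allclose(lhs, rhs)
```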

Corollary: Multi-step Transition Probability

The one-step transition probabilities of a discrete Markov chain determine all of its multi-step transition probabilities.

By induction. For \(h=2\), we have

\[ p_{ij}(m,m+2)=\sum_{r=0}^\infty p_{ir}(m,m+1)\cdot p_{rj}(m+1,m+2). \]

Assume for \(h=k\), the theorem holds. Then for \(h=k+1\), we have

\[ p_{ij}(m,m+k+1)=\sum_{r=0}^\infty p_{ir}(m, m+k)\cdot p_{rj}(m+k, m+k+1). \]

In matrix form,

\[ P(m,m+k+1)=P(m,m+k)P(m+k,m+k+1). \]

Note the order of the matrix multiplication.

Determination of the distribution of discrete Markov Chain

Assume \(\{X_n\}\) is a discrete Markov Chain with state space \(S\). The finite-dimensional distributions of \(\{X_n\}\) are determined once all the one-step transition probabilities \(p_{ij}(m,m+1)\) and the initial distribution \(p_i=P(X_0=i)\) are given.

\(\forall k\in \mathbb{N}^+\), \(\forall\, 0<m_1<\cdots<m_k\in T\), and \(i_1,\cdots,i_k\in S\), we have

\[ \begin{align*} P(&X_{m_k}=i_k,\cdots,X_{m_1}=i_1)\\ &=\sum_{i=0}^\infty P(X_{m_k}=i_k,\cdots,X_{m_1}=i_1, X_{0}=i)\\ &=\sum_{i=0}^\infty P(X_{m_k}=i_k\mid X_{m_{k-1}}=i_{k-1}, \cdots, X_{m_1}=i_1, X_{0}=i)\cdot P(X_{m_{k-1}}=i_{k-1},\cdots,X_{m_1}=i_1, X_{0}=i)\\ &=\sum_{i=0}^\infty p_{i_{k-1}i_{k}} (m_{k-1}, m_k) \cdot P(X_{m_{k-1}}=i_{k-1},\cdots,X_{m_1}=i_1, X_{0}=i)\\ &=\cdots\\ &=\sum_{i=0}^\infty p_i \cdot p_{i i_1}(0,m_1)\cdot \prod_{j=1}^{k-1} p_{i_j i_{j+1}}(m_{j}, m_{j+1}). \end{align*} \]
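The product formula can be checked against a brute-force summation over all sample paths (the chain and initial distribution below are assumed examples):

```python
import numpy as np
from itertools import product

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])   # assumed one-step matrix
p0 = np.array([0.3, 0.3, 0.4])    # assumed initial distribution

# Theorem: P(X_2 = a, X_5 = b) = sum_i p_i p_{ia}(0,2) p_{ab}(2,5)
#        = (p0 P^2)_a (P^3)_{ab}.
a, b = 1, 2
via_formula = (p0 @ np.linalg.matrix_power(P, 2))[a] \
    * np.linalg.matrix_power(P, 3)[a, b]

# Brute force: total probability of all paths x_0..x_5 with x_2=a, x_5=b.
brute = 0.0
for path in product(range(3), repeat=6):
    if path[2] == a and path[5] == b:
        pr = p0[path[0]]
        for t in range(5):
            pr *= P[path[t], path[t + 1]]
        brute += pr
assert np.isclose(via_formula, brute)
```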

State Space

Now we focus on time-homogeneous discrete Markov Chain.

Invariance of Time

Assume \(X_n\) is a time-homogeneous discrete Markov Chain, then its transition probability of \(h\) steps

\[ p_{ij}(m,m+h) \]

is independent of the time \(m\).

Use C-K equation

\[ \begin{align*} p_{ij}(m,m+h)&=\sum_{r_1=0}^\infty p_{ir_1}(m,m+1)\cdot p_{r_1j}(m+1,m+h)\\ &=\sum_{r_1=0}^\infty p_{ir_1} \sum_{r_2=0}^\infty p_{r_1r_2}\cdot p_{r_2j}(m+2,m+h)\\ &=\sum_{r_1=0}^\infty p_{ir_1}\sum_{r_2=0}^\infty p_{r_1r_2}\cdots \sum_{r_{h-1}=0}^\infty p_{r_{h-2}r_{h-1}}\, p_{r_{h-1}j} \end{align*} \]

which depends only on the number of steps \(h\). This is similar to the stationary-increments property.

Accessibility & Communicating Class

Accessibility of state

If there exists \(N>0\) such that \(p_{ij}(N)>0\), then we say that state \(i\) can access state \(j\), denoted by

\[ i\rightarrow j. \]

If \(i\rightarrow j\) and \(j\rightarrow i\), then we say \(i\) and \(j\) communicate, denoted by

\[ i\leftrightarrow j. \]

Properties of accessibility

(i) If \(i\rightarrow j\) and \(j\rightarrow k\), then \(i\rightarrow k\).

(ii) If \(i\leftrightarrow j\), then \(j\leftrightarrow i\).

(iii) If \(i\leftrightarrow j\) and \(j\leftrightarrow k\), then \(i\leftrightarrow k\).

(i) There exist \(N_1,N_2>0\) such that \(p_{ij}(N_1)>0\) and \(p_{jk}(N_2)>0\). Then by the C-K equation

\[ p_{ik}(N_1+N_2)=\sum_{r=0}^\infty p_{ir}(N_1)p_{rk}(N_2)\geq p_{ij}(N_1)p_{jk}(N_2)>0. \]
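Accessibility is just reachability in the directed graph with an edge \(i\rightarrow j\) whenever \(p_{ij}>0\); a sketch computing \(\rightarrow\) and \(\leftrightarrow\) by a Warshall-style transitive closure (assumed 4-state example with an absorbing state):

```python
import numpy as np

# Assumed 4-state chain; state 3 is absorbing.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.3, 0.4, 0.3],
              [0.0, 0.0, 0.0, 1.0]])
n = P.shape[0]

# i -> j iff p_{ij}(N) > 0 for some N >= 1: transitive closure of the
# one-step adjacency relation.
reach = P > 0
for k in range(n):
    reach |= np.outer(reach[:, k], reach[k, :])

communicate = reach & reach.T                  # i <-> j
assert communicate[0, 1]                       # 0 and 1 communicate
assert reach[2, 3] and not communicate[2, 3]   # 2 -> 3 but not back
```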

Definition of Communicating class

We denote \(C(j)\) to be the communicating class of state \(j\), which is actually the collection of all states mutually accessible from \(j\).

If \(C(j)=\varnothing\), then we call \(j\) a non-return state.

If \(C(j)\neq \varnothing\), then we call \(j\) a return state.

In further study, we will see that a non-return state must be a transient state, while return states require a more careful decomposition.

Property of communicating class

If \(i\leftrightarrow j\), then \(C(i)=C(j)\).

Easy to see. \(\forall k\in C(i)\), we have \(k\leftrightarrow i\). Since \(i\leftrightarrow j\), it follows that \(k\leftrightarrow j\), so \(k\in C(j)\). The other direction is symmetric.

Closed state class

We call a set of states \(C\) a closed state set if for every \(i\in C\) and every \(j\not\in C\), \(i\not\rightarrow j\).

If a closed state set has only one element \(j\), then we call \(j\) an absorbing state.

If any two states of a state space \(S\) communicate, then we call the Markov chain irreducible.

Theorem for communicating class

If two state sets \(C_1\), \(C_2\) are communicating classes, then either \(C_1=C_2\) or \(C_1\cap C_2=\varnothing\).

By definition of communicating class, there exists \(i\), \(j\) such that

\[ C_1=C(i),\quad C_2=C(j). \]

If \(C_1\cap C_2\neq \varnothing\), then choose one \(g\in C_1\cap C_2\), which satisfies \(g\leftrightarrow i\) and \(g\leftrightarrow j\). So

\[ i\leftrightarrow j. \]

which means \(C_1=C(i)=C(j)=C_2\) by Property of communicating class.

Decomposition of Markov Chain

The state space \(S\) of a Markov chain can be decomposed into at most countably many mutually disjoint state sets, i.e.

\[ S=C_1\cup C_2\cup \cdots\cup C_n, \]

where \(C_i\cap C_j=\varnothing\) whenever \(i\neq j\).

Iterate over the elements of \(S\). Arbitrarily choose a state \(j_1\in S\): if \(j_1\) is a return state, let \(C_1=C(j_1)\); otherwise (\(j_1\) a non-return state) let \(C_1=\{j_1\}\). In the next round, arbitrarily choose a state \(j_2\in S-C_1\): if \(j_2\) is a return state, let \(C_2=C(j_2)\); otherwise let \(C_2=\{j_2\}\).

Continue until \(S-C_1-C_2-\cdots=\varnothing\).
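The decomposition can be carried out mechanically from the communication relation (the 4-state chain is an assumed example):

```python
import numpy as np

# Assumed 4-state chain; state 3 is absorbing.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.3, 0.4, 0.3],
              [0.0, 0.0, 0.0, 1.0]])
n = P.shape[0]
reach = P > 0                          # transitive closure: i -> j
for k in range(n):
    reach |= np.outer(reach[:, k], reach[k, :])
comm = reach & reach.T                 # i <-> j

# Group states into mutually disjoint sets, as in the decomposition.
classes, assigned = [], set()
for i in range(n):
    if i in assigned:
        continue
    cls = {j for j in range(n) if comm[i, j]} or {i}  # singleton if non-return
    classes.append(cls)
    assigned |= cls
assert classes == [{0, 1}, {2}, {3}]   # {0,1} is closed; {2} leaks into {3}
```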

Period of State

Now we focus on recurrent states, rather than absorbing or transient states, since neither of the latter exhibits periodic phenomena.

Definition of period of state

We denote \(d(j)\) to be the period of a recurrent state \(j\), defined by

\[ d(j)=\gcd\{n\geq 1: p_{jj}(n)>0\}. \]

If \(d(j)=1\), we call \(j\) an aperiodic (non-periodical) state; this includes any absorbing state \(k\), for which \(d(k)=1\).

In general, if state \(i\) has \(d(i)=d\), then \(p_{ii}(n)=0\) for all \(n\not\equiv 0 \pmod d\). Conversely, \(n\equiv 0 \pmod d\) does not necessarily imply \(p_{ii}(n)>0\).

To be precise, we have the following theorem.

Theorem for period

If state \(i\) has period \(d\), then there exists \(M\in\mathbb{N}^+\) such that for all \(n>M\),

\[ p_{ii}(nd)>0. \]
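The period can be computed by brute force from the definition (a sketch on an assumed deterministic 3-cycle, for which every state has period 3):

```python
import numpy as np
from math import gcd
from functools import reduce

# Assumed deterministic 3-cycle: 0 -> 1 -> 2 -> 0.
P = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])

def period(P, j, n_max=60):
    """d(j) = gcd{ n >= 1 : p_jj(n) > 0 }, truncated at n_max."""
    times = [n for n in range(1, n_max + 1)
             if np.linalg.matrix_power(P, n)[j, j] > 0]
    return reduce(gcd, times)

assert period(P, 0) == 3
# Consistent with the theorem: p_00(nd) > 0 for all large n (here, all n).
assert all(np.linalg.matrix_power(P, 3 * n)[0, 0] > 0 for n in range(1, 20))
```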

Transitivity of period

If \(i\leftrightarrow j\), then \(d(i)=d(j)\).

Use a special loop. Since \(i\leftrightarrow j\), there exist \(n,m>0\) such that \(p_{ij}(n)>0\) and \(p_{ji}(m)>0\), so

\[ p_{ii}(n+m)\geq p_{ij}(n)p_{ji}(m)>0. \]

So \(d(i)\mid n+m\).

By the definition \(d(j)=\gcd\{n_1,n_2,\cdots,n_k,\cdots\}\), we have \(p_{jj}(n_k)>0\) for all \(k\). So for each \(k\),

\[ p_{ii}(n+m+n_k)\geq p_{ij}(n)p_{jj}(n_k)p_{ji}(m)>0. \]

So \(d(i)\mid n+m+n_k\). Combined with the above two, we have \(d(i)\mid n_k\), which means \(d(i)\mid d(j)\).

Similarly \(d(j)\mid d(i)\), so \(d(i)=d(j)\).

Ergodicity of State

Here we dig deeper into the return probabilities themselves. Notice that a state may fail to return with positive probability, while another returns with certainty.

Definition of passage probability of State

Define the first-passage probability from state \(i\) to \(j\) in exactly \(n\) steps:

\[ f_{ij}(n)=P(X_n=j,X_{n-1}\neq j,\cdots, X_1\neq j \mid X_0=i). \]

Define the ultimate passage probability from state \(i\) to \(j\)

\[ f_{ij}=P(\exists\, m\geq 1: X_m=j\mid X_0=i). \]

Note that

\[ f_{ij}=\sum_{n=1}^\infty f_{ij}(n), \]

which is the core of our analysis.

We define

\[ T_j=\inf\{n\geq 1: X_n=j\} \]

to be First Hitting Time of \(j\).

Then

\[ P(T_j=n\mid X_0=i)=f_{ij}(n). \]

and

\[ P(T_j<\infty\mid X_0=i)=f_{ij}. \]

Notice that the above "\(<\infty\)" means the chain could reach state \(j\) from \(i\) in finitely many steps.

Theorem for relationship between first-passage probability and passage probability

For any state \(i,j\in S\), \(\forall n\geq 1\),

\[ p_{ij}(n)=\sum_{m=1}^n f_{ij}(m)\, p_{jj}(n-m). \]
\[ \begin{align*} p_{ij}(n)&=P(X_n=j\mid X_0=i)\\ &=\sum_{m=1}^n P(X_n=j,X_m=j,X_{m-1}\neq j,\cdots, X_1\neq j\mid X_0=i)\quad \text{This is a partition}\\ &=\sum_{m=1}^n \frac{P(X_n=j,X_m=j,X_{m-1}\neq j,\cdots, X_1\neq j, X_0=i)}{P(X_0=i)}\\ &=\sum_{m=1}^n P(X_n=j\mid X_m=j,X_{m-1}\neq j,\cdots, X_1\neq j, X_0=i)\cdot P(X_m=j, \cdots,X_1\neq j \mid X_0=i)\\ &=\sum_{m=1}^n p_{jj}(n-m) f_{ij}(m). \end{align*} \]
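Since \(p_{jj}(0)=1\), the theorem can be inverted to compute \(f_{ij}(n)\) from matrix powers: \(f_{ij}(n)=p_{ij}(n)-\sum_{m=1}^{n-1}f_{ij}(m)p_{jj}(n-m)\). A sketch on an assumed 3-state chain:

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])   # assumed irreducible example chain

def first_passage(P, i, j, N):
    """f_ij(1..N), obtained by peeling the first passage off p_ij(n)."""
    Pn = [np.linalg.matrix_power(P, n) for n in range(N + 1)]  # Pn[0] = I
    f = np.zeros(N + 1)
    for n in range(1, N + 1):
        f[n] = Pn[n][i, j] - sum(f[m] * Pn[n - m][j, j] for m in range(1, n))
    return f

f = first_passage(P, 0, 2, 100)
# Sanity checks: f_02(1) = p_02(1) = 0 here, and sum_n f_02(n) = f_02 = 1
# because this finite chain is irreducible, so state 2 is reached a.s.
assert f[1] == P[0, 2] == 0.0
assert np.isclose(f.sum(), 1.0, atol=1e-6)
```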

Definition of Recurrent State and Transient State

If \(f_{ii}=1\), or \(P(T_i<\infty\mid X_0=i)=1\), then state \(i\) is called recurrent state.

If \(f_{ii}<1\), or \(P(T_i<\infty\mid X_0=i)<1\), then state \(i\) is called transient state.

For recurrent state \(i\), we have \(\sum_{n=1}^\infty f_{ii}(n)=1\). So \(f_{ii}(n)\) becomes a discrete distribution.

Now we can discuss the characteristics of this distribution and its practical meaning. Let \(T_i\) be the number of steps of the first return from state \(i\) to \(i\); this is exactly the first hitting time defined above. Then

\[ P(T_i=n)=f_{ii}(n), \quad n=1,2,\cdots \]

Definition of Mean Recurrence Time

The mean recurrence time is defined as the expectation of \(T_i\):

\[ \mu_i=E(T_i)=\sum_{n=1}^\infty nf_{ii}(n). \]

For recurrent state \(i\), if \(\mu_i<+\infty\), then \(i\) is called Positive recurrent state. If \(\mu_i=+\infty\), then \(i\) is called Null recurrent state.

A non-periodic positive recurrent state is also called Ergodic state.

Define \(r_0=1\), \(r_k=P(T_i>k\mid X_0=i)\), \(k\geq 1\), then

\[ \mu_i=E(T_i\mid X_0=i)=\sum_{k=0}^\infty r_k. \]

This can be explained via the tail-sum formula for the expectation. Since

\[ P(T_i>k\mid X_0=i)=\sum_{n=k+1}^\infty P(T_i=n\mid X_0=i)=\sum_{n=k+1}^\infty f_{ii}(n) \]

So

\[ \begin{align*} \sum_{k=0}^\infty r_k&=\sum_{k=0}^\infty P(T_i>k\mid X_0=i)\\ &=\sum_{k=0}^\infty \sum_{n=k+1}^\infty f_{ii}(n)\\ &=\sum_{n=1}^\infty n f_{ii}(n)\quad \text{by exchanging the order of summation}\\ &=\mu_i \end{align*} \]
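The tail-sum identity \(\mu_i=\sum_{k\geq 0}r_k\) is easy to check on a concrete first-return law (an assumed geometric distribution, truncated numerically):

```python
import numpy as np

# Assumed first-return law: f(n) = (1-q) q^{n-1}, a geometric distribution.
q = 0.6
n = np.arange(1, 2000)
f = (1 - q) * q ** (n - 1)                 # P(T = n), truncated at 1999

mean_direct = (n * f).sum()                # mu = sum_n n f(n)
tails = np.array([f[n > k].sum() for k in range(200)])   # r_k = P(T > k)
assert np.isclose(mean_direct, tails.sum(), atol=1e-6)   # both equal 1/(1-q)
```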

Actually, it is often hard to compute \(f_{ii}(n)\) for a complex Markov chain, so we give another characterization of recurrent states.

n-visit Probability

Definition of n-visit probability

Define the \(n\)-visit probability

\[ Q_{ij}(n)=P(\exists\, 1\leq m_1<m_2<\cdots<m_n:\ X_{m_k}=j,\ k=1,\cdots,n \mid X_0=i), \]

which is the probability that the chain, started from \(i\), visits \(j\) at least \(n\) times. Then we have

\[ f_{ij}=Q_{ij}(1). \]

and define \(Q_{ij}:=\lim\limits_{n\rightarrow \infty}Q_{ij}(n)\).

Properties of the n-visit probability

(i) Monotonically decreasing.

\[ Q_{ij}(1)\geq Q_{ij}(2)\geq \cdots \]

(ii) \(Q_{ij}=1\) means that, starting from state \(i\), the chain reaches state \(j\) infinitely many times. By property (i), \(Q_{ij}(n)=1\) for all \(n\).

(iii) If \(Q_{ii}=1\), then \(Q_{ii}(1)=f_{ii}=1\), state \(i\) is a recurrent state.

Relationship between \(f_{ij}\) and \(Q_{ij}\)

For any states \(i,j\in S\) and any \(m\geq 1\), we have

(i) \(Q_{ij}(m+1)=f_{ij} Q_{jj}(m)\);

(ii) \(Q_{ij}=f_{ij}Q_{jj}\);

(iii) \(Q_{jj}=\lim\limits_{n\rightarrow \infty}(f_{jj})^n\).

(i) Still use first-passage to partition the space.

\[ \begin{align*} Q_{ij}(m+1)&=P(\forall N\geq m+1, \exists \{m_i\}_{1\leq i\leq N}, m_i\geq 1, s.t. X_{m_i}=j\mid X_0=i)\\ &=\sum_{n=1}^\infty P(\forall N\geq m, \exists \{m_i\}_{1\leq i\leq N}, m_i\geq n+1, s.t. X_{m_i}=j, X_n=j, X_{n-1}\neq j,\cdots, X_1\neq j \mid X_0=i)\\ &=\sum_{n=1}^\infty P(\forall N\geq m, \exists \{m_i\}_{1\leq i\leq N}, m_i\geq n+1, s.t. X_{m_i}=j \mid X_n=j )\cdot \\ &\quad P(X_n=j, X_{n-1}\neq j,\cdots, X_1\neq j \mid X_0=i)\quad \text{by Markov property}\\ &=\sum_{n=1}^\infty P(\forall N\geq m, \exists \{m_i\}_{1\leq i\leq N}, m_i\geq 1, s.t. X_{m_i}=j \mid X_0=j ) f_{ij}(n)\quad\text{by time-homogeneous}\\ &=\sum_{n=1}^\infty Q_{jj}(m)f_{ij}(n)=f_{ij}Q_{jj}(m). \end{align*} \]

(ii) Follows from (i) by letting \(m\rightarrow \infty\).

(iii) Apply (i) to \(Q_{jj}(m)\) iteratively.

\[ Q_{jj}(m)=f_{jj} Q_{jj}(m-1)=\cdots=(f_{jj})^{m-1} Q_{jj}(1)=(f_{jj})^{m}. \]

Recurrent state description with passage probability

(i) State \(j\) is a recurrent state, iff \(\sum\limits_{n=1}^\infty p_{jj}(n)=+\infty\).

(ii) State \(j\) is a transient state, iff \(\sum\limits_{n=1}^\infty p_{jj}(n)<+\infty\).

The proof is done by leveraging Theorem for relationship between first-passage probability and passage probability.

  • Sufficient for (i) and necessary for (ii).

We have

\[ \begin{align*} p_{jj}(n)&=\sum_{m=1}^n f_{jj}(m)p_{jj}(n-m)\\ \Rightarrow \quad \sum_{n=1}^N p_{jj}(n)&=\sum_{n=1}^N \sum_{m=1}^n f_{jj}(m)p_{jj}(n-m)\\ &=\sum_{m=1}^N \sum_{n=m}^N f_{jj}(m)p_{jj}(n-m)\\ &=\sum_{m=1}^N f_{jj}(m) \sum_{n=m}^N p_{jj}(n-m)\\ &\leq \sum_{m=1}^N f_{jj}(m) \sum_{n=0}^N p_{jj}(n) \end{align*} \]

so we have

\[ \frac{\sum_{n=1}^N p_{jj}(n)}{p_{jj}(0)+\sum_{n=1}^N p_{jj}(n)}\leq \sum_{m=1}^N f_{jj}(m) \]

let \(N\rightarrow\infty\). If \(\sum_{n=1}^\infty p_{jj}(n)=+\infty\), the left-hand side tends to \(1\), so

\[ 1\leq f_{jj}, \]

which means \(f_{jj}=1\), so \(j\) is a recurrent state. Equivalently, if state \(j\) is transient, then \(\sum\limits_{n=1}^\infty p_{jj}(n)<+\infty\).

  • Necessary for (i) and sufficient for (ii).

Still using the same formula, we keep only the first \(M\leq N\) terms:

\[ \begin{align*} \sum_{n=1}^N p_{jj}(n)&=\sum_{m=1}^N f_{jj}(m)\sum_{n=0}^{N-m} p_{jj}(n)\\ &\geq \sum_{m=1}^M f_{jj}(m)\sum_{n=0}^{N-m} p_{jj}(n)\\ &\geq \sum_{m=1}^M f_{jj}(m)\sum_{n=0}^{N-M} p_{jj}(n) \end{align*} \]

So

\[ \sum_{m=1}^M f_{jj}(m)\leq \frac{\sum_{n=1}^{N} p_{jj}(n)}{p_{jj}(0) +\sum_{n=1}^{N-M} p_{jj}(n)} \]

letting \(N\rightarrow \infty\): if \(\sum_{n=1}^{\infty} p_{jj}(n)<\infty\), then the right-hand side has a limit strictly less than \(1\), i.e.

\[ \sum_{m=1}^M f_{jj}(m)\leq \delta<1 \]

so letting \(M\rightarrow \infty\) gives \(f_{jj}\leq\delta<1\). In other words, if \(\sum_{n=1}^{\infty} p_{jj}(n)<\infty\), then \(f_{jj}<1\), which means state \(j\) is a transient state.

We could take another perspective. Notice

\[ \sum_{i=1}^\infty P(A_i)=E\sum_{i=1}^\infty 1_{A_i} \]

So the summation inside the expectation above is the total number of the events \(A_1,\cdots,A_i,\cdots\) that occur. In our Markov chain, we let

\[ N_j=\sum_{i=1}^\infty Y_i= \sum_{i=1}^\infty 1_{A_i} \]

denote the number of times the chain visits state \(j\) when started from \(j\) (over steps \(1\) to \(\infty\)). Here \(A_i\) denotes the event that the chain is at state \(j\) after \(i\) steps, and \(Y_i=1_{A_i}\) indicates whether \(X_i=j\).

Note that \(A_i\) are not necessarily mutually disjoint.

So

\[ \sum\limits_{n=1}^\infty p_{jj}(n)=EN_j \]

If we write the initial state explicitly, then

\[ \sum\limits_{n=1}^\infty p_{ij}(n)=E(N_{ij} \mid X_0=i). \]
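The criterion is illustrated by the simple random walk on \(\mathbb{Z}\) (an assumed example, not from the text), where \(p_{00}(2n)=\binom{2n}{n}(pq)^n\): the series diverges iff \(p=\frac12\).

```python
# Recurrence criterion on the simple random walk on Z (assumed example):
# p_00(2n) = C(2n, n) (pq)^n, q = 1 - p.  Recurrent iff the series diverges.
def partial_sum(p, N):
    """sum_{n=1}^N p_00(2n), built from the term ratio to avoid overflow."""
    term = 2 * p * (1 - p)              # p_00(2) = 2pq
    s = 0.0
    for n in range(1, N + 1):
        s += term
        # p_00(2n+2) / p_00(2n) = 4pq (2n+1)/(2n+2)
        term *= 4 * p * (1 - p) * (2 * n + 1) / (2 * n + 2)
    return s

# Symmetric walk: partial sums keep growing (like 2 sqrt(N/pi)) -> divergence.
assert partial_sum(0.5, 16000) / partial_sum(0.5, 4000) > 1.8
# Biased walk: terms decay like (4pq)^n < 1, so the series has converged.
assert partial_sum(0.6, 16000) - partial_sum(0.6, 4000) < 1e-9
```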

Corollary: Transitivity of Recurrent state

If state \(i\leftrightarrow j\), and \(j\) is a recurrent state, then \(i\) is also a recurrent state.

Since \(i\leftrightarrow j\), there exist \(m,n\geq 1\) such that

\[ p_{ij}(m)>0, \quad p_{ji}(n)>0. \]
\[ \begin{align*} \sum_{k=1}^\infty p_{ii}(k)&\geq \sum_{k=1}^\infty p_{ii}(n+m+k)\\ &\geq \sum_{k=1}^\infty p_{ij}(m)p_{jj}(k)p_{ji}(n)\\ &= p_{ij}(m)p_{ji}(n) \sum_{k=1}^\infty p_{jj}(k)=+\infty. \end{align*} \]

we are done.

\(\square\)

Limit Theorem

We now study the limiting behavior of recurrent, transient and ergodic states, i.e. the limit of \(p_{jj}(n)\).

The following theorem takes the period into account. A positive recurrent state with period \(d>1\) need not have a limit of \(p_{jj}(n)\), only a convergent subsequence. So a more general statement of the following (i) and (ii) is

\[ \limsup_{n\rightarrow\infty}p_{jj}(n)>0 \]

for positive recurrent state, without considering period.

Theorem for limit of passage probability of three states

(i) If state \(j\) is a non-periodic positive recurrent state (ergodic), then

\[ \lim_{n\rightarrow \infty}p_{jj}(n)=\frac{1}{\mu_j} \]

(ii) If state \(j\) is a positive recurrent state with period \(d\), then

\[ \lim_{n\rightarrow \infty}p_{jj}(nd)=\frac{d}{\mu_j}, \]

(iii) If state \(j\) is null recurrent state or transient state, then

\[ \lim_{n\rightarrow \infty}p_{jj}(n)=0. \]

We first define some variables.

Define \(N_j(n)\) to be the number of visits to \(j\) among \(\{X_1,\cdots,X_n\}\), also called the sojourn time.

Define \(T_1\) to be the number of steps until the first visit to \(j\), and \(T_k\) the number of steps between the \((k-1)\)-th and the \(k\)-th visit. By time-homogeneity, \(T_1,T_2,\cdots\) are mutually independent and identically distributed as the first hitting time \(T_j\), that is,

\[ P(T_k=n)=f_{jj}(n), \quad n=1,2,\cdots,\forall k=1,2,\cdots \]

and satisfies \(E(T_k)=\mu_j\).

Define accumulated time \(S_k=\sum_{i=1}^k T_i\). Then we have the relationship

\[ N_j(n)=\max\{k\geq 1: S_k \leq n\}. \]

or

\[ S_k=\min\{n\geq 1: N_j(n)=k\} \]

Note that the same relationship also holds in the Poisson process.
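The visit-frequency limit \(N_j(n)/n\rightarrow 1/\mu_j\) proved below can be previewed by simulation (assumed 3-state chain; the reference value \(1/\mu_0\approx 0.18605\) was computed separately from \(\pmb{\pi}=\pmb{\pi}\pmb{P}\) for this chain):

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])   # assumed example chain

n_steps, j = 100_000, 0
x, visits = 0, 0
for _ in range(n_steps):
    x = rng.choice(3, p=P[x])    # one step of the chain
    visits += (x == j)

# Long-run visit frequency approaches 1/mu_0 = pi_0 = 9/48.375 ~ 0.18605.
assert abs(visits / n_steps - 0.18605) < 0.015
```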

  • Prove first \(\lim\limits_{n\rightarrow \infty} \frac{N_j(n)}{n}=\frac{1}{\mu_j}\)

Notice their relationship

\[ S_{N_j(n)}\leq n\leq S_{N_j(n)+1} \]

so

\[ \frac{S_{N_j(n)}}{N_j(n)}\leq \frac{n}{N_j(n)}\leq \frac{S_{N_j(n)+1}}{{N_j(n)+1}}\frac{{N_j(n)+1}}{N_j(n)}. \]

By the strong law of large numbers, \(\frac{S_k}{k}\rightarrow \mu_j\) a.s., and since \(N_j(n)\rightarrow \infty\) as \(n\rightarrow \infty\) (recurrence),

\[ \frac{n}{N_j(n)}\rightarrow \mu_j\quad a.s. \]

and we have

\[ \frac{N_j(n)}{n}\rightarrow \frac{1}{\mu_j}\quad a.s. \]
  • Prove second \(\lim\limits_{n\rightarrow \infty} \frac{1}{n}\sum_{m=1}^n p_{jj}(m)=\frac{1}{\mu_j}\).

By definition of sojourn time, we have

\[ N_j(n)=\sum_{m=1}^n Y_m,\quad Y_m\sim B(1,p_{jj}(m)) \]

Here \(Y_m\) indicates whether the chain is at \(j\) after exactly \(m\) steps: \(Y_m=1\) if \(X_m=j\), otherwise \(Y_m=0\). This matches the other perspective in the description of recurrent states, i.e.

\[ N_j=\lim\limits_{n\rightarrow\infty} N_j(n),\quad N_{ij}=\lim\limits_{n\rightarrow\infty} N_{ij}(n). \]

So

\[ E\frac{N_j(n)}{n}=\sum_{m=1}^n \frac{EY_m}{n}=\frac{1}{n}\sum_{m=1}^n p_{jj}(m), \]

since \(0\leq N_j(n)/n\leq 1\), the a.s. convergence yields convergence of the expectations (bounded convergence), and we have the result.

  • Prove the proposition.

Assuming \(\lim_{n\rightarrow \infty}p_{jj}(n)\) exists (the aperiodic case), write it as \(\frac{1}{\mu_j}+\delta\); substituting into the Cesàro limit above forces \(\delta=0\).

\(\square\)

(ii) Let \(n=dn'\); then \(Y_{n'}=X_{dn'}\) is an aperiodic Markov chain, so by (i)

\[ \lim_{n'\rightarrow\infty}p_{jj}(dn')=\frac{1}{\mu'}. \]

Notice that

\[ \mu'=\sum_{n'=1}^\infty n'f_{jj}(dn')=\frac{1}{d}\sum_{n'=1}^\infty n'd\, f_{jj}(n'd)=\frac{\mu_j}{d}. \]

and we are done.

\(\square\)

  • For a null recurrent state, apply (ii) with \(\mu_j\rightarrow \infty\) to get \(\lim\limits_{n\rightarrow \infty}p_{jj}(nd)=0\). Since \(p_{jj}(n)=0\) for \(n\not\equiv 0 \pmod d\), by the Heine theorem,
\[ \lim\limits_{n\rightarrow \infty}p_{jj}(n)=0. \]
  • For a transient state, the convergence of \(\sum_{n=1}^\infty p_{jj}(n)\) forces each term to vanish: \(\lim\limits_{n\rightarrow \infty }p_{jj}(n)=0\).

    \(\square\)
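Part (i) can be checked numerically: recover \(\mu_j\) from the first-return law \(f_{jj}\) and compare with \(p_{jj}(n)\) for large \(n\) (assumed aperiodic irreducible 3-state chain):

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])   # assumed ergodic example chain
j, N = 0, 400
Pn = [np.linalg.matrix_power(P, n) for n in range(N + 1)]

# Invert p_jj(n) = sum_m f_jj(m) p_jj(n-m) to get the first-return law.
f = np.zeros(N + 1)
for n in range(1, N + 1):
    f[n] = Pn[n][j, j] - sum(f[m] * Pn[n - m][j, j] for m in range(1, n))

mu = (np.arange(N + 1) * f).sum()            # mean recurrence time (truncated)
assert np.isclose(f.sum(), 1.0, atol=1e-8)   # state 0 is recurrent
assert np.isclose(Pn[N][j, j], 1 / mu, atol=1e-6)   # lim p_00(n) = 1/mu_0
```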

Corollary: Transitivity of ergodic state

(i) If state \(i\leftrightarrow j\), and \(j\) is a positive recurrent state, then \(i\) is also a positive recurrent state.

(ii) If state \(i\leftrightarrow j\), and \(j\) is a null recurrent state, then \(i\) is also a null recurrent state.

By Transitivity of Recurrent state, \(i\) is recurrent; it remains to decide whether it is positive or null. Since \(i\leftrightarrow j\), \(\exists m,n>0\) such that \(p_{ij}(m)>0\) and \(p_{ji}(n)>0\), so

\[ p_{ii}(m+n+kd)\geq p_{ij}(m)p_{jj}(kd)p_{ji}(n) \]

let \(k\rightarrow \infty\), we have

\[ \lim_{k\rightarrow\infty}p_{ii}(m+n+kd)\geq p_{ji}(n) p_{ij}(m)\lim_{k\rightarrow\infty} p_{jj}(kd) \]

If \(j\) is a positive recurrent state, the right-hand side is positive, so the subsequential limit of \(p_{ii}\) is positive and \(i\) is positive recurrent; we are done.

\(\square\)

Now the following theorem gives a limit situation for different initial states.

Theorem for transient state from different initial state

(i) If \(j\) is a non-periodic positive recurrent state, then \(\forall\) state \(i\),

\[ \lim_{n\rightarrow \infty}p_{ij}(n)=\frac{f_{ij}}{\mu_j}. \]

(ii) If \(j\) is a null recurrent state or transient state, then \(\forall\) state \(i\),

\[ \lim_{n\rightarrow \infty}p_{ij}(n)=0,\quad i\in S. \]

There are several ways to prove this, including Lebesgue's dominated convergence theorem. Here we give a squeeze argument.

By Theorem for relationship between first-passage probability and passage probability, we have

\[ p_{ij}(n)=\sum_{m=1}^n f_{ij}(m)p_{jj}(n-m) \]

choose \(M<n\), we have

\[ \begin{align*} p_{ij}(n)&=\sum_{m=1}^M f_{ij}(m)p_{jj}(n-m) + \sum_{m=M+1}^n f_{ij}(m)p_{jj}(n-m)\\ &\leq \sum_{m=1}^M f_{ij}(m)p_{jj}(n-m) +\sum_{m=M+1}^n f_{ij}(m) \end{align*} \]

let \(n\rightarrow \infty\), and let \(M\rightarrow \infty\), we have

\[ \begin{align*} \lim_{n\rightarrow \infty}p_{ij}(n)&\leq \sum_{m=1}^M f_{ij}(m)\frac{1}{\mu_j} +\sum_{m=M+1}^\infty f_{ij}(m)\\ \lim_{M\rightarrow \infty}\lim_{n\rightarrow \infty}p_{ij}(n)&\leq \frac{f_{ij}}{\mu_j}+0. \end{align*} \]

For the other direction, we simply drop the terms with \(m>M\), and let \(n\rightarrow \infty\) and then \(M\rightarrow \infty\):

\[ \begin{align*} p_{ij}(n)&\geq \sum_{m=1}^M f_{ij}(m)p_{jj}(n-m)\\ \lim_{n\rightarrow \infty} p_{ij}(n)&\geq \sum_{m=1}^M f_{ij}(m) \frac{1}{\mu_j}\\ \lim_{M\rightarrow \infty}\lim_{n\rightarrow \infty}p_{ij}(n)&\geq \frac{f_{ij}}{\mu_j}. \end{align*} \]

So we are done.

\(\square\)

  • For null recurrent state.
\[ \begin{align*} p_{ij}(n)&=\sum_{m=1}^M f_{ij}(m)p_{jj}(n-m) + \sum_{m=M+1}^n f_{ij}(m)p_{jj}(n-m)\\ &\leq \sum_{m=1}^M f_{ij}(m)p_{jj}(n-m)+ \sum_{m=M+1}^n f_{ij}(m)\\ \Rightarrow \quad \lim_{n\rightarrow \infty} p_{ij}(n)&\leq 0+ \sum_{m=M+1}^\infty f_{ij}(m)\\ \lim_{M\rightarrow \infty}\lim_{n\rightarrow \infty}p_{ij}(n)&\leq 0 \end{align*} \]

and we are done.

  • For a transient state \(j\), we have

\[ \sum_{n=1}^\infty p_{jj}(n)<\infty. \]

So by Theorem for relationship between first-passage probability and passage probability,

\[ \begin{align*} \sum_{n=1}^\infty p_{ij}(n) &= \sum_{n=1}^\infty \sum_{m=1}^n f_{ij}(m)p_{jj}(n-m)\\ &=\sum_{m=1}^\infty \sum_{n=m}^\infty f_{ij}(m)p_{jj}(n-m) \quad\text{by exchanging the order of summation}\\ &=\sum_{m=1}^\infty f_{ij}(m) \sum_{n=m}^\infty p_{jj}(n-m)\\ &=\sum_{m=1}^\infty f_{ij}(m) \sum_{n=0}^\infty p_{jj}(n)\quad\text{by adjusting index}\\ &=f_{ij} \sum_{n=0}^\infty p_{jj}(n)\leq \sum_{n=0}^\infty p_{jj}(n)<\infty. \end{align*} \]

so \(\lim\limits_{n\rightarrow\infty}p_{ij}(n)=0\).

Limit distribution

Actually, we hope that no matter what the initial state is, after enough steps the distribution of the state converges. That is, does

\[ \lim_{n\rightarrow \infty}P(X_n=j) \]

exist?

We have the following sufficient condition.

Existence for limit distribution

If \(\forall i\in S\) there exists a distribution \(\nu_{i}=\{\nu_{ij}\}_{j\in S}\) such that \(\lim\limits_{n\rightarrow \infty}p_{ij}(n)=\nu_{ij}\), then the limit distribution exists.

By total probability formula

\[ \begin{align*} p_j(n)&=P(X_n=j)\\ &=\sum_{i=1}^\infty P(X_0=i)P(X_n=j\mid X_0=i)\\ &=\sum_{i=1}^\infty P(X_0=i) p_{ij}(n)\\ \end{align*} \]

So let \(n\rightarrow \infty\), and by LDCT we have

\[ \begin{align*} \lim_{n\rightarrow \infty} p_j(n)&=\lim_{n\rightarrow \infty} \sum_{i=1}^\infty P(X_0=i)p_{ij}(n)\\ &=\sum_{i=1}^\infty P(X_0=i)\lim_{n\rightarrow \infty}p_{ij}(n)\\ &=\sum_{i=1}^\infty P(X_0=i)\nu_{ij} \end{align*} \]

So define \(\nu_j=\lim\limits_{n\rightarrow \infty} p_j(n)=\sum\limits_{i=1}^\infty P(X_0=i)\nu_{ij}\), which is a distribution, since

\[ \sum_{j=1}^\infty \nu_j=\sum_{i=1}^\infty P(X_0=i)\sum_{j=1}^\infty \nu_{ij}=1. \]

So we have to know whether transition probability converges as steps go to infinity. In actual practice, this means calculation of \(\lim\limits_{n\rightarrow \infty}\pmb{P}^{(n)}=\lim\limits_{n\rightarrow \infty}\pmb{P}^n\).
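A sketch of this calculation on an assumed 3-state chain: all rows of \(\pmb{P}^n\) converge to the same vector, so the limit distribution does not depend on the initial distribution:

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])   # assumed aperiodic irreducible chain
P_inf = np.linalg.matrix_power(P, 500)

# Every row of P^n converges to the same limit vector nu.
assert np.allclose(P_inf, P_inf[0])
# Hence any initial distribution p0 yields the same limit p0 P^n -> nu.
for p0 in (np.array([1.0, 0.0, 0.0]), np.array([0.2, 0.3, 0.5])):
    assert np.allclose(p0 @ P_inf, P_inf[0])
```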

The limit distribution is closely connected to the following stationary distribution.

Definition of Stationary distribution

Assume there exists a distribution \(\{\pi_k\}_{k\geq 0}\) on \(S\) such that

\[ \pmb{\pi}=\pmb{\pi}\pmb{P}, \]

where \(\pmb{\pi}=[\pi_0,\pi_1,\cdots]\) is a row vector; then \(\{\pi_k\}\) is called a stationary distribution.

The above definition implies

\[ \pi_k=\sum_{j=0}^\infty \pi_j p_{jk},\quad k=0,1,\cdots \]

which also have a practical meaning

\[ \pi_k(1-p_{kk})=\sum_{j=0\atop j\neq k}^\infty \pi_j p_{jk},\quad k=0,1,\cdots \]

which means that the one-step probability flow out of state \(k\) equals the flow into \(k\) (a balance condition).
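In matrix terms \(\pmb{\pi}\) is a left eigenvector of \(\pmb{P}\) with eigenvalue \(1\), normalized to sum to one. A sketch solving \(\pmb{\pi}=\pmb{\pi}\pmb{P}\), \(\sum_k\pi_k=1\) as a linear system (assumed 3-state chain):

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])   # assumed example chain
n = P.shape[0]

# (P^T - I) pi = 0, with one redundant equation replaced by sum(pi) = 1.
A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
b = np.array([0.0] * (n - 1) + [1.0])
pi = np.linalg.solve(A, b)
assert np.allclose(pi @ P, pi) and np.isclose(pi.sum(), 1.0)
```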

A Markov chain with a stationary initial distribution is a strictly stationary process

A Markov chain whose initial distribution is stationary is a strictly stationary process. That is, if

\[ p_k(0)=\pi_k,\quad k\geq 0 \]

where \(p_k(0)=P(X_0=k)\), \(\{\pi_k\}\) is a stationary distribution of the Markov chain, then for all \(n\geq 1\), we have

\[ p_k(n)=P(X_n=k)=\pi_k. \]

Since \(\pi_k=\sum_{j=0}^\infty \pi_j p_{jk}\), then

\[ p_{k}(1)=\sum_{j=0}^\infty p_j(0)p_{jk}=\sum_{j=0}^\infty \pi_j p_{jk}=\pi_k. \]

By induction, using the Markov property, we are done.

\(\square\)

Now we could talk about the relationship between limit distribution and stationary distribution.

Lemma: First-passage probability with transitivity

If state \(i\rightarrow j\), \(i\) is a recurrent state, then \(f_{ij}=1\) and \(j\rightarrow i\).

\(j\rightarrow i\) is apparent: after reaching \(j\), the chain returns to \(i\) by recurrence, so \(\exists n'>0\) such that \(p_{ji}(n')>0\). It is a little harder to prove \(f_{ij}=1\), i.e. that the chain reaches \(j\) within finitely many steps almost surely.

Several proofs are possible; the following is, in my opinion, the simplest.

Assume \(X_0=i\). Since \(i\rightarrow j\), \(\exists n>0\) such that \(p_{ij}(n)>0\). Now we use the recurrence of state \(i\) and count the number of failures to reach \(j\).

If \(X_n\neq j\), then let \(T_1=\min\{k\geq n: X_k=i\}\) (we failed to reach \(j\) at time \(n\), but the chain still returns to \(i\)). If \(X_{T_1+n}\neq j\), let \(T_2=\min\{k\geq T_1+n: X_k=i\}\). Following this routine, we define

\[ T_l=\min\{k\geq n+ T_{l-1} : X_k=i\},\quad l=1,2,\cdots \]

with \(T_0=0\) by default. Define \(\zeta=\#\{l\geq 0: X_{T_l+n}\neq j\}\), the number of failed attempts to reach state \(j\) before the first success (if \(X_{T_l+n}=j\), we stop the routine). Then it is easy to see that

\[ P(\zeta=k)=(1-p_{ij}(n))^k p_{ij}(n) \]

(we fail \(k\) times and succeed at the end). This is well-defined because \(i\) is recurrent, so all the \(T_l\) are finite. Thus \(\zeta\) follows a geometric distribution and

\[ E\zeta=\frac{1-p_{ij}(n)}{p_{ij}(n)}<\infty. \]

which means that on average we fail only finitely many times, and \(\zeta<\infty\) a.s. (i.e. \(P(\zeta=\infty)=0\)). So there is a finite time \(T\) with \(X_{T}=j\), which means \(f_{ij}=1\).

\(\square\)

Theorem: Relationship between stationary and limit distribution

Assume \(\{X_n\}\) is an irreducible Markov chain.

(i) it has a unique stationary distribution iff all of its states are positive recurrent, in which case

\[ \lim\limits_{n\rightarrow \infty}\frac{1}{n}\sum_{k=1}^n p_{ij}(k) = \pi_j. \]

(ii) if \(X_n\) is ergodic, then its stationary distribution equals the limit distribution, i.e. for all state \(i,j\),

\[ \lim\limits_{n\rightarrow \infty}p_{ij}(n)=\pi_j \]

which means it is also irrelevant with initial state.

  • Necessary.

Assume it has a unique stationary distribution, then

\[ \pi_k=\sum_{j=0}^\infty \pi_j p_{jk}=\sum_{j=0}^\infty \pi_j p_{jk}(n),\quad k\geq 0. \]

Suppose, for contradiction, that the states were null recurrent or transient (by irreducibility, all states share the same type). Let \(n\rightarrow \infty\) and exchange the order of limit and summation:

\[ \pi_k=\lim_{n\rightarrow \infty}\sum_{j=0}^\infty \pi_j p_{jk}(n)=\sum_{j=0}^\infty \pi_j \lim_{n\rightarrow \infty} p_{jk}(n) =\sum_{j=0}^\infty \pi_j \cdot 0=0, \]

which contradicts \(\sum_k\pi_k=1\). So all states are positive recurrent.

  • Sufficient. We focus on the limit of \(p_{ij}(n)\) in formulating stationary distribution.

Since \(i\leftrightarrow j\), by Lemma: First-passage probability with transitivity \(f_{ij}=1\). Since all the states are positive recurrent, by Theorem for transient state from different initial state

\[ \lim_{n\rightarrow \infty} p_{ij}(n)=\frac{f_{ij}}{\mu_j}=\frac{1}{\mu_j}, \]

We want to prove that \(\{\frac{1}{\mu_j}, j\geq 0\}\) is the stationary distribution.

Firstly, since \(\sum\limits_{j=0}^\infty p_{ij}(n)=1\), we want to take limit of \(n\) to get \(\frac{1}{\mu_j}\), i.e. by Fatou Lemma,

\[ \sum_{j=0}^\infty \frac{1}{\mu_j} =\sum_{j=0}^\infty \lim_{n\rightarrow \infty} p_{ij}(n)\leq \lim_{n\rightarrow \infty} \sum_{j=0}^\infty p_{ij}(n)=1. \]

Secondly, since

\[ p_{ij}(n+1)=\sum_{k=0}^\infty p_{ik}(n)p_{kj}(1) \]

still by Fatou lemma, we have

\[ \frac{1}{\mu_j}=\lim_{n\rightarrow \infty} \sum_{k=0}^\infty p_{ik}(n)p_{kj}(1)\geq \sum_{k=0}^\infty\lim_{n\rightarrow \infty} p_{ik}(n) p_{kj}(1)=\sum_{k=0}^\infty \frac{1}{\mu_k} p_{kj}. \]

Actually, for a stationary distribution we should prove the equality \(\frac{1}{\mu_j}=\sum_{k=0}^\infty \frac{1}{\mu_k} p_{kj}\). We show this by contradiction. If \(\exists j_0\) such that

\[ \frac{1}{\mu_{j_0}}>\sum_{k=0}^\infty \frac{1}{\mu_k} p_{kj_0} \]

then

\[ \sum_{j=0}^\infty \frac{1}{\mu_j}>\sum_{j=0}^\infty\sum_{k=0}^\infty \frac{1}{\mu_k} p_{kj}=\sum_{k=0}^\infty\frac{1}{\mu_k} \sum_{j=0}^\infty p_{kj}=\sum_{k=0}^\infty\frac{1}{\mu_k} \]

which is a contradiction!

Lastly, we prove that \(\{\frac{1}{\mu_j},j\geq 0\}\) is a distribution, i.e. \(\sum_{j=0}^\infty \frac{1}{\mu_j}=1\). By iterating the same result (left to the reader), we have \(\frac{1}{\mu_j}=\sum_{k=0}^\infty \frac{1}{\mu_k} p_{kj}(m)\). So

\[ \begin{align*} \frac{1}{\mu_j} &= \lim_{m\rightarrow \infty} \sum_{k=0}^\infty \frac{1}{\mu_k} p_{kj} (m)\\ &= \sum_{k=0}^\infty \frac{1}{\mu_k} \lim_{m\rightarrow \infty} p_{kj} (m)\\ &= \left(\sum_{k=0}^\infty \frac{1}{\mu_k}\right) \frac{1}{\mu_j}. \end{align*} \]

Since \(\frac{1}{\mu_j}>0\), dividing both sides by \(\frac{1}{\mu_j}\) gives \(\sum_{k=0}^\infty \frac{1}{\mu_k}=1\).

Theorem for mean recurrence time

Assume \(X_n\) is a non-periodic irreducible Markov chain, then

\[ \lim_{n\rightarrow\infty} p_{ij}(n)=\frac{1}{\tau_j},\quad j\in S, \]

where \(\tau_j=ET_j\) is the mean recurrence time of state \(j\).

Inverse Markov Chain

Does the reversed chain still satisfy the Markov property? That is, check whether

\[ P(X_{n}=j\mid X_{n+1}=i,X_{n+2}=i_{n+2},\cdots, X_N=i_N)=P(X_{n}=j\mid X_{n+1}=i). \]
\[ \begin{align*} P(X_{n}=j& \mid X_{n+1}=i,X_{n+2}=i_{n+2},\cdots, X_N=i_N)\\ &=\frac{p_j(n)p_{ji} p_{i i_{n+2}}\cdots p_{i_{N-1}i_{N}}}{p_{i}(n+1)p_{ii_{n+2}}\cdots p_{i_{N-1}i_{N}}}\\ &= \frac{P(X_n=j)}{P(X_{n+1}=i)}p_{ji}\\ &=P(X_{n}=j\mid X_{n+1}=i). \end{align*} \]

\(\square\)

Actually, the backward transition probability \(\frac{P(X_n=j)}{P(X_{n+1}=i)}p_{ji}\) depends on \(n\) in general, so we need an extra condition for the reversed chain to be time-homogeneous.

The first is to make use of the stationary distribution. That is, if the initial distribution is \(\pi\), then \(p_{j}(n)=\pi_j\), \(p_{i}(n+1)=\pi_i\), and

\[ \frac{P(X_n=j)}{P(X_{n+1}=i)}p_{ji}=\frac{\pi_j}{\pi_i}p_{ji} \]

which is independent of time.

Now we arrive naturally at a special notion: the inverse (reversed) distribution.

Definition of inverse distribution

If

\[ (X_1,X_0)\overset{d}{=}(X_0,X_1), \]

i.e.

\[ \pi_i p_{ij}=\pi_j p_{ji} \]

then the whole Markov chain is called reversible.

Actually, having a stationary distribution is a necessary condition for a Markov chain to be reversible.
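A birth-death chain (tridiagonal \(\pmb{P}\)) is the standard example of a reversible chain: its stationary distribution satisfies the detailed balance condition \(\pi_ip_{ij}=\pi_jp_{ji}\). A numerical sketch (assumed 3-state example):

```python
import numpy as np

# Assumed birth-death chain: transitions only between neighbouring states.
P = np.array([[0.5, 0.5, 0.0],
              [0.3, 0.3, 0.4],
              [0.0, 0.6, 0.4]])
n = P.shape[0]

# Solve pi P = pi with sum(pi) = 1.
A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
pi = np.linalg.solve(A, np.array([0.0, 0.0, 1.0]))

# Detailed balance: the flow matrix pi_i p_{ij} is symmetric.
flow = pi[:, None] * P
assert np.allclose(flow, flow.T)
```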

Kolmogorov Criterion

Assume \(X_n\) is an irreducible stationary Markov chain. Then it is reversible iff for every closed path

\[ i_0,i_1,\cdots,i_{m-1}, i_m=i_0 \]

we have

\[ p_{i_0i_1}p_{i_1i_2}\cdots p_{i_{m-1}i_0}=p_{i_0 i_{m-1}}\cdots p_{i_2i_1}p_{i_1i_0}. \]
  • Necessary.

If \(X\) is reversible, then \(\forall i,j\in S\), \(\pi_ip_{ij}=\pi_jp_{ji}\). So

\[ \begin{align*} \pi_{i_0} p_{i_0i_1}p_{i_1i_2}\cdots p_{i_{m-1}i_m}&=\pi_{i_1} p_{i_1i_0}p_{i_1i_2}\cdots p_{i_{m-1}i_m}\\ &=p_{i_1i_0}\pi_{i_1} p_{i_1i_2}\cdots p_{i_{m-1}i_m}\\ &=p_{i_1i_0} p_{i_2i_1}\pi_{i_2}\cdots p_{i_{m-1}i_m}\\ &=p_{i_1i_0} p_{i_2i_1}\cdots \pi_{i_{m-1}} p_{i_{m-1}i_m}\\ &=p_{i_1i_0} p_{i_2i_1}\cdots p_{i_m i_{m-1}} \pi_{i_{m}}\\ &=p_{i_1i_0} p_{i_2i_1}\cdots p_{i_0 i_{m-1}} \pi_{i_{0}}\\ &=\pi_{i_0}p_{i_0 i_{m-1}}\cdots p_{i_2i_1}p_{i_1i_0}. \end{align*} \]
Dividing both sides by \(\pi_{i_0}>0\) gives the cycle condition.

  • Sufficient. This is similar to a path-independent line integral.

Fix a state \(j_0\). By irreducibility, for any state \(j\) there is a path \(j, j_n, j_{n-1},\cdots, j_{1},j_0\) such that

\[ p_{jj_n}p_{j_nj_{n-1}}\cdots p_{j_1j_0}>0 \]

and then we could define

\[ \pi_j=\frac{1}{Z}\frac{p_{j_0j_1}\cdots p_{j_{n} j}}{p_{j j_n}\cdots p_{j_1j_0}} \]

where \(Z\) normalizes \(\sum_{j}\pi_j=1\). One can check that this definition is independent of the chosen path, and for reversibility we have to show that for \(j,k \in S\),

\[ \pi_jp_{jk}=\pi_kp_{kj}. \]

If \(p_{jk}=p_{kj}=0\), the identity holds trivially. If \(p_{jk}\neq 0\), then by assumption we may define

\[ \pi_k:=\frac{1}{Z}\frac{p_{j_0j_1}\cdots p_{j_{n} j}\,p_{jk}}{p_{kj}\,p_{j j_n}\cdots p_{j_1j_0}} \]

and it also holds.

\(\square\)