Markov Process & Chain¶
Definition of Markov Process
Assume \(\{X(t): t\in T\}\) is a stochastic process. If for all \(n\), all \(t_i\in T\), \(i=1,2,\cdots,n\), with \(t_1<\cdots<t_n\), and all states \(x_1,\cdots,x_n\in S\), the conditional distribution of \(X(t_n)\) satisfies
\[
P\big(X(t_n)\leq x_n \mid X(t_1)=x_1,\cdots,X(t_{n-1})=x_{n-1}\big)=P\big(X(t_n)\leq x_n \mid X(t_{n-1})=x_{n-1}\big),
\]
then \(X(t)\) is called a Markov process.
When the state space \(S\) is discrete, we call the above process a Markov chain.
When both the state space \(S\) and the time space \(T\) are discrete, the above process is called a discrete Markov chain.
Transition Probability Distribution
We call the conditional distribution
\[
F(y,s;\,x,t)=P\big(X(t)\leq x \mid X(s)=y\big),\qquad s<t,
\]
the transition probability distribution.
For a discrete Markov chain, \(\forall m,n \in T\) with \(m<n\) and \(i,j \in S\), we define the transition probability
\[
p_{ij}(m,n)=P(X_n=j\mid X_m=i),
\]
which is the probability of moving from state \(i\) at time \(m\) to state \(j\) at time \(n\). Apparently, we have \(p_{ij}(m,n)\geq 0\) and
\[
\sum_{j\in S}p_{ij}(m,n)=1.
\]
Define the transition matrix from time \(m\) to time \(n\)
\[
\pmb{P}(m,n)=\big(p_{ij}(m,n)\big)_{i,j\in S}.
\]
Specifically, we focus on the one-step transition matrix. If \(\pmb{P}(m,m+1)\) does not depend on \(m\), we call the Markov chain time-homogeneous, and we denote
\[
\pmb{P}=\pmb{P}(m,m+1)=(p_{ij}),\qquad p_{ij}=P(X_{m+1}=j\mid X_m=i).
\]
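To make this concrete, here is a minimal numpy sketch (my own illustration, not part of the original notes): a hypothetical one-step transition matrix for a 3-state time-homogeneous chain, with the stochastic-matrix constraints checked explicitly.

```python
import numpy as np

# Hypothetical one-step transition matrix P for a 3-state chain
# (states 0, 1, 2): rows index the current state i, columns the next
# state j, so P[i, j] = p_ij.
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
])

# Each row must be a probability distribution over the next state:
# p_ij >= 0 and sum_j p_ij = 1.
assert (P >= 0).all()
assert np.allclose(P.sum(axis=1), 1.0)
```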
C-K Equation¶
Chapman-Kolmogorov Equation
Assume \(\{X_n\}\) is a discrete Markov chain with state space \(S\). Then \(\forall h,l\in \mathbb{N}^+\) and \(\forall m\in \mathbb{N}\),
\[
p_{ij}(m,m+h+l)=\sum_{k\in S}p_{ik}(m,m+h)\,p_{kj}(m+h,m+h+l),
\]
which is called the Chapman-Kolmogorov equation.
By definition,
\[
\begin{aligned}
p_{ij}(m,m+h+l)&=P(X_{m+h+l}=j\mid X_m=i)\\
&=\sum_{k\in S}P(X_{m+h}=k\mid X_m=i)\,P(X_{m+h+l}=j\mid X_{m+h}=k,X_m=i)\\
&=\sum_{k\in S}p_{ik}(m,m+h)\,p_{kj}(m+h,m+h+l),
\end{aligned}
\]
where the last step uses the Markov property.
Corollary: Multi-step Transition Probability
The one-step transition probabilities of a discrete Markov chain determine all multi-step transition probabilities.
By induction. For \(h=2\), the C-K equation gives
\[
p_{ij}(m,m+2)=\sum_{k\in S}p_{ik}(m,m+1)\,p_{kj}(m+1,m+2).
\]
Assume the claim holds for \(h=k\). Then for \(h=k+1\), we have
\[
p_{ij}(m,m+k+1)=\sum_{l\in S}p_{il}(m,m+k)\,p_{lj}(m+k,m+k+1).
\]
In matrix form,
\[
\pmb{P}(m,m+h)=\pmb{P}(m,m+1)\,\pmb{P}(m+1,m+2)\cdots\pmb{P}(m+h-1,m+h).
\]
Note that the matrices here are multiplied in time order.
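As a quick numerical sanity check (a sketch using the hypothetical matrix from above, not a result from the notes), the C-K equation in matrix form says that multi-step transition matrices multiply, which for a time-homogeneous chain reduces to matrix powers:

```python
import numpy as np

P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

# C-K equation in matrix form: the (h + l)-step transition matrix is
# the product of the h-step and l-step matrices, i.e. P^(h+l) = P^h P^l.
h, l = 3, 4
lhs = np.linalg.matrix_power(P, h + l)
rhs = np.linalg.matrix_power(P, h) @ np.linalg.matrix_power(P, l)
assert np.allclose(lhs, rhs)
```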
Determination of the distribution of discrete Markov Chain
Assume \(\{X_n\}\) is a discrete Markov chain with state space \(S\). The finite-dimensional distributions of \(\{X_n\}\) are determined once all the one-step transition probabilities \(p_{ij}(m,m+1)\) and the initial distribution \(p_i=P(X_0=i)\) are determined.
\(\forall k\in \mathbb{N}^+\), \(\forall\, 0<m_1<\cdots<m_k\in T\), and \(i_1,\cdots,i_k\in S\), we have
\[
P(X_{m_1}=i_1,\cdots,X_{m_k}=i_k)=\sum_{i_0\in S}p_{i_0}\,p_{i_0i_1}(0,m_1)\,p_{i_1i_2}(m_1,m_2)\cdots p_{i_{k-1}i_k}(m_{k-1},m_k),
\]
where each factor decomposes further into one-step probabilities by the corollary above.
State Space¶
Now we focus on time-homogeneous discrete Markov chains.
Invariance of Time
Assume \(\{X_n\}\) is a time-homogeneous discrete Markov chain. Then its \(h\)-step transition probability
\[
p_{ij}(h):=p_{ij}(m,m+h)
\]
does not depend on the time \(m\).
By the C-K equation,
\[
\pmb{P}(m,m+h)=\pmb{P}(m,m+1)\,\pmb{P}(m+1,m+2)\cdots\pmb{P}(m+h-1,m+h)=\pmb{P}^h,
\]
which depends only on the number of steps \(h\). This is similar to the stationary-increment property.
Accessibility & Communicating Class¶
Accessibility of state
If there exists \(N>0\) such that \(p_{ij}(N)>0\), then we say state \(i\) can access state \(j\), denoted by
\[
i\rightarrow j.
\]
If \(i\rightarrow j\) and \(j\rightarrow i\), then we say \(i\) and \(j\) communicate, denoted by
\[
i\leftrightarrow j.
\]
Properties of accessibility
(i) If \(i\rightarrow j\) and \(j\rightarrow k\), then \(i\rightarrow k\).
(ii) If \(i\leftrightarrow j\), then \(j\leftrightarrow i\).
(iii) If \(i\leftrightarrow j\), then \(i\leftrightarrow i\) (and likewise \(j\leftrightarrow j\)).
(i) There exist \(N_1,N_2>0\) such that \(p_{ij}(N_1)>0\) and \(p_{jk}(N_2)>0\); then by the C-K equation,
\[
p_{ik}(N_1+N_2)=\sum_{l\in S}p_{il}(N_1)\,p_{lk}(N_2)\geq p_{ij}(N_1)\,p_{jk}(N_2)>0.
\]
Definition of Communicating class
We denote by \(C(j)\) the communicating class of state \(j\), namely the collection of all states that communicate with \(j\):
\[
C(j)=\{k\in S: k\leftrightarrow j\}.
\]
If \(C(j)\neq \varnothing\) (equivalently, \(j\leftrightarrow j\), so \(j\in C(j)\)), then we call \(j\) a return state.
If \(C(j)=\varnothing\), then we call \(j\) a non-return state.
In further study, we will see that a non-return state must be a transient state, while return states need to be decomposed more carefully.
Property of communicating class
If \(i\leftrightarrow j\), then \(C(i)=C(j)\).
Easy to see. \(\forall k\in C(i)\), we have \(k\leftrightarrow i\). Since \(i\leftrightarrow j\), we have \(k\leftrightarrow j\), so \(k\in C(j)\). The other direction is the same.
Closed state class
We call a set of states \(C\) a closed state set if for every \(i\in C\) and \(j\not\in C\), \(i\not\rightarrow j\), i.e. the chain cannot leave \(C\).
If a closed state set has only one element \(j\), then we call \(j\) an absorbing state.
If any two states of a state space \(S\) communicate, then we call the Markov chain irreducible.
Theorem for communicating class
If two state set \(C_1\), \(C_2\) are communicating classes, then either \(C_1=C_2\) or \(C_1\cap C_2=\varnothing\).
By the definition of communicating class, there exist \(i\), \(j\) such that
\[
C_1=C(i),\qquad C_2=C(j).
\]
If \(C_1\cap C_2\neq \varnothing\), then choose \(g\in C_1\cap C_2\), which satisfies \(g\leftrightarrow i\) and \(g\leftrightarrow j\). So
\[
i\leftrightarrow j,
\]
which means \(C_1=C(i)=C(j)=C_2\) by Property of communicating class.
Decomposition of Markov Chain
The state space \(S\) of a Markov chain can be decomposed into at most denumerably many mutually disjoint state sets, i.e.
\[
S=C_1\cup C_2\cup\cdots,
\]
where \(C_i\cap C_j=\varnothing\) whenever \(i\neq j\).
Iterate over the elements of \(S\). Arbitrarily choose a state \(j_1\in S\); if \(j_1\) is a return state, let \(C_1=C(j_1)\), otherwise (\(j_1\) a non-return state) let \(C_1=\{j_1\}\). In the next round, arbitrarily choose a state \(j_2\in S-C_1\); if \(j_2\) is a return state, let \(C_2=C(j_2)\), otherwise let \(C_2=\{j_2\}\).
Continue until \(S-C_1-C_2-\cdots=\varnothing\).
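This decomposition can be carried out mechanically. Below is a minimal sketch (my own illustration; the 4-state matrix is hypothetical): it computes the accessibility relation by transitive closure and then groups states exactly as in the procedure above, putting each non-return state in a singleton set.

```python
import numpy as np

def communicating_classes(P):
    """Decompose the state space by the relation i <-> j."""
    n = len(P)
    A = (P > 0).astype(int)            # one-step accessibility
    reach = A.copy()
    for _ in range(n):                 # transitive closure: reach[i, j] = 1 iff i -> j
        reach = ((reach + reach @ A) > 0).astype(int)
    comm = reach & reach.T             # comm[i, j] = 1 iff i <-> j
    classes, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        # C(i): states communicating with i; a non-return state
        # (C(i) empty) forms its own singleton set, as in the proof above.
        cls = {j for j in range(n) if comm[i, j]} or {i}
        classes.append(sorted(cls))
        assigned |= cls
    return classes

# Hypothetical 4-state chain: {0, 1} is a closed communicating class,
# while 2 and 3 are non-return states that drain into it.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.4, 0.6, 0.0, 0.0],
              [0.3, 0.3, 0.0, 0.4],
              [0.2, 0.8, 0.0, 0.0]])
print(communicating_classes(P))   # [[0, 1], [2], [3]]
```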
Period of State¶
We now focus on return states rather than absorbing or transient states, since neither of the latter exhibits periodic behavior.
Definition of period of state
We denote by \(d(j)\) the period of a return state \(j\), defined by
\[
d(j)=\gcd\{n\geq 1: p_{jj}(n)>0\}.
\]
If \(d(j)=1\), we call \(j\) a non-periodic state; this includes any absorbing state \(k\), for which \(d(k)=1\).
By definition, if state \(i\) has \(d(i)=d\), then for all \(n\not\equiv 0\ (\mathrm{mod}\ d)\) we have \(p_{ii}(n)=0\). Conversely, for \(n\equiv 0\ (\mathrm{mod}\ d)\), \(p_{ii}(n)>0\) does not necessarily hold.
To be precise, we have the following theorem.
Theorem for period
If state \(i\) has period \(d\), then there exists \(M\in\mathbb{N}^+\) such that for all \(n>M\),
\[
p_{ii}(nd)>0.
\]
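A small sketch of how the period can be computed in practice (my own illustration, estimating the gcd from matrix powers up to a cutoff \(n_{\max}\); the deterministic 3-cycle is a hypothetical example):

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, j, n_max=50):
    """Estimate d(j) = gcd{ n >= 1 : p_jj(n) > 0 } using powers up to n_max."""
    returns = []
    Q = np.eye(len(P))
    for n in range(1, n_max + 1):
        Q = Q @ P                      # Q = P^n
        if Q[j, j] > 0:
            returns.append(n)
    return reduce(gcd, returns) if returns else 0

# Hypothetical deterministic 3-cycle: every state has period 3.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
print(period(P, 0))   # 3
```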
Transitivity of period
If \(i\leftrightarrow j\), then \(d(i)=d(j)\).
Use a special loop. Since \(i\leftrightarrow j\), there exist \(n,m>0\) such that \(p_{ij}(n)>0\) and \(p_{ji}(m)>0\), so
\[
p_{ii}(n+m)\geq p_{ij}(n)\,p_{ji}(m)>0.
\]
So \(d(i)\mid n+m\).
By the definition \(d(j)=\gcd\{n_1,n_2,\cdots,n_k,\cdots\}\), we have \(p_{jj}(n_k)>0\) for all \(k=1,2,\cdots\). So for each \(k\),
\[
p_{ii}(n+n_k+m)\geq p_{ij}(n)\,p_{jj}(n_k)\,p_{ji}(m)>0.
\]
So \(d(i)\mid n+m+n_k\). Combined with the above two, we have \(d(i)\mid n_k\), which means \(d(i)\mid d(j)\).
Similarly we have \(d(j)\mid d(i)\), hence \(d(i)=d(j)\).
Ergodicity of State¶
Here we dig deeper into the return probabilities themselves. Notice that some return states may fail to return with positive probability, while others return with certainty.
Definition of passage probability of State
Define the first-passage probability from state \(i\) to \(j\) in \(n\) steps:
\[
f_{ij}(n)=P(X_n=j,\ X_m\neq j \text{ for } 1\leq m\leq n-1 \mid X_0=i).
\]
Define the ultimate passage probability from state \(i\) to \(j\):
\[
f_{ij}=\sum_{n=1}^\infty f_{ij}(n).
\]
Note that
\[
f_{ij}=P\big(\exists\, n\geq 1:\ X_n=j \mid X_0=i\big),
\]
which is the core of our analysis.
We define
\[
T_j=\min\{n\geq 1: X_n=j\}\qquad(\min\varnothing:=\infty)
\]
to be the first hitting time of \(j\).
Then
\[
f_{ij}(n)=P(T_j=n\mid X_0=i),
\]
and
\[
f_{ij}=P(T_j<\infty\mid X_0=i).
\]
Notice that the "\(<\infty\)" above means the chain can reach state \(j\) from \(i\) in finitely many steps.
Theorem for relationship between first-passage probability and passage probability
For any states \(i,j\in S\) and \(\forall n\geq 1\),
\[
p_{ij}(n)=\sum_{m=1}^{n}f_{ij}(m)\,p_{jj}(n-m),
\]
with the convention \(p_{jj}(0)=1\). (Partition according to the time \(m\) of the first visit to \(j\).)
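Rearranging this decomposition gives a recursion \(f_{ij}(n)=p_{ij}(n)-\sum_{m=1}^{n-1}f_{ij}(m)\,p_{jj}(n-m)\), so first-passage probabilities can be computed from transition probabilities. A minimal sketch (my own illustration, reusing the hypothetical 3-state matrix from earlier):

```python
import numpy as np

def first_passage(P, i, j, n_max):
    """Compute f_ij(1..n_max) from the decomposition
    p_ij(n) = sum_{m=1}^{n} f_ij(m) p_jj(n - m),  p_jj(0) = 1."""
    powers = [np.eye(len(P))]
    for _ in range(n_max):
        powers.append(powers[-1] @ P)          # powers[n] = P^n
    f = np.zeros(n_max + 1)
    for n in range(1, n_max + 1):
        f[n] = powers[n][i, j] - sum(f[m] * powers[n - m][j, j]
                                     for m in range(1, n))
    return f[1:]

P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
f = first_passage(P, 0, 1, 60)
print(f.sum())    # f_01: close to 1, so state 1 is reached a.s. from 0
```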
Definition of Recurrent State, self-reachable
If \(f_{ii}=1\), i.e. \(P(T_i<\infty\mid X_0=i)=1\), then state \(i\) is called a recurrent state.
If \(f_{ii}<1\), i.e. \(P(T_i<\infty\mid X_0=i)<1\), then state \(i\) is called a transient state.
For a recurrent state \(i\), we have \(\sum_{n=1}^\infty f_{ii}(n)=1\), so \(\{f_{ii}(n)\}_{n\geq 1}\) is a discrete probability distribution.
Now we can discuss the numerical characteristics of this distribution and their practical meaning. Let \(T_i\) be the number of steps for the chain to return to \(i\) for the first time, which is exactly the first hitting time defined above. Then
\[
P(T_i=n\mid X_0=i)=f_{ii}(n).
\]
Definition of Mean Recurrence Time
The mean recurrence time is defined as the mathematical expectation of \(T_i\):
\[
\mu_i=E(T_i\mid X_0=i)=\sum_{n=1}^\infty n\,f_{ii}(n).
\]
For recurrent state \(i\), if \(\mu_i<+\infty\), then \(i\) is called Positive recurrent state. If \(\mu_i=+\infty\), then \(i\) is called Null recurrent state.
A non-periodic positive recurrent state is also called Ergodic state.
Define \(r_0=1\), \(r_k=P(T_i>k\mid X_0=i)\), \(k\geq 1\); then
\[
\mu_i=\sum_{k=0}^\infty r_k.
\]
To see this, use the definition of expectation. Since
\[
P(T_i=n\mid X_0=i)=r_{n-1}-r_n,
\]
we have
\[
\mu_i=\sum_{n=1}^\infty n\,(r_{n-1}-r_n)=\sum_{k=0}^\infty r_k,
\]
by rearranging the sum (the tail-sum formula for nonnegative integer-valued random variables).
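As a Monte Carlo illustration (my own sketch, with the hypothetical 3-state matrix from earlier), the mean recurrence time can also be estimated by simulating excursions from a state:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state chain: estimate the mean recurrence time
# mu_0 = E(T_0 | X_0 = 0) by simulating first-return excursions.
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

def recurrence_time(start):
    state, steps = start, 0
    while True:
        state = rng.choice(len(P), p=P[state])
        steps += 1
        if state == start:
            return steps

samples = [recurrence_time(0) for _ in range(20000)]
print(np.mean(samples))   # estimate of mu_0
```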
In practice, it is hard to calculate \(f_{ii}(n)\) for complex Markov chains, so we give another characterization of recurrent states.
n-visit Probability¶
n-visit Probability
Define the n-visit probability
\[
Q_{ij}(n)=P(\text{the chain visits } j \text{ at least } n \text{ times}\mid X_0=i),
\]
which is the probability that the chain, started from \(i\), visits \(j\) at least \(n\) times. Then we have
\[
Q_{ij}(1)=f_{ij},
\]
and define \(Q_{ij}:=\lim\limits_{n\rightarrow \infty}Q_{ij}(n)\), the probability of visiting \(j\) infinitely often.
Properties of the n-visit probability
(i) \(Q_{ij}(n)\) is monotonically decreasing in \(n\).
(ii) If \(Q_{ij}=1\), then starting from state \(i\) the chain reaches state \(j\) infinitely many times almost surely; so by property (i), \(Q_{ij}(n)=1\) for all \(n\).
(iii) If \(Q_{ii}=1\), then \(Q_{ii}(1)=f_{ii}=1\), so state \(i\) is a recurrent state.
Relationship between \(f_{ij}\) and \(Q_{ij}\)
For any states \(i,j\in S\) and any \(m\geq 1\), we have
(i) \(Q_{ij}(m+1)=f_{ij} Q_{jj}(m)\);
(ii) \(Q_{ij}=f_{ij}Q_{jj}\);
(iii) \(Q_{jj}=\lim\limits_{n\rightarrow \infty}(f_{jj})^n\).
(i) Partition according to the first visit to \(j\): to visit \(j\) at least \(m+1\) times from \(i\), the chain must first reach \(j\) (probability \(f_{ij}\)) and then revisit \(j\) at least \(m\) more times (probability \(Q_{jj}(m)\)).
(ii) By (i), taking the limit \(m\rightarrow\infty\).
(iii) Apply (i) with \(i=j\) to \(Q_{jj}(m)\) iteratively: \(Q_{jj}(m+1)=f_{jj}\,Q_{jj}(m)=(f_{jj})^m Q_{jj}(1)=(f_{jj})^{m+1}\).
Recurrent state description with passage probability
(i) State \(j\) is a recurrent state, iff \(\sum\limits_{n=1}^\infty p_{jj}(n)=+\infty\).
(ii) State \(j\) is a transient state, iff \(\sum\limits_{n=1}^\infty p_{jj}(n)<+\infty\).
The proof leverages the Theorem for relationship between first-passage probability and passage probability.
- Sufficiency for (i) and necessity for (ii).
We have
\[
\sum_{n=1}^N p_{jj}(n)=\sum_{n=1}^N\sum_{m=1}^n f_{jj}(m)\,p_{jj}(n-m)\leq \sum_{m=1}^N f_{jj}(m)\sum_{k=0}^{N}p_{jj}(k)\leq f_{jj}\Big(1+\sum_{n=1}^N p_{jj}(n)\Big),
\]
so we have
\[
f_{jj}\geq \frac{\sum_{n=1}^N p_{jj}(n)}{1+\sum_{n=1}^N p_{jj}(n)}.
\]
Let \(N\rightarrow\infty\); if \(\sum_{n=1}^\infty p_{jj}(n)=+\infty\), the right-hand side tends to \(1\),
which means \(f_{jj}=1\), so \(j\) is a recurrent state. Contrapositively, if state \(j\) is transient, then \(\sum\limits_{n=1}^\infty p_{jj}(n)<+\infty\).
- Necessity for (i) and sufficiency for (ii).
Still using the same formula, choose \(M\leq N\); keeping only the terms with \(m\leq M\),
\[
\sum_{n=1}^N p_{jj}(n)\geq \sum_{m=1}^M f_{jj}(m)\sum_{k=0}^{N-M}p_{jj}(k).
\]
So
\[
\sum_{m=1}^M f_{jj}(m)\leq \frac{\sum_{n=1}^N p_{jj}(n)}{1+\sum_{k=1}^{N-M}p_{jj}(k)}.
\]
Let \(N\rightarrow \infty\); if \(\sum_{n=1}^{\infty} p_{jj}(n)=S<\infty\), then the right-hand side tends to a limit strictly less than \(1\), i.e.
\[
\sum_{m=1}^M f_{jj}(m)\leq \frac{S}{1+S}<1.
\]
Now let \(M\rightarrow \infty\): \(f_{jj}\leq \frac{S}{1+S}<1\), which means state \(j\) is a transient state. So a recurrent state must have \(\sum_{n=1}^\infty p_{jj}(n)=+\infty\).
We can take another perspective. Notice that for events \(A_1,A_2,\cdots\) with indicators \(Y_n=\mathbb{1}_{A_n}\),
\[
E\Big(\sum_{n=1}^\infty Y_n\Big)=\sum_{n=1}^\infty P(A_n).
\]
So the summation inside the expectation is the total number of the events \(A_1,\cdots,A_n,\cdots\) that occur. In our Markov chain, take \(A_n=\{X_n=j\}\) and define
\[
\eta_j=\sum_{n=1}^\infty Y_n=\sum_{n=1}^\infty \mathbb{1}_{\{X_n=j\}}
\]
to denote the number of times the chain visits state \(j\) (over steps \(n=1\) to \(\infty\)); \(Y_n\) indicates whether the chain is at \(j\) at step \(n\).
Note that the \(A_n\) are not necessarily mutually disjoint.
So
\[
E(\eta_j)=\sum_{n=1}^\infty P(A_n).
\]
If we write the initial state explicitly, then
\[
E(\eta_j\mid X_0=j)=\sum_{n=1}^\infty P(X_n=j\mid X_0=j)=\sum_{n=1}^\infty p_{jj}(n),
\]
so state \(j\) is recurrent iff the expected number of returns to \(j\) is infinite.
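A numerical illustration of the criterion (my own sketch): for simple random walk on \(\mathbb{Z}\), \(p_{00}(2n)=\binom{2n}{n}p^n(1-p)^n\), whose partial sums diverge in the symmetric case \(p=\frac{1}{2}\) (recurrent) and converge in a biased case (transient). The values \(p=0.5\) and \(p=0.6\) below are chosen purely for demonstration:

```python
# Recurrence criterion for simple random walk on Z.
# The term C(2n, n) x^n (x = p(1-p)) is built iteratively in floats to
# avoid overflowing huge binomial coefficients.
def partial_sum(p, N):
    x = p * (1 - p)
    term, total = 1.0, 0.0
    for n in range(1, N + 1):
        term *= (2 * n) * (2 * n - 1) / (n * n) * x   # term = C(2n, n) x^n
        total += term
    return total

for N in (10, 100, 1000):
    print(N, partial_sum(0.5, N), partial_sum(0.6, N))
# The p = 0.5 column keeps growing (like sqrt(N)); the p = 0.6 column
# stabilizes, matching criterion (i) versus (ii).
```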
Corollary: Transitivity of Recurrent state
If state \(i\leftrightarrow j\), and \(j\) is a recurrent state, then \(i\) is also a recurrent state.
Since \(i\leftrightarrow j\), \(\exists\, m,n\geq 1\) such that \(p_{ij}(m)>0\) and \(p_{ji}(n)>0\). Then for every \(k\geq 1\),
\[
p_{ii}(m+k+n)\geq p_{ij}(m)\,p_{jj}(k)\,p_{ji}(n),
\]
so
\[
\sum_{k=1}^\infty p_{ii}(m+k+n)\geq p_{ij}(m)\,p_{ji}(n)\sum_{k=1}^\infty p_{jj}(k)=+\infty,
\]
and we are done.
\(\square\)
Limit Theorem¶
We now study the asymptotic behavior of recurrent, transient and ergodic states: the limit of \(p_{jj}(n)\).
The following theorem takes the influence of the period into account. If a positive recurrent state has a period, \(p_{jj}(n)\) might not have a limit, only a convergent subsequence. A more general statement covering (i) and (ii) below is the Cesàro limit
\[
\lim_{n\rightarrow\infty}\frac{1}{n}\sum_{m=1}^n p_{jj}(m)=\frac{1}{\mu_j}
\]
for a positive recurrent state, regardless of its period.
Theorem for limit of passage probability of three states
(i) If state \(j\) is a non-periodic positive recurrent state (ergodic), then
\[
\lim_{n\rightarrow\infty}p_{jj}(n)=\frac{1}{\mu_j}.
\]
(ii) If state \(j\) is a positive recurrent state with period \(d\), then
\[
\lim_{n\rightarrow\infty}p_{jj}(nd)=\frac{d}{\mu_j}.
\]
(iii) If state \(j\) is a null recurrent state or a transient state, then
\[
\lim_{n\rightarrow\infty}p_{jj}(n)=0.
\]
We first define some variables.
Define \(N_j(n)\) to be the number of visits to \(j\) among \(\{X_1,\cdots,X_n\}\), also called the sojourn time.
Define \(T_1\) to be the number of steps to reach \(j\) for the first time, and \(T_k\) the number of steps between the \((k-1)\)-th and \(k\)-th visits to \(j\). By time-homogeneity, \(T_1,\cdots,T_k,\cdots\) are mutually independent and identically distributed as the first hitting time \(T_j\), that is,
\[
P(T_k=n)=f_{jj}(n),
\]
and \(E(T_k)=\mu_j\).
Define the accumulated time \(S_k=\sum_{i=1}^k T_i\). Then we have the relationship
\[
\{N_j(n)\geq k\}=\{S_k\leq n\},
\]
or
\[
S_{N_j(n)}\leq n< S_{N_j(n)+1}.
\]
Note that the same relationship also holds in the Poisson process.
- Prove first \(\lim\limits_{n\rightarrow \infty} \frac{N_j(n)}{n}=\frac{1}{\mu_j}\)
Notice the relationship
\[
S_{N_j(n)}\leq n< S_{N_j(n)+1},
\]
so
\[
\frac{S_{N_j(n)}}{N_j(n)}\leq \frac{n}{N_j(n)}< \frac{S_{N_j(n)+1}}{N_j(n)+1}\cdot\frac{N_j(n)+1}{N_j(n)}.
\]
By the strong law of large numbers, \(\frac{S_k}{k}\rightarrow \mu_j\ a.s.\), and since \(N_j(n)\rightarrow \infty\) as \(n\rightarrow \infty\) (recurrence), both bounds converge to \(\mu_j\), i.e.
\[
\frac{n}{N_j(n)}\rightarrow\mu_j\quad a.s.,
\]
and we have
\[
\lim_{n\rightarrow\infty}\frac{N_j(n)}{n}=\frac{1}{\mu_j}\quad a.s.
\]
- Prove second \(\lim\limits_{n\rightarrow \infty} \frac{1}{n}\sum_{m=1}^n p_{jj}(m)=\frac{1}{\mu_j}\).
By the definition of the sojourn time, we have
\[
N_j(n)=\sum_{m=1}^n Y_m,\qquad Y_m=\mathbb{1}_{\{X_m=j\}}.
\]
Here \(Y_m\) indicates whether the chain is at \(j\) at step \(m\); this matches the alternative perspective in the description of recurrent states, i.e.
\[
E(N_j(n)\mid X_0=j)=\sum_{m=1}^n p_{jj}(m).
\]
So
\[
\frac{1}{n}\sum_{m=1}^n p_{jj}(m)=E\Big(\frac{N_j(n)}{n}\,\Big|\,X_0=j\Big);
\]
since \(0\leq \frac{N_j(n)}{n}\leq 1\) and \(\frac{N_j(n)}{n}\rightarrow\frac{1}{\mu_j}\ a.s.\), taking the limit (bounded convergence) gives the result.
- Prove proposition (i).
Granting that the limit of \(p_{jj}(n)\) exists in the non-periodic case, write \(\lim_{n\rightarrow \infty}p_{jj}(n)=\frac{1}{\mu_j}+\delta\); substituting into the Cesàro limit above forces \(\delta=0\).
\(\square\)
For (ii), let \(n=dn'\) and consider the chain \(Y_{n'}=X_{dn'}\), for which \(j\) is a non-periodic state with one-step transition probability \(p_{jj}(d)\). So by (i),
\[
\lim_{n'\rightarrow\infty}p_{jj}(n'd)=\frac{1}{\mu_j'},
\]
where \(\mu_j'\) is the mean recurrence time of \(j\) for the new chain. Notice that the recurrence times of the original chain are multiples of \(d\), so
\[
\mu_j'=\frac{\mu_j}{d},
\]
and we are done.
\(\square\)
- For a null recurrent state, by (ii) with \(\mu_j=\infty\), we have \(\lim\limits_{n\rightarrow \infty}p_{jj}(nd)=0\). Since \(p_{jj}(n)=0\) for \(n\not\equiv 0\ (\mathrm{mod}\ d)\), by the Heine theorem,
\[
\lim_{n\rightarrow\infty}p_{jj}(n)=0.
\]
- For a transient state, the convergence of \(\sum_{n=1}^\infty p_{jj}(n)\) forces each term to vanish: \(\lim_{n\rightarrow \infty }p_{jj}(n)=0\).
\(\square\)
Corollary: Transitivity of ergodic state
(i) If state \(i\leftrightarrow j\), and \(j\) is a positive recurrent state, then \(i\) is also a positive recurrent state.
(ii) If state \(i\leftrightarrow j\), and \(j\) is a null recurrent state, then \(i\) is also a null recurrent state.
By the transitivity of recurrence, \(i\) is recurrent. Since \(i\leftrightarrow j\), \(\exists\, m,n>0\) such that \(p_{ij}(m)>0\) and \(p_{ji}(n)>0\), so with \(d\) the period of \(j\),
\[
p_{ii}(m+kd+n)\geq p_{ij}(m)\,p_{jj}(kd)\,p_{ji}(n);
\]
let \(k\rightarrow \infty\); we have
\[
\liminf_{k\rightarrow\infty} p_{ii}(m+kd+n)\geq p_{ij}(m)\,\frac{d}{\mu_j}\,p_{ji}(n).
\]
If \(j\) is a positive recurrent state, the right-hand side is positive, so by the limit theorem \(i\) cannot be null recurrent, i.e. \(i\) is positive recurrent, and we are done. Statement (ii) follows by exchanging the roles of \(i\) and \(j\).
\(\square\)
The following theorem gives the limiting behavior of \(p_{ij}(n)\) for different initial states.
Theorem for limits from different initial states
(i) If \(j\) is a non-periodic positive recurrent state, then for every state \(i\),
\[
\lim_{n\rightarrow\infty}p_{ij}(n)=\frac{f_{ij}}{\mu_j}.
\]
(ii) If \(j\) is a null recurrent state or a transient state, then for every state \(i\),
\[
\lim_{n\rightarrow\infty}p_{ij}(n)=0.
\]
There are many proof methods, including Lebesgue's dominated convergence theorem. Here we give a squeeze argument.
By the Theorem for relationship between first-passage probability and passage probability, we have
\[
p_{ij}(n)=\sum_{m=1}^{n}f_{ij}(m)\,p_{jj}(n-m).
\]
Choose \(M<n\); we have
\[
p_{ij}(n)\leq \sum_{m=1}^{M}f_{ij}(m)\,p_{jj}(n-m)+\sum_{m=M+1}^{\infty}f_{ij}(m).
\]
Let \(n\rightarrow \infty\) and then \(M\rightarrow \infty\); we get
\[
\limsup_{n\rightarrow\infty}p_{ij}(n)\leq \sum_{m=1}^{\infty}f_{ij}(m)\,\frac{1}{\mu_j}=\frac{f_{ij}}{\mu_j}.
\]
For the other direction, we simply drop the summands with \(m>M\):
\[
p_{ij}(n)\geq \sum_{m=1}^{M}f_{ij}(m)\,p_{jj}(n-m),
\]
and letting \(n\rightarrow\infty\) and then \(M\rightarrow \infty\) gives
\[
\liminf_{n\rightarrow\infty}p_{ij}(n)\geq \frac{f_{ij}}{\mu_j}.
\]
So we are done.
\(\square\)
- For a null recurrent state,
\[
p_{ij}(n)\leq \sum_{m=1}^{M}f_{ij}(m)\,p_{jj}(n-m)+\sum_{m=M+1}^{\infty}f_{ij}(m)\rightarrow 0
\]
as \(n\rightarrow\infty\) and then \(M\rightarrow\infty\), since \(p_{jj}(n)\rightarrow 0\), and we are done.
- For a transient state: by the description of recurrent states with passage probability, since \(j\) is transient, we have
\[
\sum_{n=1}^\infty p_{jj}(n)<+\infty.
\]
So by the Theorem for relationship between first-passage probability and passage probability,
\[
\sum_{n=1}^\infty p_{ij}(n)=\sum_{n=1}^\infty\sum_{m=1}^{n}f_{ij}(m)\,p_{jj}(n-m)=f_{ij}\Big(1+\sum_{k=1}^\infty p_{jj}(k)\Big)<+\infty,
\]
so \(\lim\limits_{n\rightarrow\infty}p_{ij}(n)=0\).
Limit distribution¶
We hope to show that no matter what the initial state is, after enough steps the state has a convergent distribution. That is, does
\[
\lim_{n\rightarrow\infty}p_j(n)=\lim_{n\rightarrow\infty}P(X_n=j)
\]
exist?
We have the following sufficient condition.
Existence for limit distribution
If for every \(i\in S\) there exists a distribution \(\nu_{i}=\{\nu_{ij}\}_{j\in S}\) such that \(\lim\limits_{n\rightarrow \infty}p_{ij}(n)=\nu_{ij}\), then the limit distribution exists.
By the total probability formula,
\[
p_j(n)=P(X_n=j)=\sum_{i\in S}P(X_0=i)\,p_{ij}(n).
\]
So let \(n\rightarrow \infty\); by LDCT we have
\[
\lim_{n\rightarrow\infty}p_j(n)=\sum_{i\in S}P(X_0=i)\,\nu_{ij}.
\]
So define \(\nu_j=\lim\limits_{n\rightarrow \infty} p_j(n)=\sum\limits_{i\in S} P(X_0=i)\nu_{ij}\), which is a distribution, since
\[
\sum_{j\in S}\nu_j=\sum_{j\in S}\sum_{i\in S}P(X_0=i)\,\nu_{ij}=\sum_{i\in S}P(X_0=i)\sum_{j\in S}\nu_{ij}=1,
\]
exchanging the order of summation for nonnegative terms.
So we need to know whether the transition probabilities converge as the number of steps goes to infinity. In practice, this means computing \(\lim\limits_{n\rightarrow \infty}\pmb{P}^{(n)}=\lim\limits_{n\rightarrow \infty}\pmb{P}^n\).
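Numerically, the convergence of \(\pmb{P}^n\) is easy to observe (a sketch with the hypothetical matrix from earlier): all rows of a high power become nearly identical, so the limit does not depend on the initial state.

```python
import numpy as np

P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

# Iterate P^n: the rows converge to a common limit row.
Pn = np.linalg.matrix_power(P, 50)
print(Pn)   # all rows (nearly) identical: the limit ignores the initial state
```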
The limit distribution is closely connected to the following stationary distribution.
Definition of Stationary distribution
Assume there exists a distribution \(\{\pi_k\}_{k\geq 0}\) such that for all states \(k\in S\) we have
\[
\pi_k=\sum_{j\in S}\pi_j\,p_{jk},\quad\text{i.e.}\quad \pi=\pi\pmb{P},
\]
where \(\pi=[\pi_0,\pi_1,\cdots]\) is a row vector; then \(\{\pi_k\}\) is called a stationary distribution.
The above definition implies
\[
\pi=\pi\pmb{P}^n,\qquad \forall n\geq 1,
\]
which also has a practical meaning:
\[
\pi_k\sum_{j\neq k}p_{kj}=\sum_{j\neq k}\pi_j\,p_{jk}
\]
(subtract \(\pi_k p_{kk}\) from both sides of the definition), which means the probabilities of leaving \(k\) and entering \(k\) in one step are the same.
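In practice, the stationary distribution can be found by solving the linear system \(\pi\pmb{P}=\pi\), \(\sum_k\pi_k=1\). A minimal sketch (my own illustration with the hypothetical 3-state matrix, replacing one redundant equation by the normalization):

```python
import numpy as np

P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
n = len(P)

# pi (P - I) = 0 transposed gives (P - I)^T pi^T = 0; the system has
# rank n - 1 for an irreducible chain, so replace one equation with
# the normalization sum(pi) = 1.
A = (P - np.eye(n)).T
A[-1, :] = 1.0
b = np.zeros(n)
b[-1] = 1.0
pi = np.linalg.solve(A, b)
print(pi, pi @ P)   # pi and pi P agree: pi is stationary
```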
A Markov chain with a stationary initial distribution is a strictly stationary process
A Markov chain whose initial distribution is a stationary distribution is a strictly stationary process. That is, if
\[
p_k(0)=\pi_k,\qquad \forall k\in S,
\]
where \(p_k(0)=P(X_0=k)\) and \(\{\pi_k\}\) is a stationary distribution of the Markov chain, then for all \(n\geq 1\) we have
\[
p_k(n)=P(X_n=k)=\pi_k.
\]
Since \(\pi_k=\sum_{j=0}^\infty \pi_j p_{jk}\), we get
\[
p_k(1)=\sum_{j=0}^\infty p_j(0)\,p_{jk}=\sum_{j=0}^\infty \pi_j\,p_{jk}=\pi_k.
\]
By induction, together with the Markov property (which lifts the one-dimensional marginals to shift-invariant finite-dimensional distributions), we are done.
\(\square\)
Now we could talk about the relationship between limit distribution and stationary distribution.
Lemma: First-passage probability with transitivity
If state \(i\rightarrow j\), \(i\) is a recurrent state, then \(f_{ij}=1\) and \(j\rightarrow i\).
\(j\rightarrow i\) is apparent: since \(i\) is recurrent, after reaching \(j\) the chain must still return to \(i\), so \(\exists n'>0\) such that \(p_{ji}(n')>0\). It is a little harder to prove \(f_{ij}=1\), i.e. that the chain reaches \(j\) in finitely many steps almost surely.
Several proofs could be given here; we give the one that, in my opinion, is the simplest.
Assume \(X_0=i\); since \(i\rightarrow j\), \(\exists n>0\) such that \(p_{ij}(n)>0\). Now we make use of the recurrence of state \(i\) and count the number of failures in reaching \(j\).
If \(X_n\neq j\), then let \(T_1=\min\{k\geq n: X_k=i\}\) (we fail to reach \(j\) at step \(n\), but afterwards we still return to \(i\)). Check whether \(X_{T_1+n}\neq j\); if so, let \(T_2=\min\{k\geq T_1+n: X_k=i\}\). Following this routine, we define
\[
T_{l+1}=\min\{k\geq T_l+n: X_k=i\},
\]
with \(T_0=0\) by default. Define \(\zeta=\#\{l\geq 0: X_{T_l+n}\neq j\}\), the number of failures to reach state \(j\) before the first success (if the chain reaches \(j\) at step \(T_l+n\), we stop the routine and return \(l\)). Each attempt starts afresh from \(i\) and succeeds with probability \(\alpha:=p_{ij}(n)>0\), so
\[
P(\zeta=k)=(1-\alpha)^k\,\alpha
\]
(fail \(k\) times and succeed at the end). This random variable is well defined because \(i\) is recurrent, so all the \(T_l\) are finite. Thus \(\zeta\) follows a geometric distribution and
\[
E(\zeta)=\frac{1-\alpha}{\alpha}<\infty,
\]
which means on average we fail only finitely many times, and \(\zeta<\infty\ a.s.\) (i.e. \(P(\zeta=\infty)=0\)). So \(\exists\, T<\infty\) such that \(X_{T}=j\), which means \(f_{ij}=1\).
\(\square\)
Theorem: Relationship between stationary and limit distribution
Assume \(\{X_n\}\) is an irreducible Markov chain.
(i) It has a unique stationary distribution iff all of its states are positive recurrent, and in that case
\[
\pi_j=\frac{1}{\mu_j};
\]
(ii) if \(X_n\) is ergodic, then its stationary distribution equals the limit distribution, i.e. for all states \(i,j\),
\[
\lim_{n\rightarrow\infty}p_{ij}(n)=\pi_j=\frac{1}{\mu_j},
\]
which is in particular independent of the initial state.
- Necessary.
Assume it has a unique stationary distribution \(\pi\). Then for every \(n\),
\[
\pi_j=\sum_{i\in S}\pi_i\,p_{ij}(n).
\]
Let \(n\rightarrow \infty\) and exchange the order of limit and summation: if the states were null recurrent or transient, then
\[
\pi_j=\sum_{i\in S}\pi_i\lim_{n\rightarrow\infty}p_{ij}(n)=0\quad\text{for all } j,
\]
which contradicts \(\sum_j\pi_j=1\). Since the chain is irreducible, all states are of the same type, so all states are positive recurrent.
- Sufficient. We use the limit of \(p_{ij}(n)\) to construct the stationary distribution.
Since \(i\leftrightarrow j\), the Lemma: First-passage probability with transitivity gives \(f_{ij}=1\). Since all states are positive recurrent, the theorem for limits from different initial states gives
\[
\lim_{n\rightarrow\infty}p_{ij}(n)=\frac{f_{ij}}{\mu_j}=\frac{1}{\mu_j}.
\]
We want to prove that \(\{\frac{1}{\mu_j}, j\geq 0\}\) is the stationary distribution.
Firstly, since \(\sum\limits_{j=0}^\infty p_{ij}(n)=1\), taking the limit in \(n\) and applying the Fatou lemma,
\[
\sum_{j=0}^\infty \frac{1}{\mu_j}\leq \liminf_{n\rightarrow\infty}\sum_{j=0}^\infty p_{ij}(n)=1.
\]
Secondly, since
\[
p_{ij}(n+1)=\sum_{k=0}^\infty p_{ik}(n)\,p_{kj},
\]
still by the Fatou lemma we have
\[
\frac{1}{\mu_j}\geq \sum_{k=0}^\infty \frac{1}{\mu_k}\,p_{kj}.
\]
Actually, for a stationary distribution we should prove \(\frac{1}{\mu_j}=\sum_{k=0}^\infty \frac{1}{\mu_k} p_{kj}\). We show this by contradiction. If \(\exists j_0\) such that
\[
\frac{1}{\mu_{j_0}}> \sum_{k=0}^\infty \frac{1}{\mu_k}\,p_{kj_0},
\]
then summing over \(j\),
\[
\sum_{j=0}^\infty\frac{1}{\mu_j}> \sum_{j=0}^\infty\sum_{k=0}^\infty \frac{1}{\mu_k}\,p_{kj}=\sum_{k=0}^\infty \frac{1}{\mu_k},
\]
which is a contradiction!
Lastly, we prove that \(\{\frac{1}{\mu_j},j\geq 0\}\) is a distribution, i.e. \(\sum_{j=0}^\infty \frac{1}{\mu_j}=1\). Iterating the same identity (left to the reader), we have \(\frac{1}{\mu_j}=\sum_{k=0}^\infty \frac{1}{\mu_k}\, p_{kj}(m)\). So, letting \(m\rightarrow\infty\) (by dominated convergence, using \(\sum_k\frac{1}{\mu_k}\leq 1\)),
\[
\frac{1}{\mu_j}=\sum_{k=0}^\infty \frac{1}{\mu_k}\lim_{m\rightarrow\infty}p_{kj}(m)=\Big(\sum_{k=0}^\infty \frac{1}{\mu_k}\Big)\frac{1}{\mu_j},
\]
hence \(\sum_{k=0}^\infty \frac{1}{\mu_k}=1\). Uniqueness follows since any stationary distribution must satisfy \(\pi_j=\lim\limits_{n\rightarrow\infty}\sum_i\pi_ip_{ij}(n)=\frac{1}{\mu_j}\).
\(\square\)
Theorem for mean recurrence time
Assume \(X_n\) is a non-periodic irreducible Markov chain. Then for all states \(i,j\),
\[
\lim_{n\rightarrow\infty}p_{ij}(n)=\frac{1}{\tau_j},
\]
where \(\tau_j=E(T_j\mid X_0=j)\) is the mean recurrence time of state \(j\) (with \(\frac{1}{\tau_j}:=0\) when \(\tau_j=\infty\)).
Inverse Markov Chain¶
Check whether the Markov property still behaves well when the chain is run backwards in time, i.e. whether
\[
P(X_n=j\mid X_{n+1}=i)
\]
acts like a time-homogeneous transition probability. By the Bayes formula,
\[
P(X_n=j\mid X_{n+1}=i)=\frac{P(X_n=j)\,p_{ji}}{P(X_{n+1}=i)}=\frac{p_j(n)\,p_{ji}}{p_i(n+1)},
\]
which in general depends on the time \(n\). So we need some special condition.
The first is to make use of the stationary distribution. That is, if the initial distribution is \(\pi\), then \(p_j(n)=\pi_j\), \(p_i(n+1)=\pi_i\), and
\[
q_{ij}:=P(X_n=j\mid X_{n+1}=i)=\frac{\pi_j\,p_{ji}}{\pi_i},
\]
which is independent of time.
This leads naturally to a special notion: the reversible Markov chain.
Definition of reversibility
If
\[
q_{ij}=p_{ij},\qquad \forall i,j\in S,
\]
i.e.
\[
\pi_i\,p_{ij}=\pi_j\,p_{ji},\qquad \forall i,j\in S
\]
(the detailed balance condition), then the Markov chain is called reversible.
Note that having a stationary distribution is a necessary condition for a Markov chain to be reversible.
Kolmogorov Criterion
Assume \(X_n\) is an irreducible stationary Markov chain. Then it is reversible iff for every closed path \(j_0,j_1,\cdots,j_n,j_0\),
\[
p_{j_0j_1}\,p_{j_1j_2}\cdots p_{j_nj_0}=p_{j_0j_n}\,p_{j_nj_{n-1}}\cdots p_{j_1j_0},
\]
i.e. the product of transition probabilities around any loop equals the product along the reversed loop.
- Necessary.
If \(X\) is reversible, then \(\forall i,j\in S\), \(\pi_ip_{ij}=\pi_jp_{ji}\). Applying this around the loop,
\[
p_{j_0j_1}\,p_{j_1j_2}\cdots p_{j_nj_0}=\frac{\pi_{j_1}p_{j_1j_0}}{\pi_{j_0}}\cdot\frac{\pi_{j_2}p_{j_2j_1}}{\pi_{j_1}}\cdots\frac{\pi_{j_0}p_{j_0j_n}}{\pi_{j_n}}=p_{j_1j_0}\,p_{j_2j_1}\cdots p_{j_0j_n},
\]
since all the \(\pi\)'s cancel.
- Sufficient. This is similar to constructing a potential via a path (curve) integral.
Fix a state \(j_0\). By irreducibility, for any state \(j\) there is a path \(j_0,j_1,\cdots,j_n,j\) such that
\[
p_{j_0j_1}\,p_{j_1j_2}\cdots p_{j_nj}>0\quad\text{and}\quad p_{jj_n}\cdots p_{j_2j_1}\,p_{j_1j_0}>0,
\]
and then we can define
\[
\pi_j=\frac{1}{Z}\cdot\frac{p_{j_0j_1}\,p_{j_1j_2}\cdots p_{j_nj}}{p_{jj_n}\cdots p_{j_2j_1}\,p_{j_1j_0}},
\]
where \(Z\) normalizes so that \(\sum_{j}\pi_j=1\). The cycle condition ensures this is well defined, i.e. independent of the chosen path. To show \(\pi\) gives a reversible chain, we must show for all \(j,k \in S\),
\[
\pi_j\,p_{jk}=\pi_k\,p_{kj}.
\]
If \(p_{jk}=p_{kj}=0\), this holds trivially. If \(p_{jk}\neq 0\), then extending the path to \(k\) through \(j\) gives
\[
\pi_k=\frac{1}{Z}\cdot\frac{p_{j_0j_1}\cdots p_{j_nj}\,p_{jk}}{p_{kj}\,p_{jj_n}\cdots p_{j_1j_0}}=\pi_j\,\frac{p_{jk}}{p_{kj}},
\]
and detailed balance holds.
\(\square\)
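As a closing illustration (my own sketch, not part of the notes): a birth-death chain always satisfies detailed balance, so both the detailed balance condition and the Kolmogorov cycle criterion can be verified numerically on a hypothetical 3-state example.

```python
import numpy as np
from itertools import permutations

# Hypothetical reversible 3-state chain (a birth-death chain).
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])

# Stationary distribution via the linear system from the sketch above.
n = len(P)
A = (P - np.eye(n)).T
A[-1, :] = 1.0
pi = np.linalg.solve(A, np.eye(n)[-1])

# Detailed balance: pi_i p_ij = pi_j p_ji for all i, j.
flux = pi[:, None] * P
assert np.allclose(flux, flux.T)

# Cycle condition: products along every closed path and its reverse agree.
for cycle in permutations(range(n), 3):
    path = list(cycle) + [cycle[0]]
    fwd = np.prod([P[a, b] for a, b in zip(path, path[1:])])
    rev = np.prod([P[b, a] for a, b in zip(path, path[1:])])
    assert np.isclose(fwd, rev)
print("reversible, pi =", pi)
```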