## Tuesday, April 17, 2012

### Markov Chain

A Markov chain, named after Andrey Markov, is a mathematical system that undergoes transitions from one state to another, between a finite or countable number of possible states. It is a random process characterized as memoryless: the next state depends only on the current state and not on the sequence of events that preceded it. This specific kind of "memorylessness" is called the Markov property. Markov chains have many applications as statistical models of real-world processes.

### Introduction

Formally, a Markov chain is a random process with the Markov property. Often, the term "Markov chain" is used to mean a Markov process which has a discrete (finite or countable) state space. Usually a Markov chain is defined for a discrete set of times (i.e., a discrete-time Markov chain), although some authors use the same terminology where "time" can take continuous values. The use of the term in Markov chain Monte Carlo methodology covers cases where the process is in discrete time (discrete algorithm steps) with a continuous state space. The following concentrates on the discrete-time, discrete-state-space case.
A discrete-time random process involves a system which is in a certain state at each step, with the state changing randomly between steps. The steps are often thought of as moments in time, but they can equally well refer to physical distance or any other discrete measurement; formally, the steps are the integers or natural numbers, and the random process is a mapping of these to states. The Markov property states that the conditional probability distribution for the system at the next step (and in fact at all future steps) depends only on the current state of the system, and not additionally on the state of the system at previous steps.
Since the system changes randomly, it is generally impossible to predict with certainty the state of a Markov chain at a given point in the future. However, the statistical properties of the system's future can be predicted. In many applications, it is these statistical properties that are important.
The changes of state of the system are called transitions, and the probabilities associated with various state-changes are called transition probabilities. The set of all states and transition probabilities completely characterizes a Markov chain. By convention, we assume all possible states and transitions have been included in the definition of the processes, so there is always a next state and the process goes on forever.
A famous Markov chain is the so-called "drunkard's walk", a random walk on the number line where, at each step, the position may change by +1 or −1 with equal probability. From any position there are two possible transitions, to the next or previous integer. The transition probabilities depend only on the current position, not on the manner in which the position was reached. For example, the transition probabilities from 5 to 4 and 5 to 6 are both 0.5, and all other transition probabilities from 5 are 0. These probabilities are independent of whether the system was previously in 4 or 6.
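The drunkard's walk is short enough to simulate directly. The following is a minimal sketch, not a reference implementation: the function name `drunkards_walk`, the starting position 5, the step count, and the fixed seed are all illustrative choices.

```python
import random


def drunkards_walk(start, steps, rng):
    """Simulate a symmetric random walk on the integers."""
    position = start
    path = [position]
    for _ in range(steps):
        # The move is drawn using only the current position:
        # +1 or -1, each with probability 0.5.
        position += rng.choice((-1, 1))
        path.append(position)
    return path


rng = random.Random(0)  # fixed seed: an arbitrary choice, for reproducibility
path = drunkards_walk(start=5, steps=10, rng=rng)
print(path)
```

Note that the loop never looks at `path` when choosing the next move, which is exactly the memorylessness described above.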
Another example is the dietary habits of a creature who eats only grapes, cheese or lettuce, and whose dietary habits conform to the following rules:
• It eats exactly once a day.
• If it ate cheese today, tomorrow it will eat lettuce or grapes with equal probability.
• If it ate grapes today, tomorrow it will eat grapes with probability 1/10, cheese with probability 4/10 and lettuce with probability 5/10.
• If it ate lettuce today, it will not eat lettuce again tomorrow but will eat grapes with probability 4/10 or cheese with probability 6/10.
This creature's eating habits can be modeled with a Markov chain since its choice tomorrow depends solely on what it ate today, not what it ate yesterday or even farther in the past. One statistical property that could be calculated is the expected percentage, over a long period, of the days on which the creature will eat grapes.
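The long-run share of grape days can be estimated by pushing a probability distribution forward one day at a time until it stops changing. This is a sketch under simple assumptions: the transition table is transcribed from the rules above, while the helper names `step` and `long_run_distribution` and the iteration count of 200 are illustrative.

```python
STATES = ("grapes", "cheese", "lettuce")

# Outer key = today's food, inner key = tomorrow's food (from the rules above).
P = {
    "grapes":  {"grapes": 0.1, "cheese": 0.4, "lettuce": 0.5},
    "cheese":  {"grapes": 0.5, "cheese": 0.0, "lettuce": 0.5},
    "lettuce": {"grapes": 0.4, "cheese": 0.6, "lettuce": 0.0},
}


def step(dist, P):
    """Push a distribution over states forward by one day."""
    return {
        nxt: sum(dist[cur] * P[cur][nxt] for cur in STATES)
        for nxt in STATES
    }


def long_run_distribution(P, start, iterations=200):
    """Start with certainty in `start` and iterate the one-day update."""
    dist = {s: (1.0 if s == start else 0.0) for s in STATES}
    for _ in range(iterations):
        dist = step(dist, P)
    return dist


dist = long_run_distribution(P, start="cheese")
print({s: round(p, 4) for s, p in dist.items()})
```

For this particular table every column also sums to 1, so the long-run distribution is uniform: the creature eats grapes on about one third of the days, regardless of what it ate first.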
A series of independent events (for example, a series of coin flips) satisfies the formal definition of a Markov chain. However, the theory is usually applied only when the probability distribution of the next step depends non-trivially on the current state.
Many other examples of Markov chains exist.

### Formal Definition

A Markov chain is a sequence of random variables $X_1, X_2, X_3, \ldots$ with the Markov property, namely that, given the present state, the future and past states are independent. Formally, $\Pr(X_{n+1}=x \mid X_1=x_1, X_2=x_2, \ldots, X_n=x_n) = \Pr(X_{n+1}=x \mid X_n=x_n)$.
The possible values of $X_i$ form a countable set $S$ called the state space of the chain.
Markov chains are often described by a directed graph, where the edges are labeled by the probabilities of going from one state to the other states.
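One way to hold such a labeled directed graph in code is a mapping from each state to its outgoing edges. The sketch below reuses the grapes/cheese/lettuce rules as the edge list; the helper names `build_graph`, `check_graph`, and `sample_path` are hypothetical, and the seed is arbitrary.

```python
import random

# Each edge is (from_state, to_state, probability). Transitions with
# probability zero are simply left out of the graph.
EDGES = [
    ("cheese", "lettuce", 0.5), ("cheese", "grapes", 0.5),
    ("grapes", "grapes", 0.1), ("grapes", "cheese", 0.4),
    ("grapes", "lettuce", 0.5),
    ("lettuce", "grapes", 0.4), ("lettuce", "cheese", 0.6),
]


def build_graph(edges):
    """Turn an edge list into {state: {next_state: probability}}."""
    graph = {}
    for src, dst, prob in edges:
        graph.setdefault(src, {})[dst] = prob
    return graph


def check_graph(graph, tol=1e-9):
    """The edge labels leaving every state must sum to 1."""
    for src, out in graph.items():
        total = sum(out.values())
        if abs(total - 1.0) > tol:
            raise ValueError(f"edges out of {src!r} sum to {total}")


def sample_path(graph, start, steps, rng):
    """Walk the graph, choosing each edge with its labeled probability."""
    state, path = start, [start]
    for _ in range(steps):
        targets = list(graph[state])
        weights = [graph[state][t] for t in targets]
        state = rng.choices(targets, weights=weights)[0]
        path.append(state)
    return path


graph = build_graph(EDGES)
check_graph(graph)
path = sample_path(graph, "cheese", 7, random.Random(1))
print(" -> ".join(path))
```

Leaving out zero-probability edges keeps the structure close to how the graph would be drawn, at the cost of having to treat a missing edge as probability 0 when looking one up.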

### Variations

• Continuous-time Markov processes have a continuous index.
• Time-homogeneous Markov chains (or stationary Markov chains) are processes where $\Pr(X_{n+1}=x|X_n=y) = \Pr(X_n=x|X_{n-1}=y)\,$
for all n. The probability of the transition is independent of n.
• A Markov chain of order m (or a Markov chain with memory m), where m is finite, is a process satisfying
\begin{align} {} &\Pr(X_n=x_n|X_{n-1}=x_{n-1}, X_{n-2}=x_{n-2}, \dots , X_1=x_1) \\ = &\Pr(X_n=x_n|X_{n-1}=x_{n-1}, X_{n-2}=x_{n-2}, \dots, X_{n-m}=x_{n-m}) \text{ for }n > m \end{align}
In other words, the future state depends on the past m states. A Markov chain of order m can in fact be reduced to a chain of order 1 (a simple Markov chain). Indeed, let $Y_n = (X_n, X_{n-1}, \ldots, X_{n-m+1})$, the ordered m-tuple of X values. Then $(Y_n)$ is a Markov chain with state space $S^m$ and has the classical Markov property.
• An additive Markov chain of order m is determined by an additive conditional probability, $\Pr(X_n=x_n|X_{n-1}=x_{n-1}, X_{n-2}=x_{n-2}, \dots, X_{n-m}=x_{n-m}) = \sum_{r=1}^{m} f(x_n,x_{n-r},r) .$
The value $f(x_n, x_{n-r}, r)$ is the additive contribution of the variable $x_{n-r}$ to the conditional probability.
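The reduction of an order-m chain to a first-order chain on m-tuples, described in the list above, amounts to sliding a window of length m over the sequence. A minimal sketch follows; the function name `to_first_order` and the sample sequence are hypothetical.

```python
def to_first_order(xs, m):
    """Return Y_n = (X_n, X_{n-1}, ..., X_{n-m+1}) for every n where it is defined."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return [
        tuple(reversed(xs[i - m + 1 : i + 1]))
        for i in range(m - 1, len(xs))
    ]


xs = ["a", "b", "b", "a", "c"]  # hypothetical observed sequence
ys = to_first_order(xs, m=2)
print(ys)
```

Consecutive tuples overlap in m − 1 entries, so the current tuple carries everything the order-m chain needs to know about the past. That is why the tuple-valued process has the ordinary Markov property.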

### Example

A simple example is shown in the figure on the right, using a directed graph to picture the state transitions. The states represent whether the economy is in a bull market, a bear market, or a recession during a given week. According to the figure, a bull week is followed by another bull week 90% of the time, a bear week 7.5% of the time, and a recession week the other 2.5%. From this figure it is possible to calculate, for example, the long-term fraction of time during which the economy is in a recession, or how long it takes on average to go from a recession to a bull market. Using the transition probabilities, the steady-state probabilities indicate that 62.5% of weeks will be in a bull market, 31.25% of weeks will be in a bear market and 6.25% of weeks will be in a recession.
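The steady-state figures can be checked numerically by applying the transition matrix repeatedly. Only the bull row is stated in the text; the bear and recession rows below are assumptions, filled in for illustration and chosen to be consistent with the quoted steady state, so they should be replaced with the values from the actual figure. The function name `steady_state` and the iteration count are also illustrative.

```python
STATES = ("bull", "bear", "recession")

# P[i][j] = probability of moving from STATES[i] to STATES[j] in one week.
P = [
    [0.90, 0.075, 0.025],  # from bull: stated in the text
    [0.15, 0.80, 0.05],    # from bear: ASSUMED, not stated in the text
    [0.25, 0.25, 0.50],    # from recession: ASSUMED, not stated in the text
]


def steady_state(P, iterations=1000):
    """Approximate the stationary distribution by repeated one-week updates."""
    n = len(P)
    dist = [1.0 / n] * n  # any starting distribution works for this chain
    for _ in range(iterations):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist


pi = steady_state(P)
for name, p in zip(STATES, pi):
    print(f"{name}: {p:.2%}")
```

With these assumed rows the iteration settles on 62.5% bull, 31.25% bear, and 6.25% recession, matching the figures quoted above.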
A thorough development and many examples can be found in the on-line monograph Meyn & Tweedie 2005. The appendix of Meyn 2007, also available on-line, contains an abridged Meyn & Tweedie.
A finite state machine can be used as a representation of a Markov chain. Assuming a sequence of independent and identically distributed input signals (for example, symbols from a binary alphabet chosen by coin tosses), if the machine is in state y at time n, then the probability that it moves to state x at time n + 1 depends only on the current state.
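To make this concrete, the transition probabilities of the chain can be derived from the machine's transition table and the input distribution. The machine below is a hypothetical example (it tracks the current run of 1s, capped at 2), and the name `induced_transition_probs` is illustrative.

```python
# Hypothetical machine: DELTA[state][input_symbol] -> next state.
# A 0 resets the run of 1s; a 1 extends it, capped at 2.
DELTA = {
    0: {0: 0, 1: 1},
    1: {0: 0, 1: 2},
    2: {0: 0, 1: 2},
}

# Independent, identically distributed inputs: a fair coin toss.
SYMBOL_PROBS = {0: 0.5, 1: 0.5}


def induced_transition_probs(delta, symbol_probs):
    """Pr(next = x | current = y) = total probability of inputs sending y to x."""
    probs = {}
    for state, moves in delta.items():
        row = {}
        for symbol, nxt in moves.items():
            row[nxt] = row.get(nxt, 0.0) + symbol_probs[symbol]
        probs[state] = row
    return probs


chain = induced_transition_probs(DELTA, SYMBOL_PROBS)
for state, row in chain.items():
    print(state, row)
```

Because each input is drawn independently of everything that came before, each row depends only on the current state, which is the Markov property.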