In probability theory, a Markov kernel (also known as a stochastic kernel or probability kernel) is a map that, in the general theory of Markov processes, plays the role that the transition matrix does in the theory of Markov processes with a finite state space.
Let $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ be measurable spaces. A Markov kernel with source $(X, \mathcal{A})$ and target $(Y, \mathcal{B})$, sometimes written as $\kappa : (X, \mathcal{A}) \to (Y, \mathcal{B})$, is a function $\kappa : \mathcal{B} \times X \to [0,1]$ with the following properties:

1. For every (fixed) $B_0 \in \mathcal{B}$, the map $x \mapsto \kappa(B_0, x)$ is $\mathcal{A}$-measurable.
2. For every (fixed) $x_0 \in X$, the map $B \mapsto \kappa(B, x_0)$ is a probability measure on $(Y, \mathcal{B})$.

In other words it associates to each point $x \in X$ a probability measure $\kappa(dy|x) : B \mapsto \kappa(B, x)$ on $(Y, \mathcal{B})$ such that, for every measurable set $B \in \mathcal{B}$, the map $x \mapsto \kappa(B, x)$ is measurable with respect to the $\sigma$-algebra $\mathcal{A}$.
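On finite spaces the definition can be sketched directly: a kernel is a function that maps each point of $X$ to a probability measure on $Y$. A minimal sketch, where the two-element spaces and the kernel values are illustrative assumptions:

```python
# A Markov kernel on finite spaces, sketched as a function that maps each
# point x in X to a probability measure on Y (here: a dict of point masses).
# The spaces X, Y and the kernel values below are illustrative assumptions.

X = ["rain", "sun"]
Y = ["rain", "sun"]

def kappa(x):
    """Return the probability measure kappa(. | x) on Y as a dict."""
    if x == "rain":
        return {"rain": 0.7, "sun": 0.3}
    else:
        return {"rain": 0.4, "sun": 0.6}

def kappa_set(B, x):
    """kappa(B | x): probability that the image point lies in the set B."""
    return sum(kappa(x)[y] for y in B)

# Property 2: for each fixed x, kappa(. | x) is a probability measure on Y.
for x in X:
    assert abs(kappa_set(set(Y), x) - 1.0) < 1e-12

print(kappa_set({"sun"}, "rain"))  # 0.3
```

On a finite $X$, property 1 (measurability in $x$) is automatic, which is why the sketch only checks property 2.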
Take $X = Y = \mathbb{Z}$ and $\mathcal{A} = \mathcal{B} = \mathcal{P}(\mathbb{Z})$ (the power set of $\mathbb{Z}$). Then a Markov kernel is fully determined by the probability it assigns to singletons $\{m\}$, $m \in Y = \mathbb{Z}$, for each $n \in X = \mathbb{Z}$:

$$\kappa(B|n) = \sum_{m \in B} \kappa(\{m\}|n), \qquad \forall n \in \mathbb{Z},\ \forall B \in \mathcal{B}.$$

Now the random walk $\kappa$ that goes to the right with probability $p$ and to the left with probability $1-p$ is defined by

$$\kappa(\{m\}|n) = p\,\delta_{m,n+1} + (1-p)\,\delta_{m,n-1}, \qquad \forall n, m \in \mathbb{Z},$$

where $\delta$ is the Kronecker delta. The transition probabilities $P(m|n) = \kappa(\{m\}|n)$ for the random walk are equivalent to the Markov kernel.
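A minimal sketch of this random walk kernel, with the step probability $p$ chosen for illustration:

```python
# Sketch of the random walk kernel on the integers:
# kappa({m} | n) = p*delta(m, n+1) + (1-p)*delta(m, n-1).
# The step probability p = 0.6 is an illustrative choice.

p = 0.6

def kappa_point(m, n):
    """kappa({m} | n) for the simple random walk."""
    return p * (m == n + 1) + (1 - p) * (m == n - 1)

def kappa(B, n):
    """kappa(B | n) for a finite set B of integers."""
    return sum(kappa_point(m, n) for m in B)

# For each n, the measure kappa(. | n) puts all its mass on {n-1, n+1},
# so it is a probability measure as the definition requires.
for n in range(-3, 4):
    assert abs(kappa({n - 1, n + 1}, n) - 1.0) < 1e-12

print(kappa({5}, 4))  # probability of stepping right from 4: 0.6
```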
More generally, take $X$ and $Y$ both countable and $\mathcal{A} = \mathcal{P}(X)$, $\mathcal{B} = \mathcal{P}(Y)$. Again a Markov kernel is defined by the probability it assigns to singleton sets for each $i \in X$:

$$\kappa(B|i) = \sum_{j \in B} \kappa(\{j\}|i), \qquad \forall i \in X,\ \forall B \in \mathcal{B}.$$

We define a Markov process by defining a transition probability $P(j|i) = K_{ji}$, where the numbers $K_{ji}$ define a (countable) stochastic matrix $(K_{ji})$, i.e.

$$\begin{aligned} K_{ji} &\geq 0, \qquad &\forall (j,i) \in Y \times X, \\ \sum_{j \in Y} K_{ji} &= 1, \qquad &\forall i \in X. \end{aligned}$$

We then define

$$\kappa(\{j\}|i) = K_{ji} = P(j|i), \qquad \forall j \in Y,\ \forall i \in X.$$

Again the transition probability, the stochastic matrix and the Markov kernel are equivalent reformulations.
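A sketch of this correspondence on a three-element state space, with illustrative matrix entries; note the indexing convention $K_{ji} = P(j|i)$, which makes each column of $K$ a probability distribution:

```python
# Sketch of the countable-state-space case with X = Y = {0, 1, 2}.
# Convention K[j][i] = P(j | i): each COLUMN i of K is a probability
# distribution over the next state j (column-stochastic matrix).
# The matrix entries are illustrative assumptions.

K = [
    [0.5, 0.2, 0.0],   # row j = 0
    [0.5, 0.5, 1.0],   # row j = 1
    [0.0, 0.3, 0.0],   # row j = 2
]

n = len(K)

# Check the stochastic-matrix conditions: K[j][i] >= 0, columns sum to 1.
for i in range(n):
    assert all(K[j][i] >= 0 for j in range(n))
    assert abs(sum(K[j][i] for j in range(n)) - 1.0) < 1e-12

def kappa(B, i):
    """kappa(B | i) = sum over j in B of K[j][i]."""
    return sum(K[j][i] for j in B)

print(kappa({0, 1}, 1))  # 0.2 + 0.5 = 0.7
```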
Markov kernel defined by a kernel function and a measure

Let $\nu$ be a measure on $(Y, \mathcal{B})$, and $k : Y \times X \to [0, \infty]$ a measurable function with respect to the product $\sigma$-algebra $\mathcal{B} \otimes \mathcal{A}$ such that

$$\int_Y k(y,x)\,\nu(\mathrm{d}y) = 1, \qquad \forall x \in X.$$

Then $\kappa(dy|x) = k(y,x)\,\nu(dy)$, i.e. the mapping

$$\begin{cases} \kappa : \mathcal{B} \times X \to [0,1] \\ \kappa(B|x) = \int_B k(y,x)\,\nu(\mathrm{d}y) \end{cases}$$

defines a Markov kernel. This example generalises the countable Markov process example, where $\nu$ was the counting measure. Moreover it encompasses other important examples such as the convolution kernels, in particular the Markov kernels defined by the heat equation. The latter example includes the Gaussian kernel on $X = Y = \mathbb{R}$ with $\nu(dx) = dx$ the standard Lebesgue measure and
$$k_t(y,x) = \frac{1}{\sqrt{2\pi}\,t}\, e^{-(y-x)^2/(2t^2)}.$$

Measurable functions

Take $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ arbitrary measurable spaces, and let $f : X \to Y$ be a measurable function. Now define $\kappa(dy|x) = \delta_{f(x)}(dy)$, i.e.

$$\kappa(B|x) = \mathbf{1}_B(f(x)) = \mathbf{1}_{f^{-1}(B)}(x) = \begin{cases} 1 & \text{if } f(x) \in B \\ 0 & \text{otherwise} \end{cases} \qquad \forall B \in \mathcal{B}.$$

Note that the indicator function $\mathbf{1}_{f^{-1}(B)}$ is $\mathcal{A}$-measurable for all $B \in \mathcal{B}$ if and only if $f$ is measurable.
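A minimal sketch of this deterministic kernel, with an illustrative choice of $f$:

```python
# Sketch of the deterministic kernel induced by a measurable function f:
# kappa(B | x) = 1 if f(x) is in B, else 0 -- a point mass at f(x).
# The function f below is an illustrative assumption.

def f(x):
    return x * x

def kappa(B, x):
    """kappa(B | x): indicator of the event f(x) in B (Dirac measure)."""
    return 1.0 if f(x) in B else 0.0

# For each x, kappa(. | x) is the Dirac measure at f(x):
assert kappa({9}, 3) == 1.0        # f(3) = 9 lies in B
assert kappa({1, 2, 3}, 3) == 0.0  # f(3) = 9 does not lie in B
print(kappa({9}, 3))  # 1.0
```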
This example allows us to think of a Markov kernel as a generalised function whose value is, in general, random rather than certain. That is, it is a multivalued function where the possible values are weighted by probabilities rather than equally.
As a less obvious example, take $X = \mathbb{N}$, $\mathcal{A} = \mathcal{P}(\mathbb{N})$, and $(Y, \mathcal{B})$ the real numbers $\mathbb{R}$ with the standard $\sigma$-algebra of Borel sets. Then

$$\kappa(B|n) = \begin{cases} \mathbf{1}_B(0) & n = 0 \\ \Pr(\xi_1 + \cdots + \xi_n \in B) & n \neq 0 \end{cases}$$

where the $\xi_i$ are i.i.d. random variables (usually with mean 0) and $\mathbf{1}_B$ is the indicator function. For the simple case of coin flips this models the different levels of a Galton board.
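For the coin-flip case this kernel can be computed exactly from the binomial distribution. A sketch, assuming fair steps $\xi_i \in \{-1, +1\}$ (an illustrative choice):

```python
# Sketch of the Galton-board kernel: kappa(B | n) = Pr(xi_1 + ... + xi_n in B)
# for i.i.d. fair coin-flip steps xi_i in {-1, +1} (illustrative assumption).
from math import comb

def kappa(B, n):
    """kappa(B | n), computed exactly from the binomial distribution."""
    if n == 0:
        return 1.0 if 0 in B else 0.0
    total = 0.0
    for j in range(n + 1):   # j = number of +1 steps
        s = 2 * j - n        # resulting sum xi_1 + ... + xi_n
        if s in B:
            total += comb(n, j) / 2 ** n
    return total

# Each kappa(. | n) is a probability measure on the reachable sums.
for n in range(5):
    assert abs(kappa(set(range(-n, n + 1)) | {0}, n) - 1.0) < 1e-12

print(kappa({0}, 2))  # Pr(one step up, one step down) = 0.5
```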
Composition of Markov kernels

Given measurable spaces $(X, \mathcal{A})$ and $(Y, \mathcal{B})$, we consider a Markov kernel $\kappa : \mathcal{B} \times X \to [0,1]$ as a morphism $\kappa : X \to Y$. Intuitively, rather than assigning to each $x \in X$ a sharply defined point $y \in Y$, the kernel assigns a "fuzzy" point in $Y$ which is only known with some level of uncertainty, much like actual physical measurements. If we have a third measurable space $(Z, \mathcal{C})$ and probability kernels $\kappa : X \to Y$ and $\lambda : Y \to Z$, we can define a composition $\lambda \circ \kappa : X \to Z$ by the Chapman–Kolmogorov equation

$$(\lambda \circ \kappa)(dz|x) = \int_Y \lambda(dz|y)\,\kappa(dy|x).$$

The composition is associative by the monotone convergence theorem, and the identity function considered as a Markov kernel (i.e. the delta measure $\kappa_1(dx'|x) = \delta_x(dx')$) is the unit for this composition.
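On finite spaces the Chapman–Kolmogorov integral becomes a sum, and composition is matrix multiplication of the transition probabilities. A sketch, with illustrative spaces and kernel values:

```python
# Sketch of kernel composition on finite spaces, where the Chapman-Kolmogorov
# integral becomes a sum: (lam o kap)(z | x) = sum_y lam(z | y) * kap(y | x).
# The spaces and kernel values are illustrative assumptions.

X = [0, 1]
Y = ["a", "b"]
Z = ["lo", "hi"]

kap = {0: {"a": 0.9, "b": 0.1}, 1: {"a": 0.2, "b": 0.8}}          # X -> Y
lam = {"a": {"lo": 0.5, "hi": 0.5}, "b": {"lo": 0.1, "hi": 0.9}}  # Y -> Z

def compose(lam, kap):
    """Return the composed kernel X -> Z as nested dicts."""
    return {
        x: {z: sum(lam[y][z] * kap[x][y] for y in Y) for z in Z}
        for x in X
    }

mu = compose(lam, kap)

# The composite is again a Markov kernel: each mu(. | x) sums to 1.
for x in X:
    assert abs(sum(mu[x].values()) - 1.0) < 1e-12

print(mu[0]["lo"])  # 0.5*0.9 + 0.1*0.1 = 0.46
```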
This composition defines the structure of a category on the measurable spaces with Markov kernels as morphisms, first defined by Lawvere: the category of Markov kernels.
Probability space defined by a probability distribution and a Markov kernel

A composition of a probability space $(X, \mathcal{A}, P_X)$ and a probability kernel $\kappa : (X, \mathcal{A}) \to (Y, \mathcal{B})$ defines a probability space $(Y, \mathcal{B}, P_Y = \kappa \circ P_X)$, where the probability measure is given by

$$P_Y(B) = \int_X \int_B \kappa(dy|x)\,P_X(dx) = \int_X \kappa(B|x)\,P_X(dx) = \mathbb{E}_{P_X}\,\kappa(B|\cdot).$$

Let $(X, \mathcal{A}, P)$ be a probability space and $\kappa$ a Markov kernel from $(X, \mathcal{A})$ to some $(Y, \mathcal{B})$. Then there exists a unique measure $Q$ on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ such that
$$Q(A \times B) = \int_A \kappa(B|x)\,P(dx), \qquad \forall A \in \mathcal{A},\ \forall B \in \mathcal{B}.$$

Regular conditional distribution

Let $(S, Y)$ be a Borel space, $X$ an $(S, Y)$-valued random variable on the measure space $(\Omega, \mathcal{F}, P)$, and $\mathcal{G} \subseteq \mathcal{F}$ a sub-$\sigma$-algebra. Then there exists a Markov kernel $\kappa$ from $(\Omega, \mathcal{G})$ to $(S, Y)$ such that $\kappa(\cdot, B)$ is a version of the conditional expectation $\mathbb{E}[\mathbf{1}_{\{X \in B\}} \mid \mathcal{G}]$ for every $B \in Y$, i.e.

$$P(X \in B \mid \mathcal{G}) = \mathbb{E}\left[\mathbf{1}_{\{X \in B\}} \mid \mathcal{G}\right] = \kappa(\cdot, B) \qquad P\text{-a.s.}\ \forall B \in Y.$$

It is called the regular conditional distribution of $X$ given $\mathcal{G}$, and it is defined only up to a $P$-null set.
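Returning to the discrete setting, the pushforward measure $P_Y = \kappa \circ P_X$ and the joint measure $Q$ described above can be sketched as sums; the spaces, distribution and kernel values below are illustrative assumptions:

```python
# Discrete sketch of the probability space defined by a distribution P_X and
# a kernel kappa: P_Y(B) = sum_x kappa(B | x) P_X({x}), and the joint measure
# Q(A x B) = sum over x in A of kappa(B | x) P_X({x}). Values illustrative.

X = [0, 1]
Y = ["a", "b"]

P_X = {0: 0.25, 1: 0.75}
kappa = {0: {"a": 0.9, "b": 0.1}, 1: {"a": 0.2, "b": 0.8}}

def P_Y(B):
    """Pushforward measure P_Y = kappa . P_X on Y."""
    return sum(kappa[x][y] * P_X[x] for x in X for y in B)

def Q(A, B):
    """Joint measure on X x Y determined by P_X and kappa."""
    return sum(kappa[x][y] * P_X[x] for x in A for y in B)

assert abs(P_Y(set(Y)) - 1.0) < 1e-12        # P_Y is a probability measure
assert abs(Q(set(X), set(Y)) - 1.0) < 1e-12  # and so is Q
print(P_Y({"a"}))  # 0.9*0.25 + 0.2*0.75 = 0.375
```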
Transition kernels generalize Markov kernels in the sense that for all $x \in X$, the map

$$B \mapsto \kappa(B|x)$$

can be any type of (non-negative) measure, not necessarily a probability measure.