The intensity $\lambda$ of a counting process is a measure of the rate of change of its predictable part. If a stochastic process $\{N(t), t \geq 0\}$ is a counting process, then it is a submartingale, and in particular its Doob-Meyer decomposition is
$$N(t) = M(t) + \Lambda(t),$$
where $M(t)$ is a martingale and $\Lambda(t)$ is a predictable increasing process. $\Lambda(t)$ is called the cumulative intensity of $N(t)$ and is related to $\lambda$ by
$$\Lambda(t) = \int_0^t \lambda(s)\,ds.$$

Given a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and a counting process $\{N(t), t \geq 0\}$ adapted to the filtration $\{\mathcal{F}_t, t \geq 0\}$, the intensity of $N$ is the process $\{\lambda(t), t \geq 0\}$ defined by the limit
$$\lambda(t) = \lim_{h \downarrow 0} \frac{1}{h}\,\mathbb{E}\big[N(t+h) - N(t) \mid \mathcal{F}_t\big].$$
The right-continuity of counting processes allows this limit to be taken from the right.
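As a concrete illustration of this limit, the following Python sketch (the homogeneous Poisson model, the rate, the window $h$, and all function names are illustrative assumptions, not part of the article) simulates i.i.d. copies of a counting process and averages the normalized increments $(N(t+h) - N(t))/h$; for a homogeneous Poisson process the conditioning on $\mathcal{F}_t$ has no effect, so this empirical average approximates the conditional expectation in the definition.

import numpy as np

rng = np.random.default_rng(0)

def poisson_counting_process(rate, t_grid, rng):
    # Simulate N(t) on a time grid: increments over each step are independent Poisson(rate * dt).
    dt = np.diff(t_grid, prepend=0.0)
    return np.cumsum(rng.poisson(rate * dt))

rate = 2.0                  # true (constant) intensity, chosen for illustration
h = 0.01                    # small window standing in for the limit h -> 0
t_grid = np.arange(0.0, 1.0 + h, h)
n_copies = 5000

# Average (N(t+h) - N(t)) / h over i.i.d. copies to approximate lambda(t).
paths = np.array([poisson_counting_process(rate, t_grid, rng) for _ in range(n_copies)])
lam_hat = np.diff(paths, axis=1).mean(axis=0) / h

print(lam_hat.mean())       # should be close to the true rate 2.0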
In statistical learning, the variation between $\lambda$ and its estimator $\hat{\lambda}$ can be bounded with the use of oracle inequalities.
If a counting process $N(t)$ is restricted to $t \in [0,1]$ and $n$ i.i.d. copies $N_1, N_2, \ldots, N_n$ are observed on that interval, then the least squares functional for the intensity is
$$R_n(\lambda) = \int_0^1 \lambda(t)^2\,dt - \frac{2}{n}\sum_{i=1}^n \int_0^1 \lambda(t)\,dN_i(t),$$
which involves an Itô integral. Assume now that $\lambda(t)$ is piecewise constant on $[0,1]$, i.e. that it depends on a vector of constants $\beta = (\beta_1, \beta_2, \ldots, \beta_m) \in \mathbb{R}_+^m$.
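Because each $N_i$ is a counting process, the integral $\int_0^1 \lambda(t)\,dN_i(t)$ reduces to summing $\lambda$ over the jump times of $N_i$, so $R_n$ is easy to evaluate from data. The following Python sketch (the helper name, the simulated Poisson data, and the candidate intensities are assumptions for illustration) computes $R_n$ for a few constant candidates $c$; since $R_n(c) \approx c^2 - 2c \cdot (\text{average number of jumps})$, it is smallest near the true rate.

import numpy as np
from scipy.integrate import quad

def least_squares_functional(lam, jump_times_per_copy):
    # R_n(lambda) = int_0^1 lambda(t)^2 dt - (2/n) * sum_i int_0^1 lambda(t) dN_i(t).
    # The integral against dN_i is the sum of lambda over the jump times of N_i.
    n = len(jump_times_per_copy)
    square_term, _ = quad(lambda t: lam(t) ** 2, 0.0, 1.0)
    jump_term = sum(sum(lam(s) for s in jumps) for jumps in jump_times_per_copy)
    return square_term - (2.0 / n) * jump_term

# Simulated data: jump times of n i.i.d. Poisson processes with rate 2 on [0, 1].
rng = np.random.default_rng(1)
jump_times = [np.sort(rng.uniform(0.0, 1.0, rng.poisson(2.0))) for _ in range(500)]

for c in (1.0, 2.0, 3.0):
    print(c, round(least_squares_functional(lambda t, c=c: c, jump_times), 3))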
Concretely, $\lambda$ can then be written
$$\lambda_\beta = \sum_{j=1}^m \beta_j \lambda_{j,m}, \qquad \lambda_{j,m} = \sqrt{m}\,\mathbf{1}_{\left(\frac{j-1}{m},\,\frac{j}{m}\right]},$$
where the $\lambda_{j,m}$ carry the factor $\sqrt{m}$ so that they are orthonormal under the standard $L^2$ norm.
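A minimal Python sketch of this basis (the function name and the Riemann-sum check are illustrative choices, not from the article) evaluates the $\lambda_{j,m}$ on a grid and confirms numerically that their Gram matrix is approximately the identity matrix, i.e. that the $\sqrt{m}$ factor indeed makes them orthonormal in $L^2[0,1]$.

import numpy as np

def histogram_basis(m, t):
    # Evaluate the m functions lambda_{j,m}(t) = sqrt(m) * 1_{((j-1)/m, j/m]}(t).
    # Column j is sqrt(m) on the half-open interval ((j-1)/m, j/m] and zero elsewhere.
    t = np.asarray(t)
    j = np.arange(1, m + 1)
    return np.sqrt(m) * ((t[:, None] > (j - 1) / m) & (t[:, None] <= j / m)).astype(float)

# Riemann-sum check of orthonormality: the Gram matrix should be close to the identity.
m = 4
t = np.linspace(1e-6, 1.0, 100_000)
B = histogram_basis(m, t)
print(np.round(B.T @ B / len(t), 2))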
An estimator is then obtained by choosing appropriate data-driven weights $\hat{w}_j$, which depend on a parameter $x > 0$, and introducing the weighted norm
$$\|\beta\|_{\hat{w}} = \sum_{j=2}^m \hat{w}_j\,|\beta_j - \beta_{j-1}|.$$
With this norm, the estimator of $\beta$ is
$$\hat{\beta} = \arg\min_{\beta \in \mathbb{R}_+^m} \left\{ R_n(\lambda_\beta) + \|\beta\|_{\hat{w}} \right\}.$$
The estimator $\hat{\lambda}$ is then simply $\lambda_{\hat{\beta}}$. With these preliminaries, an oracle inequality bounding the $L^2$ norm $\|\hat{\lambda} - \lambda\|$ can be stated.
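As a sketch of how $\hat{\beta}$ might be computed in practice (this is not prescribed by the article: the convex-solver choice, the simulated Poisson data, and the constant placeholder weights standing in for the unspecified data-driven $\hat{w}_j(x)$ are all assumptions), note that orthonormality of the basis gives $R_n(\lambda_\beta) = \|\beta\|^2 - 2\,c^\top \beta$, where $c_j = \frac{\sqrt{m}}{n}\sum_i \#\{\text{jumps of } N_i \text{ in } ((j-1)/m, j/m]\}$, so the penalized problem is a small convex program.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)

# Simulated data: n i.i.d. Poisson processes on [0, 1] with true rate 2.
n, m, rate = 200, 8, 2.0
jump_times = [np.sort(rng.uniform(0.0, 1.0, rng.poisson(rate))) for _ in range(n)]

# Orthonormality of the basis gives R_n(lambda_beta) = ||beta||^2 - 2 * c @ beta,
# where c_j is (sqrt(m)/n) times the total number of jumps falling in bin j.
bins = np.linspace(0.0, 1.0, m + 1)
counts = sum(np.histogram(jumps, bins=bins)[0] for jumps in jump_times)
c = np.sqrt(m) * counts / n

# Placeholder weights: the article's data-driven w_hat_j(x) are not reproduced here.
w_hat = np.full(m - 1, 0.05)

beta = cp.Variable(m, nonneg=True)                       # beta restricted to R_+^m
objective = cp.sum_squares(beta) - 2 * c @ beta \
    + cp.sum(cp.multiply(w_hat, cp.abs(cp.diff(beta))))  # R_n(lambda_beta) + ||beta||_w_hat
cp.Problem(cp.Minimize(objective)).solve()

# On bin j, lambda_hat(t) = beta_j * sqrt(m).
print(np.round(beta.value * np.sqrt(m), 2))              # should hover around the true rate 2.0

The penalty $\sum_j \hat{w}_j |\beta_j - \beta_{j-1}|$ is a weighted total-variation (fused-lasso-type) term, so it favors estimates of $\beta$ that are constant across neighboring bins.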
For an appropriate choice of the weights $\hat{w}_j(x)$, the oracle inequality is
$$\|\hat{\lambda} - \lambda\|^2 \leq \inf_{\beta \in \mathbb{R}_+^m} \left\{ \|\lambda_\beta - \lambda\|^2 + 2\|\beta\|_{\hat{w}} \right\}$$
with probability greater than or equal to $1 - 12.85\,e^{-x}$.
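As a quick numerical reading of this guarantee, the confidence level $1 - 12.85\,e^{-x}$ is vacuous (negative) for $x < \ln 12.85 \approx 2.55$, equals about $0.913$ at $x = 5$, and about $0.988$ at $x = 7$.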