Cramér-Rao inequality

In statistics, the Cramér-Rao inequality, named in honor of Harald Cramér and Calyampudi Radhakrishna Rao, states that the reciprocal of the Fisher information, $\mathcal{I}(\theta)$, of a parameter $\theta$ is a lower bound on the variance of an unbiased estimator of the parameter (denoted $\hat{\theta}$):
\mathrm{var} \left(\hat{\theta}\right) \geq \frac{1}{\mathcal{I}(\theta)} = \frac{1} { \mathrm{E} \left[ \left( \frac{d}{d\theta} \log f(X;\theta) \right)^2 \right] }

In some cases, no unbiased estimator exists that realizes the lower bound. The Cramér-Rao inequality is also known as the Cramér-Rao bound (CRB) or Cramér-Rao lower bound (CRLB) because it puts a lower bound on the variance of an estimator.

Regularity conditions

This inequality relies on two weak regularity conditions on the probability density function, $f(x;\theta)$, and the estimator $T(X)$:
- The Fisher information is always defined; equivalently, for all $x$ such that $f(x;\theta) > 0$,
  \frac{\partial}{\partial\theta} \ln f(x;\theta)
  is finite.
- The operations of integration with respect to $x$ and differentiation with respect to $\theta$ can be interchanged in the expectation of $T$; that is,
  \frac{\partial}{\partial\theta} \left[ \int T(x) f(x;\theta) \,dx \right] = \int T(x) \left[ \frac{\partial}{\partial\theta} f(x;\theta) \right] \,dx
  whenever the right-hand side is finite.
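As an illustration of the single-parameter bound, the following sketch checks it by simulation for one convenient case that is not taken from the text above: estimating the mean $\theta$ of $n$ independent $N(\theta, \sigma^2)$ samples with the sample mean. The function names, the choice of distribution and the numerical values are assumptions made for the example. For this model the Fisher information is $n/\sigma^2$, so the bound is $\sigma^2/n$.

```python
import numpy as np


def crb_gaussian_mean(sigma, n):
    """Cramér-Rao lower bound 1 / I(theta) for the mean of n i.i.d. N(theta, sigma^2) samples.

    The score of one sample is (x - theta) / sigma^2, so I(theta) = n / sigma^2.
    """
    fisher_information = n / sigma**2
    return 1.0 / fisher_information


def simulate_sample_mean_variance(theta, sigma, n, trials, seed=0):
    """Monte Carlo estimate of var(theta_hat), where theta_hat is the sample mean."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(loc=theta, scale=sigma, size=(trials, n))
    theta_hat = samples.mean(axis=1)  # one estimate per simulated data set
    return theta_hat.var(ddof=1)


# Illustrative values only (assumed, not from the article).
bound = crb_gaussian_mean(sigma=2.0, n=25)
empirical = simulate_sample_mean_variance(theta=1.0, sigma=2.0, n=25, trials=100_000)
print(f"CRLB = {bound:.4f}, empirical variance = {empirical:.4f}")
```

In this case the sample mean is unbiased and its variance equals the bound, so the two printed numbers should agree up to Monte Carlo error.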
In some cases, a biased estimator can have both a variance and a mean squared error that are below the Cramér-Rao lower bound (the lower bound applies only to unbiased estimators). See bias (statistics). If the second regularity condition extends to the second derivative, then an alternative form of the Fisher information can be used, which yields an equivalent form of the Cramér-Rao inequality:
\mathrm{var} \left(\hat{\theta}\right) \geq \frac{1}{\mathcal{I}(\theta)} = \frac{1} { -\mathrm{E} \left[ \frac{d^2}{d\theta^2} \log f(X;\theta) \right] }

In some cases, it may be easier to take the expectation of the second derivative than to take the expectation of the square of the first derivative.

Multiparameter

To extend the Cramér-Rao inequality to multiple parameters, define a parameter column vector
\boldsymbol{\theta} = \begin{bmatrix} \theta_1 & \theta_2 & \cdots & \theta_d \end{bmatrix}^T
with probability density function (pdf), $f(x; \boldsymbol{\theta})$, that satisfies the above two regularity conditions. The Fisher information matrix is a $d \times d$ matrix with element $\mathcal{I}_{m,k}$ defined as
\mathcal{I}_{m, k} = \mathrm{E} \left[ \frac{d}{d\theta_m} \log f\left(x; \boldsymbol{\theta}\right) \frac{d}{d\theta_k} \log f\left(x; \boldsymbol{\theta}\right) \right]
The Cramér-Rao inequality is then
\mathrm{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right) \geq \frac {\partial \boldsymbol{\psi} \left(\boldsymbol{\theta}\right)} {\partial \boldsymbol{\theta}^T} \mathcal{I}\left(\boldsymbol{\theta}\right)^{-1} \frac {\partial \boldsymbol{\psi}\left(\boldsymbol{\theta}\right)^T} {\partial \boldsymbol{\theta}}

where
- \boldsymbol{T}(X) = \begin{bmatrix} T_1(X) & T_2(X) & \cdots & T_d(X) \end{bmatrix}^T
- \boldsymbol{\psi} = \mathrm{E}\left[\boldsymbol{T}(X)\right] = \begin{bmatrix} \psi_1\left(\boldsymbol{\theta}\right) & \psi_2\left(\boldsymbol{\theta}\right) & \cdots & \psi_d\left(\boldsymbol{\theta}\right) \end{bmatrix}^T
- \frac{\partial \boldsymbol{\psi}\left(\boldsymbol{\theta}\right)}{\partial \boldsymbol{\theta}^T} = \begin{bmatrix} \psi_1 \left(\boldsymbol{\theta}\right) \\ \psi_2 \left(\boldsymbol{\theta}\right) \\ \vdots \\ \psi_d \left(\boldsymbol{\theta}\right) \end{bmatrix} \begin{bmatrix} \frac{\partial}{\partial \theta_1} & \frac{\partial}{\partial \theta_2} & \cdots & \frac{\partial}{\partial \theta_d} \end{bmatrix} = \begin{bmatrix} \frac{\partial \psi_1 \left(\boldsymbol{\theta}\right)}{\partial \theta_1} & \frac{\partial \psi_1 \left(\boldsymbol{\theta}\right)}{\partial \theta_2} & \cdots & \frac{\partial \psi_1 \left(\boldsymbol{\theta}\right)}{\partial \theta_d} \\ \frac{\partial \psi_2 \left(\boldsymbol{\theta}\right)}{\partial \theta_1} & \frac{\partial \psi_2 \left(\boldsymbol{\theta}\right)}{\partial \theta_2} & \cdots & \frac{\partial \psi_2 \left(\boldsymbol{\theta}\right)}{\partial \theta_d} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial \psi_d \left(\boldsymbol{\theta}\right)}{\partial \theta_1} & \frac{\partial \psi_d \left(\boldsymbol{\theta}\right)}{\partial \theta_2} & \cdots & \frac{\partial \psi_d \left(\boldsymbol{\theta}\right)}{\partial \theta_d} \end{bmatrix}
- \frac{\partial \boldsymbol{\psi}\left(\boldsymbol{\theta}\right)^T}{\partial \boldsymbol{\theta}} = \begin{bmatrix} \frac{\partial}{\partial \theta_1} \\ \frac{\partial}{\partial \theta_2} \\ \vdots \\ \frac{\partial}{\partial \theta_d} \end{bmatrix} \begin{bmatrix} \psi_1 \left(\boldsymbol{\theta}\right) & \psi_2 \left(\boldsymbol{\theta}\right) & \cdots & \psi_d \left(\boldsymbol{\theta}\right) \end{bmatrix} = \begin{bmatrix} \frac{\partial \psi_1 \left(\boldsymbol{\theta}\right)}{\partial \theta_1} & \frac{\partial \psi_2 \left(\boldsymbol{\theta}\right)}{\partial \theta_1} & \cdots & \frac{\partial \psi_d \left(\boldsymbol{\theta}\right)}{\partial \theta_1} \\ \frac{\partial \psi_1 \left(\boldsymbol{\theta}\right)}{\partial \theta_2} & \frac{\partial \psi_2 \left(\boldsymbol{\theta}\right)}{\partial \theta_2} & \cdots & \frac{\partial \psi_d \left(\boldsymbol{\theta}\right)}{\partial \theta_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial \psi_1 \left(\boldsymbol{\theta}\right)}{\partial \theta_d} & \frac{\partial \psi_2 \left(\boldsymbol{\theta}\right)}{\partial \theta_d} & \cdots & \frac{\partial \psi_d \left(\boldsymbol{\theta}\right)}{\partial \theta_d} \end{bmatrix}

The matrix inequality $A \geq B$ is understood to mean that $A - B$ is positive semidefinite. And $\mathrm{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right)$ is a positive-semidefinite matrix, that is
x^T \, \mathrm{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right) x \geq 0 \quad \forall x \in \mathbb{R}^d
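The matrix form of the bound can be checked numerically on a small case. The sketch below is an assumed example, not part of the original text: $n$ independent $N(\mu, \sigma^2)$ samples with $\boldsymbol{\theta} = (\mu, \sigma^2)$, and the estimator $\boldsymbol{T}$ = (sample mean, variance estimate with divisor $n$), which is biased in its second component. The closed-form Fisher information, Jacobian and covariance used here are standard results for Gaussian data; the function names are hypothetical.

```python
import numpy as np


def fisher_information(sigma2, n):
    """Fisher information matrix for theta = (mu, sigma^2) from n i.i.d. N(mu, sigma^2) samples."""
    return np.diag([n / sigma2, n / (2.0 * sigma2**2)])


def jacobian_psi(n):
    """Jacobian of psi(theta) = E[T] for T = (sample mean, variance estimate with divisor n).

    E[sample mean] = mu and E[divisor-n variance estimate] = (n - 1) / n * sigma^2.
    """
    return np.diag([1.0, (n - 1) / n])


def covariance_of_T(sigma2, n):
    """Exact covariance of T for Gaussian data (the two components are independent)."""
    return np.diag([sigma2 / n, 2.0 * (n - 1) * sigma2**2 / n**2])


# Illustrative values only (assumed, not from the article).
n, sigma2 = 10, 3.0
J = jacobian_psi(n)
bound = J @ np.linalg.inv(fisher_information(sigma2, n)) @ J.T
gap = covariance_of_T(sigma2, n) - bound

# The inequality holds when cov(T) - bound has no negative eigenvalues.
eigenvalues = np.linalg.eigvalsh(gap)
print("bound diagonal:", np.diag(bound))
print("eigenvalues of cov(T) - bound:", eigenvalues)
```

One eigenvalue is zero up to rounding, because the sample mean attains its bound, and the other is strictly positive, so the difference is positive semidefinite as the inequality requires.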
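If $\boldsymbol{T}(X)$ is an unbiased estimator (i.e., $\boldsymbol{\psi}\left(\boldsymbol{\theta}\right) = \boldsymbol{\theta}$) then the Cramér-Rao inequality is
\mathrm{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right) \geq \mathcal{I}\left(\boldsymbol{\theta}\right)^{-1}

Single-parameter proof

First, a more general version of the inequality will be proven; namely, that if the expectation of $T$ is denoted by $\psi(\theta)$, then for all $\theta$
{\rm var}(T) \geq \frac{\left[\psi^\prime(\theta)\right]^2}{\mathcal{I}(\theta)}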
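The Cramér-Rao inequality will then follow as a consequence, since an unbiased estimator has $\psi(\theta) = \theta$ and therefore $\psi^\prime(\theta) = 1$. Let $X$ be a random variable with probability density function $f(x;\theta)$. Here $T = t(X)$ is a statistic, which is used as an estimator for $\psi(\theta)$. If $V$ is the score, i.e.
V = \frac{\partial}{\partial\theta} \ln f(X;\theta)
then the expectation of $V$, written ${\rm E}(V)$, is zero. If we consider the covariance ${\rm cov}(V,T)$ of $V$ and $T$, we have ${\rm cov}(V,T) = {\rm E}(VT)$, because ${\rm E}(V) = 0$. Expanding this expression we have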
{\rm cov}(V,T) = {\rm E} \left( T \cdot \frac{\partial}{\partial\theta} \ln f(X;\theta) \right)
This may be expanded using the chain rule
\frac{\partial}{\partial\theta} \ln Q = \frac{1}{Q} \frac{\partial Q}{\partial\theta}
and the definition of expectation gives, after cancelling $f(x;\theta)$,
{\rm E} \left( T \cdot \frac{\partial}{\partial\theta} \ln f(X;\theta) \right) = \int t(x) \left[ \frac{\partial}{\partial\theta} f(x;\theta) \right] \, dx = \frac{\partial}{\partial\theta} \left[ \int t(x)f(x;\theta)\,dx \right] = \psi^\prime(\theta)
because the integration and differentiation operations commute (second condition). The Cauchy-Schwarz inequality shows that
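\sqrt{ {\rm var} (T) {\rm var} (V)} \geq \left| {\rm cov}(V,T) \right| = \left| \psi^\prime (\theta) \right|
therefore
{\rm var}(T) \geq \frac{\left[\psi^\prime(\theta)\right]^2}{{\rm var}(V)} = \frac{\left[\psi^\prime(\theta)\right]^2}{\mathcal{I}(\theta)}
because ${\rm var}(V) = {\rm E}(V^2) = \mathcal{I}(\theta)$ when ${\rm E}(V) = 0$.

As an example, consider an $N$-dimensional observation $\boldsymbol{x}$ with a multivariate normal distribution whose mean $\boldsymbol{\mu}$ and covariance matrix $C$ depend on $\boldsymbol{\theta}$. Its probability density function is
f\left(\boldsymbol{x}; \boldsymbol{\theta}\right) = \frac{1}{\sqrt{(2\pi)^N \left| C \right|}} \exp \left( -\frac{1}{2} \left( \boldsymbol{x} - \boldsymbol{\mu} \right)^{T} C^{-1} \left( \boldsymbol{x} - \boldsymbol{\mu} \right) \right).
The Fisher information matrix has elements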
\mathcal{I}_{m, k} = \frac{\partial \boldsymbol{\mu}^T}{\partial \theta_m} C^{-1} \frac{\partial \boldsymbol{\mu}}{\partial \theta_k} + \frac{1}{2} \mathrm{tr} \left( C^{-1} \frac{\partial C}{\partial \theta_m} C^{-1} \frac{\partial C}{\partial \theta_k} \right)
where "tr" is the trace. Let $\boldsymbol{w}$ be white Gaussian noise (a sample of $N$ independent observations) with variance $\sigma^2$:
\boldsymbol{w} \sim \mathcal{N}\left(\boldsymbol{\mu}, \sigma^2 I\right)
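Then the Fisher information matrix is 1 × 1, with the single parameter $\theta = \sigma^2$. The mean does not depend on $\sigma^2$, so the first term vanishes. The covariance matrix is $C = \sigma^2 I$, so $\partial C / \partial \sigma^2 = I$ and $C^{-1} = I / \sigma^2$, which gives
\mathcal{I}(\sigma^2) = \frac{1}{2} \mathrm{tr} \left( C^{-1} \frac{\partial C}{\partial \sigma^2} C^{-1} \frac{\partial C}{\partial \sigma^2} \right) = \frac{1}{2 \sigma^4} \mathrm{tr} \left(I\right) = \frac{N}{2 \sigma^4},
and so the Cramér-Rao inequality is
\mathrm{var}\left(\hat{\sigma}^2\right) \geq \frac{2 \sigma^4}{N}.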
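The white-noise example can be checked numerically. The sketch below evaluates the trace formula directly for $C = \sigma^2 I$ and compares the resulting bound with a simulation. It assumes the mean is known, and the two estimators (sum of squared deviations divided by $N$, and by $N + 2$) are illustrative choices rather than anything prescribed by the text. The second one is biased and shows how a biased estimator can have a mean squared error below the bound.

```python
import numpy as np


def fisher_information_variance(sigma2, n):
    """Evaluate (1/2) tr(C^-1 dC C^-1 dC) for C = sigma^2 * I and dC = dC/d(sigma^2) = I."""
    C = sigma2 * np.eye(n)
    dC = np.eye(n)
    C_inv = np.linalg.inv(C)
    return 0.5 * np.trace(C_inv @ dC @ C_inv @ dC)


def simulate(mu, sigma2, n, trials, seed=0):
    """Monte Carlo variance and MSE of two estimators of sigma^2 when the mean mu is known."""
    rng = np.random.default_rng(seed)
    w = rng.normal(loc=mu, scale=np.sqrt(sigma2), size=(trials, n))
    sum_sq = ((w - mu) ** 2).sum(axis=1)
    unbiased = sum_sq / n        # expectation equals sigma^2
    shrunk = sum_sq / (n + 2)    # biased towards zero
    var_unbiased = unbiased.var(ddof=1)
    mse_shrunk = np.mean((shrunk - sigma2) ** 2)
    return var_unbiased, mse_shrunk


# Illustrative values only (assumed, not from the article).
n, sigma2 = 10, 4.0
crlb = 1.0 / fisher_information_variance(sigma2, n)
var_unbiased, mse_shrunk = simulate(mu=0.5, sigma2=sigma2, n=n, trials=200_000)
print(f"CRLB = {crlb:.3f}")
print(f"variance of unbiased estimator = {var_unbiased:.3f}")
print(f"MSE of divisor-(N+2) estimator = {mse_shrunk:.3f}")
```

The unbiased estimator's variance should match the bound up to Monte Carlo error. The biased estimator's mean squared error falls below the bound, which does not contradict the inequality because the bound applies only to unbiased estimators.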