Covariance Matrix

In statistics, the covariance matrix generalizes the concept of variance from one to n dimensions, or in other words, from scalar-valued random variables to vector-valued random variables (tuples of scalar random variables). If X is a scalar-valued random variable with expected value μ then its variance is
\sigma^2={\rm var}(X)=E((X-\mu)^2)
If X is an n-by-1 column vector-valued random variable whose expected value is an n-by-1 column vector μ then its variance is the n-by-n nonnegative-definite matrix
\Sigma={\rm var}(X)=E((X-\mu)(X-\mu)^\top)
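The expectation in this definition can be approximated from data by averaging outer products of deviations from the mean. Below is a minimal NumPy sketch. It assumes the draws of X are stored one per row of an N-by-n array and uses a hypothetical helper `covariance_from_samples`. Because μ is replaced by the sample mean, the result is an estimate of Σ, not Σ itself.

```python
import numpy as np

def covariance_from_samples(samples: np.ndarray, mu: np.ndarray) -> np.ndarray:
    """Approximate Sigma = E[(X - mu)(X - mu)^T] by averaging outer products.

    samples: shape (N, n), one draw of the n-vector X per row.
    mu: mean vector of shape (n,), known or estimated.
    """
    deviations = samples - mu                      # shape (N, n)
    # Sum over rows of (x_i - mu)(x_i - mu)^T, divided by N
    return deviations.T @ deviations / samples.shape[0]

rng = np.random.default_rng(0)
true_sigma = np.array([[2.0, 0.6, 0.0],
                       [0.6, 1.0, -0.3],
                       [0.0, -0.3, 0.5]])
true_mu = np.array([1.0, -2.0, 0.5])
X = rng.multivariate_normal(true_mu, true_sigma, size=200_000)

sigma_hat = covariance_from_samples(X, X.mean(axis=0))
print(np.round(sigma_hat, 2))   # close to true_sigma

# Diagonal entries are the variances of the scalar components of X
assert np.allclose(np.diag(sigma_hat), X.var(axis=0))
# Same as NumPy's divide-by-N covariance
assert np.allclose(sigma_hat, np.cov(X, rowvar=False, bias=True))
```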
The entries of Σ are the covariances between the n scalar components of X. The covariance of a scalar-valued random variable with itself is its variance, so the diagonal entries of Σ are the variances of the scalar components of X. Stated this way, the property seems to depend on the coordinate system chosen for the space in which the random vector X resides. However, it holds more generally: if u is any unit vector, then the variance of the projection of X onto u is
{\rm var}(u^\top X)=u^\top\Sigma u
Taking u to be one of the coordinate basis vectors recovers the corresponding diagonal entry. (This point is expanded upon somewhat at http://www.wikipedia.org/wiki/Talk:Covariance_matrix. The projection formula is a consequence of the identity for var(AX) that appears below.)
Nomenclatures differ. Some statisticians, following the probabilist William Feller, call Σ the variance of the random vector X, because it is the natural generalization of the 1-dimensional variance to higher dimensions. Others call it the covariance matrix, because it is the matrix of covariances between the scalar components of X.
For a scalar-valued random variable X, we have the identity
{\rm var}(aX)=a^2{\rm var}(X)
if a is constant, i.e., not random. If X is an n-by-1 column vector-valued random variable and A is an m-by-n constant (i.e., non-random) matrix, then AX is an m-by-1 column vector-valued random variable, whose variance must therefore be an m-by-m matrix. It is
{\rm var}(AX)=A\Sigma A^\top
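The identity var(AX) = AΣAᵀ can be checked numerically. In the NumPy sketch below, the example matrices are arbitrary choices. The identity is applied to the sample covariance, where it holds exactly up to rounding. Taking A to be the 1-by-n matrix uᵀ gives the projection formula uᵀΣu, and the scalar identity var(aX) = a²var(X) is the 1-by-1 case.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], sigma, size=100_000)  # rows are draws of X
S = np.cov(X, rowvar=False, bias=True)                         # sample covariance of X

# A is an m-by-n constant matrix (m = 3, n = 2); each row of X @ A.T is a draw of AX
A = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [3.0, 0.0]])
S_AX = np.cov(X @ A.T, rowvar=False, bias=True)
assert np.allclose(S_AX, A @ S @ A.T)               # var(AX) = A Sigma A^T

# Special case A = u^T for a unit vector u: var(u^T X) = u^T Sigma u
u = np.array([1.0, 1.0]) / np.sqrt(2.0)
assert np.isclose((X @ u).var(), u @ S @ u)

# Scalar case: var(aX) = a^2 var(X)
a = -3.0
assert np.isclose((a * X[:, 0]).var(), a**2 * X[:, 0].var())
```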
Although simple, the covariance matrix is a useful tool in many different areas. From its eigenvectors one can derive a transformation matrix that completely decorrelates the data or, from another point of view, gives an optimal basis for representing the data compactly. This is called principal component analysis (PCA) in statistics and the Karhunen-Loève transform (KL transform) in image processing.
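The decorrelating transformation can be sketched with NumPy. This sketch assumes PCA is carried out by an eigendecomposition of the sample covariance; other routes, such as a singular value decomposition of the centred data, are also common. `pca_decorrelate` is a hypothetical helper name. The columns of V form the new basis, and the covariance of the transformed data is diagonal.

```python
import numpy as np

def pca_decorrelate(X: np.ndarray):
    """Project centred data onto the eigenvectors of its covariance matrix.

    Returns (Y, eigvals, V). The columns of V are eigenvectors of the sample
    covariance, sorted by decreasing eigenvalue, and Y = (X - mean) @ V.
    """
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False, bias=True)
    eigvals, V = np.linalg.eigh(S)          # S is symmetric; eigh returns ascending order
    order = np.argsort(eigvals)[::-1]       # largest variance first
    eigvals, V = eigvals[order], V[:, order]
    return Xc @ V, eigvals, V

rng = np.random.default_rng(2)
X = rng.multivariate_normal([0.0, 0.0, 0.0],
                            [[3.0, 1.0, 0.5],
                             [1.0, 2.0, 0.3],
                             [0.5, 0.3, 1.0]], size=50_000)
Y, eigvals, V = pca_decorrelate(X)

# The transformed components are uncorrelated: their covariance is diag(eigvals)
assert np.allclose(np.cov(Y, rowvar=False, bias=True), np.diag(eigvals))
```

Keeping only the first k columns of Y gives the k-dimensional representation that retains the largest possible share of the total variance. This is the compact-basis view of the same transform.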

Estimation

The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is perhaps surprisingly subtle. It involves the spectral theorem and the reason why it can be better to view a scalar as the trace of a 1×1 matrix than as a mere scalar. See estimation of covariance matrices.
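For N independent draws x_1, ..., x_N, the maximum-likelihood estimates are the sample mean x̄ and Σ̂ = (1/N) Σ_i (x_i − x̄)(x_i − x̄)ᵀ. Note that this divides by N rather than N − 1. The NumPy sketch below computes this estimate and checks that it attains at least the log-likelihood of the unbiased estimator; the helper names are hypothetical. It illustrates the result only, not the derivation. The sum of quadratic forms is computed as the trace of a matrix product, which echoes the view of a scalar as the trace of a 1×1 matrix.

```python
import numpy as np

def gaussian_log_likelihood(X, mu, sigma):
    """Log-likelihood of the rows of X under N(mu, sigma)."""
    N, n = X.shape
    D = X - mu
    _, logdet = np.linalg.slogdet(sigma)
    # sum_i (x_i - mu)^T sigma^{-1} (x_i - mu) = trace(sigma^{-1} D^T D)
    quad = np.trace(np.linalg.solve(sigma, D.T @ D))
    return -0.5 * (N * n * np.log(2 * np.pi) + N * logdet + quad)

def mle_covariance(X):
    """ML estimates (mu_hat, sigma_hat) for i.i.d. multivariate normal rows of X."""
    mu_hat = X.mean(axis=0)
    D = X - mu_hat
    return mu_hat, D.T @ D / X.shape[0]      # divide by N, not N - 1

rng = np.random.default_rng(3)
X = rng.multivariate_normal([1.0, 0.0], [[1.0, 0.4], [0.4, 2.0]], size=20)
mu_hat, sigma_ml = mle_covariance(X)
sigma_unbiased = np.cov(X, rowvar=False)     # divides by N - 1

N = X.shape[0]
assert np.allclose(sigma_ml, (N - 1) / N * sigma_unbiased)
# The ML estimate scores at least as high as the unbiased estimate
assert (gaussian_log_likelihood(X, mu_hat, sigma_ml)
        >= gaussian_log_likelihood(X, mu_hat, sigma_unbiased))
```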

 
