[Figure 2: Contour plots for example bivariate Gaussian distributions, with µ = 0 in all cases. Panel covariances: (a) Σ = [1 0; 0 1], (b) Σ = [1 1/2; 1/2 1], (c) Σ = [1 −1; −1 3].]
Examining these equations, we can see that the multivariate density coincides with the univariate density in the special case when Σ is the scalar σ².
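This reduction can be checked numerically. The sketch below (the values of µ, σ, and the evaluation point are arbitrary choices) evaluates the multivariate density in one dimension with Σ = σ² and compares it to the univariate density:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Arbitrary choices for this check:
mu, sigma, x = 1.0, 2.0, 0.3

# Multivariate density in d = 1 with Sigma given as the 1x1 matrix [sigma^2]:
p_multi = multivariate_normal.pdf([x], mean=[mu], cov=[[sigma**2]])

# Familiar univariate density with the same mean and standard deviation:
p_uni = norm.pdf(x, loc=mu, scale=sigma)

assert np.isclose(p_multi, p_uni)
```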
Again, the vector µ specifies the mean of the multivariate Gaussian distribution. The matrix Σ specifies the covariance between each pair of variables in x:

Σ = cov(x, x) = E[(x − µ)(x − µ)ᵀ].
Covariance matrices are necessarily symmetric and positive semidefinite, which means their eigenvalues are nonnegative. Note that the density function above requires that Σ be positive definite, that is, have strictly positive eigenvalues. A zero eigenvalue would result in a determinant of zero, making the normalization impossible.
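These properties are easy to verify on a sample covariance matrix. The sketch below (data dimensions and the random seed are arbitrary) estimates Σ with np.cov and checks symmetry and nonnegative eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 1000))   # 3 variables, 1000 samples

Sigma = np.cov(X)                # 3x3 sample covariance matrix

# Symmetric:
assert np.allclose(Sigma, Sigma.T)

# Positive semidefinite: all eigenvalues nonnegative (up to roundoff).
eigvals = np.linalg.eigvalsh(Sigma)   # real eigenvalues, ascending order
assert np.all(eigvals >= -1e-12)
```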
The dependence of the multivariate Gaussian density on x is entirely through the value of the quadratic form

∆² = (x − µ)ᵀ Σ⁻¹ (x − µ).
The value ∆ (obtained via a square root) is called the Mahalanobis distance, and can be seen as a generalization of the Z score (x − µ)/σ, often encountered in statistics.
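Computing ∆ is a one-liner. The sketch below uses the Σ from panel (c) of Figure 2 with µ = 0; the evaluation point x is an arbitrary choice. Solving a linear system avoids forming Σ⁻¹ explicitly:

```python
import numpy as np

Sigma = np.array([[1.0, -1.0],
                  [-1.0, 3.0]])   # panel (c) of Figure 2
mu = np.zeros(2)
x = np.array([1.0, 2.0])          # arbitrary evaluation point

diff = x - mu
# Delta^2 = (x - mu)^T Sigma^{-1} (x - mu), via a linear solve:
delta_sq = diff @ np.linalg.solve(Sigma, diff)
delta = np.sqrt(delta_sq)         # the Mahalanobis distance
```

With Σ equal to the identity, ∆ reduces to the ordinary Euclidean distance from µ, matching the Z-score intuition.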
To understand the behavior of the density geometrically, we can set the Mahalanobis distance to a constant. The set of points in ℝ^d satisfying ∆ = c for any given value c > 0 is an ellipsoid, with the eigenvectors of Σ defining the directions of the principal axes.
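This can be verified directly: if Σ = UΛUᵀ is an eigendecomposition, the ellipse ∆ = c has semi-axes of length c√λᵢ along the eigenvectors. The sketch below (using the Σ from panel (c) of Figure 2 and the arbitrary contour level c = 1) parametrizes that ellipse and checks that every point on it satisfies ∆ = c:

```python
import numpy as np

Sigma = np.array([[1.0, -1.0],
                  [-1.0, 3.0]])     # panel (c) of Figure 2
lam, U = np.linalg.eigh(Sigma)      # eigenvalues (ascending), orthonormal U

c = 1.0                             # arbitrary contour level
theta = np.linspace(0.0, 2 * np.pi, 200)

# Scale the unit circle by c*sqrt(lambda_i) along each eigenvector,
# then rotate into the original coordinates:
pts = (U * (c * np.sqrt(lam))) @ np.vstack([np.cos(theta), np.sin(theta)])

# Every point on the curve has Mahalanobis distance c (mu = 0 here):
delta_sq = np.einsum('ij,ij->j', pts, np.linalg.solve(Sigma, pts))
assert np.allclose(delta_sq, c**2)
```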
Figure 2 shows contour plots of the density of three bivariate (two-dimensional) Gaussian distribu-
tions. The elliptical shape of the contours is clear.
The Gaussian distribution has a number of convenient analytic properties, some of which we
describe below.
Marginalization
Often we will have a set of variables x with a joint multivariate Gaussian distribution, but only be interested in reasoning about a subset of these variables. Suppose x has a multivariate Gaussian distribution:

p(x | µ, Σ) = N(x; µ, Σ).
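The standard marginalization result (a well-known property of the Gaussian, stated here as an assumption about where this section is headed) is that any subset of the coordinates of x is again Gaussian, with mean the matching sub-vector of µ and covariance the matching sub-block of Σ. The sketch below (with an arbitrary µ, Σ, and subset) checks this empirically by sampling:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary 3-dimensional Gaussian for illustration:
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
keep = [0, 2]                      # marginalize out coordinate 1

# Empirical mean/covariance of the kept coordinates should match the
# corresponding sub-vector of mu and sub-block of Sigma:
emp_mean = X[:, keep].mean(axis=0)
emp_cov = np.cov(X[:, keep].T)

assert np.allclose(emp_mean, mu[keep], atol=0.02)
assert np.allclose(emp_cov, Sigma[np.ix_(keep, keep)], atol=0.05)
```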