Archive for the ‘Information Theory’ Category

MIMO Log-Det Formula for Channel Capacity with Real Input / Output Signals

November 5, 2014

    Consider a MIMO (multiple input multiple output) system of {N_t} transmitters and {N_r} receivers and assume

  \displaystyle {\bf R} = {\bf H}_0{\bf S}+ {\bf Z}, \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

where {{\bf R}}, {{\bf S}} and {{\bf Z}} are column vectors of real random variables representing received signals, transmitted signals and noise.

    The capacity of the MIMO channel is given by [1]

\displaystyle C_{\rm MIMO} = \max_{f_{\bf S}\left({\bf s}\right)}I\left({\bf S}, {\bf R}\right) =  \max_{f_{\bf S}\left({\bf s}\right)}\left[h\left({\bf R}\right)-h\left({\bf R}|{\bf S}\right)\right], \ \ \ \ \ \ \ \ (2)

where {f_{\bf S}\left({\bf s}\right)} is the p.d.f. of the transmitted signal vector {{\bf S}}. It turns out [1, 2] that {h\left({\bf R}|{\bf S}\right) = h\left({\bf Z}\right)} where {h\left({\bf Z}\right)} is the differential entropy of the noise and satisfies

\displaystyle h\left({\bf Z}\right) = \frac{1}{2}\log_2\left[\left(2\pi e\right)^{N_r}\left|{\bf V}_{\bf Z}\right|\right]. \ \ \ \ \

Here {{\bf V_Z}} is the covariance matrix of the noise. Thus Eq. (2) becomes

\displaystyle C_{\rm MIMO} =  \max_{f_{\bf S}\left({\bf s}\right)}\left[h\left({\bf R}\right)-h\left({\bf Z}\right)\right]  = \max_{f_{\bf S}\left({\bf s}\right)}\left[h\left({\bf R}\right)\right]-h\left({\bf Z}\right), \ \ \ \ \ (3)

since the entropy of {{\bf Z}} does not depend on the probability density function of the input variable {{\bf S}}.

    In the SISO (single input single output) problem the mutual information was maximized while holding the variance of the output p.d.f. constant [3]. In the MIMO problem, the mutual information is maximized while holding the covariance matrix {{\bf V}_{\bf R}} of the output signal vector {{\bf R}} constant and setting the mean of {{\bf R}} to zero. The method for finding the maximum is outlined in the references [2, 3, 4].

    As has been shown [1, 2, 4] {h\left({\bf R}\right)} is maximized when {f_{\bf S}} is the p.d.f. of jointly-normal random variables. Using the jointly-normal p.d.f for the maximum indicated in Eq. (4) produces

\displaystyle C_{\rm MIMO} = h_{\rm norm}\left({\bf R}\right)-h\left({\bf Z}\right), \ \ \ \ \

\displaystyle = \frac{1}{2}\log_2\left|{\bf V}_{\bf R}\right|-\frac{1}{2}\log_2\left|{\bf V}_{\bf Z}\right| = \frac{1}{2}\log_2\left|{\bf V}_{\bf R}\right|-\frac{1}{2}\log_2\left|\sigma_{Z}^2{\bf I}_{N_r}\right|. \ \ \ \ \ (4)

Next, the covariance matrix {{\bf V}_{\bf R}} of the received signal {{\bf R}} is evaluated in terms of the covariance matrix {{\bf V}_{\bf S}} of the input signal {{\bf S}} and the covariance matrix {{\bf V}_{\bf Z}} of the noise {{\bf Z}} using the relationship of Eq. (1). Thus

\displaystyle {\bf V}_{\bf R} = E\left({\bf R}{\bf R}^T\right) = E\left[\left({\bf H}_0{\bf S}+{\bf Z}\right)  \left({\bf H}_0{\bf S}+{\bf Z}\right)^T\right], \ \ \ \ \

\displaystyle = E\left[{\bf H}_0{\bf S}\left({\bf H}_0{\bf S}\right)^T+{\bf H}_0{\bf S}{\bf Z}^T + {\bf Z}  \left({\bf H}_0{\bf S}\right)^T+{\bf Z}{\bf Z}^T\right], \ \ \ \ \

\displaystyle = E\left[{\bf H}_0{\bf S}{\bf S}^T{\bf H}_0^T+{\bf H}_0{\bf S}{\bf Z}^T+{\bf Z}{\bf S}^T{\bf H}_0^T  +{\bf Z}{\bf Z}^T\right], \ \ \ \ \

\displaystyle = {\bf H}_0{\bf V}_{\bf S}{\bf H}_0^T + {\bf H}_0E\left({\bf S}{\bf Z}^T\right)  + E\left({\bf Z}{\bf S}^T\right){\bf H}_0^T + {\bf V}_{\bf Z}, \ \ \ \ \ \ \ (5)


\displaystyle {\bf V}_{\bf R} = {\bf H}_0{\bf V}_{\bf S}{\bf H}_0^T + {\bf V}_{\bf Z}  = {\bf H}_0{\bf V}_{\bf S}{\bf H}_0^T +\sigma_Z^2{\bf I}_{N_r} , \ \ \ \ \ \ \ \ \ (6)

where two terms have been dropped from Eq. (5) because the noise and the transmitted signal are uncorrelated and the noise is assumed to have zero mean. Using Eq. (6) in Eq. (4) shows

\displaystyle C_{\rm MIMO} = \frac{1}{2}\log_2\left|{\bf H}_0{\bf V}_{\bf S}{\bf H}_0^T+\sigma_Z^2{\bf I}_{N_r}\right| -\frac{1}{2}\log_2\left|\sigma_Z^2{\bf I}_{N_r}\right|, \ \ \ \ \

\displaystyle = \frac{1}{2}\log_2\left[\left|{\bf H}_0{\bf V}_{\bf S}{\bf H}_0^T + \sigma_Z^2{\bf I}_{N_r}\right|\left|  \sigma_Z^2{\bf I}_{N_r}\right|^{-1}\right], \ \ \ \ \

\displaystyle = \frac{1}{2}\log_2\left|{\bf I}_{N_r}+\frac{1}{\sigma_Z^2}{\bf H}_0{\bf V}_{\bf S}{\bf H}_0^T\right|, \ \ \ \ \

the capacity per channel use or capacity in bits per sample. The capacity per unit time (or bits/second) is therefore [3]

\displaystyle C_{t{\rm MIMO}} = B\log_2\left|{\bf I}_{N_r}+\frac{1}{\sigma_Z^2}{\bf H}_0{\bf V}_{\bf S}{\bf H}_0^T  \right|, \ \ \ \ \

where {B} is the signal bandwidth and {2B} is the sampling rate.
In the SISO case, this reduces to

\displaystyle C_t = B\log_2\left(1+\frac{\sigma_S^2}{\sigma_Z^2}\right) = B\log_2\left(1+\frac{P_S}{P_Z}\right), \ \ \ \ \

where {\sigma_S^2} is the variance of the input signal and {P_S/P_Z} is the ratio of the input signal power to noise power. Thus the result for {C_t} in the MIMO case reduces to {C_t} for the SISO case [3] as expected.

    A more detailed derivation is given in [2] and is available by clicking below.


[1] J. R. Hampton, Introduction to MIMO Communications\/, Cambridge University Press, N.Y. (2014).

[2] H. L. Rappaport, “Derivation of MIMO Log-Det Formula for Channel Capacity with Real Input / Output Signals,” 7G Communications, 7GCTN05, November 5, 2014. real

[3] H. L. Rappaport, “Notes on Information Theory II and the Geometric Interpretation of the Shannon Channel Capacity,” 7G Communications, 7GCTN02, October 2014. infoII

[4] I. E. Telatar, “Capacity of Multi-Antenna Gaussian Channels,” Euro. Trans. Telecommun., 10 (6), (1999), pp. 585 – 595.


Differential Entropy of Multivariate Distributions

October 6, 2014

The following computations are set forth because of their relevance to recent advanced multi-antenna techniques [1]. The presentation follows that given by Cover and Thomas [2]. Details on multivariate distributions can be found in Hogg and Craig [3] and in detailed computations given below [4, 5]. The joint differential entropy of {n} random variables satisfies

\displaystyle h\left({\bf x}\right) = - \int f\left({\bf x}\right)\log_2 f\left({\bf x}\right)\,d^nx. \ \ \ \ \

The p.d.f. (probability density function) of n jointly normal real random variables is given by

\displaystyle f\left({\bf x}\right)=\frac{1}{\sqrt{\left(2\pi\right)^n\left|{\bf V}\right|}}\exp\left[-\frac{1}{2}\left({\bf x}-{\mu}\right)^T{\bf V}^{-1}\left({\bf x}-\mu\right)\right],\ \ \ \ \

where {{\bf x}} and {\mu} are length n column vectors, {{\bf V}} is the n\times n covariance matrix and {\left|{\bf V}\right|} is its determinant. In this case,

\displaystyle h\left({\bf x}\right) = -\frac{1}{\ln 2}\int f\left({\bf x}\right)\left\{-\frac{1}{2}\ln\left[\left(2\pi\right)^n \left|{\bf V}\right|\,\right]-\frac{1}{2}\left[\left({\bf x}-\mu\right)^T{\bf V}^{-1}\left({\bf x}-\mu\right)\right] \right\}\,d^nx, \ \ \ \ \

\displaystyle = \frac{1}{2}\log_2\left[\left(2\pi\right)^n\left|{\bf V}\right|\,\right]+\frac{1}{2\ln 2}E \left[\left({\bf x}-\mu\right)^T{\bf V}^{-1}\left({\bf x}-\mu\right)\right]. \ \ \ \ \

\displaystyle = \frac{1}{2}\log_2\left[\left(2\pi\right)^n\left|{\bf V}\right|\right]+\frac{1}{2\ln 2}E\left[\sum_{i,j=1}^n \left(x_i-\mu_i\right)\left({\bf V}^{-1}\right)_{ij}\left(x_j-\mu_j\right)\right]. \ \ \ \ \

Since the expectation of a sum is the sum of the expectations h\left({\bf x}\right)

\displaystyle = \frac{1}{2}\log_2\left[\left(2\pi\right)^n\left|{\bf V}\right|\right]+\frac{1}{2\ln 2}\sum_{i,j=1}^n E\left[\left(x_i-\mu_i\right)\left({\bf V}^{-1}\right)_{ij}\left(x_j-\mu_j\right)\right], \ \ \ \ \


\displaystyle h\left({\bf x}\right) = \frac{1}{2}\log_2\left[\left(2\pi\right)^n\left|{\bf V}\right|\right]+\frac{1}{2\ln 2}\sum_{i,j=1}^n \left({\bf V}^{-1}\right)_{ij}E\left[\left(x_i-\mu_i\right)\left(x_j-\mu_j\right)\right]. \ \ \ \ \


\displaystyle E\left[\left({\bf x}-\mu\right)\left({\bf x}^T-{\mu}^T\right)\right]={\bf V},\ \ \ \ \

this becomes

\displaystyle h\left({\bf x}\right) = \frac{1}{2}\log_2\left[\left(2\pi\right)^n\left|{\bf V}\right|\right]+\frac{1}{2\ln 2}\sum_{i,j=1}^n \left({\bf V}^{-1}\right)_{ij}\left({\bf V}\right)_{ij}.\ \ \ \ \

But the covariance and its inverse are necessarily real symmetric matrices, so

\displaystyle h\left({\bf x}\right)= \frac{1}{2}\log_2\left[\left(2\pi\right)^n\left|{\bf V}\right|\,\right]+\frac{1}{2\ln 2}\sum_{j=1}^n\delta_{jj}, \ \ \ \ \

\displaystyle = \frac{1}{2}\log_2\left[\left(2\pi\right)^n\left|{\bf V}\right|\,\right]+\frac{1}{2}\log_2 e^n, \ \ \ \ \

\displaystyle = \frac{1}{2}\log_2\left[\left(2\pi e\right)^n\left|{\bf V}\right|\,\right], \ \ \ \ \

the desired result. Additional related derivations are provided in [4,5] and are available by clicking below.


[1] M. Debbah in A. Sibille, C. Oestges, A. Zanella, (editors) MIMO From Theory to Implementation, Academic Press, Amsterdam (2011), Chap. 1.

[2] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, N.Y. (1991), p. 230.

[3] R. V. Hogg and A.T. Craig, Introduction to Mathematical Statistics, Macmillan, N.Y. (1978), Chap. 12.

[4] H. L. Rappaport, Normal and Bivariate Normal Distributions and Moment-Generating Functions, 7G Communications, 7GCTN03, September (2014). bivar

[5] H. L. Rappaport, Multivariate Distributions and Associated Differential Entropy of Jointly Normal Random Variables, 7G Communications, 7GCTN04, October (2014). multi

Geometric Interpretation of the Channel Capacity Theorem

October 6, 2014

Papoulis [1] shows that a band-limited signal {x\left(t\right)} may be written

\displaystyle{}x\left(t\right)=\sum_{j=-\infty}^\infty x_jS_j\left(t\right)=\sum_{j = - \infty}^\infty x_j\,\frac{\sin\left(2\pi Bt-j\pi\right)}{2\pi Bt - j\pi},\ \ \ \ \ (1)

where {x_j} is the value of {x\left(t\right)} at the {j^{\rm th}} sample time and {B} is the bandwidth. The signal spectrum is assumed to vanish for frequencies outside the domain {\left[-B,B\right]}. Reza [2] considers a signal of this type that is approximately limited to a time interval of duration {T}, e.g., the signal vanishes for times {t} outside the domain {\left[-T/2,T/2\right]}. The duration of the functions S_j\left(t\right) is on the order of 1/B and the time between samples is ={1/2B}. The duration of the signal x_j\left(t\right) is thus given approximately as (number of samples + 1)/2B. Now {BT\gg1} is assumed throughout these calculations so

\displaystyle x\left(t\right)\approx\sum_{j = - BT}^{BT} x_j\,\frac{\sin\left(2\pi Bt-j\pi\right)}{2\pi Bt -j\pi}.

Next, the average power in the signal {P_x} is computed

\displaystyle P_x = \frac{1}{T}\int_{-T/2}^{T/2}\left[x\left(t\right)\right]^2\,dt\approx\frac{1}{T}\int_{-\infty}^{\infty}\left[x\left(t\right)\right]^2\,dt

\displaystyle =\frac{1}{2BT}\sum_{j=-BT}^{BT} x_j^2\ \ \ \ \ \ \ \ \ \ (2)

as shown in detail in [3]. It turns out that the functions S_j\left(t\right) in Eq. (1) form an orthogonal set [2, 3]. Thus Reza argues that the sum in Eq. (2) can be viewed as the norm squared of a vector in a {2BT} dimensional vector space. The vector coordinates are given by the {x_j} and {x_0 = 0} is assumed. If the length of the vector is {d_x} then from Eq. (2)

\displaystyle d_x=\sqrt{2BTP_x}

Thus in the Gaussian noise channel with input {X}, output {Y} and noise {N} satisfying

\displaystyle Y=X+N, \ \ \ \ \

the input signal is represented by a point a distance d_x from the origin in the 2BT dimensional space. The output signal is represented by a point a distance

\displaystyle d_y=\sqrt{2BT\left(P_x+P_n\right)}, \ \ \ \ \

from the origin, given that the input signal and the noise are uncorrelated. The noise is represented by a point a distance

\displaystyle d_n=\sqrt{2BTP_n}, \ \ \ \ \

from the origin.

The requirement for transmission of signals without noise is that the allowable signal points in the {2BT} dimensional space must be separated a distance given by twice the length of the noise vector. Each of the received signals are represented by a point on a sphere with radius {d_y} in {2BT} dimensional space.

The question is now how many distinct signals (points on the sphere) can be allowed while keeping the separation between the points equal to {2d_n}? Enforcing this requirement permits decoding this signal without ambiguity. Alternatively, one can ask how many non-overlapping noise spheres can be embedded in the surface of the output signal’s sphere? Each noise sphere has radius {d_n} and has it’s center on the surface of the output signal’s sphere. Reza argues this problem is equivalent to asking how many spheres of radius {d_n} can be placed within the sphere of radius {d_y} because for {2BT} very large, e.g., in a very high dimensional space, most of the volume of a sphere is close to its surface.

According to these prescriptions the number of allowed signals {M} is given by

\displaystyle M\approx\frac{\mbox{Volume of sphere with radius }d_y}{\mbox{Volume of sphere with radius }d_n}\ \ \ \

where the volumes are to be computed in 2BT dimensional space. Since the volume of a sphere in {p} dimensional space is proportional to {r^p} where r is the sphere’s radius,

\displaystyle M\approx\left(\frac{d_y}{d_n}\right)^{2BT}=\left(\frac{P_x+P_n}{P_n}\right)^{BT}.

The number of bits sent by these {M} allowed signals is

\displaystyle \mbox{number of bits}=\log_2 M=BT\log_2\left(1+\frac{P_x}{P_n}\right),

and so the channel capacity {C_t} in bits/s is

\displaystyle C_t = \frac{1}{T}\log_2 M=B\log_2\left(1+\frac{P_x}{P_n}\right),

which is Shannon’s [4] famous result. A more detailed derivation is provided in [3] and is available by clicking below.


[1] A. Papoulis, Probability, Random Variables and Stochastic Processes, McGraw-Hill, N.Y. (1965), p. 176.

[2] F. M. Reza, An Introduction to Information Theory, McGraw-Hill, N.Y. (1961), pp. 318 – 320.

[3] H. L. Rappaport, Notes on Information Theory II and the Geometric Interpretation of the Shannon Channel Capacity, 7G Communications, 7GCTN02 (2014); infoII

[4] C.E. Shannon, A Mathematical Theory of Communication, Bell System Technical Journal (1948).

Source Coding Theorem

October 3, 2014

One way to understand the source coding theorem [1, 2] is to first consider a general discrete source with an output alphabet of {J} symbols. It would require {\log_2 J} bits to transmit a given symbol. Similarly, for a block of n sequential ordered elements of the alphabet there would be {J^n} possible messages that can be transmitted in one block. It would thus require {\log_2 J^n= n\log_2 J} bits to send a message of {n} symbols.

Next specialize to the case of a discrete memoryless source also with an output alphabet of {J} symbols. Let the {i^{th}} member of the alphabet be denoted {x_i} and let the probability of occurrence of {x_i} be denoted {p_i}. Now the block of {n} symbols where {n} is very large is considered. Let the expected number of occurrences of {x_i} be denoted {n_i}. We have that for {n} very large

\displaystyle n_i = p_in\qquad\qquad\qquad\qquad

That is to say for {n} very large the number of occurrences of each of the {x_i} is known and the number of likely messages is not {J^n} but instead the number of distinct permutations of a block of symbols with

\displaystyle p_0n \mbox{ occurrences of } x_0,

\displaystyle p_1n \mbox{ occurrences of } x_1,

\displaystyle p_2n \mbox{ occurrences of } x_2,

\displaystyle ~~~~~~~~~\vdots

\displaystyle p_{J-1}n \mbox{ occurrences of } x_{J-1}.

So the number of possible messages that can be sent in one block of {n} symbols is

\displaystyle \mbox{Num. of messages} = \frac{n!}{\left(p_0n\right)!\left(p_1n\right)!\left(p_2n\right)!\ldots\left(p_{J-1}n\right)!}    .\ \ \ \ \

Therefore, the number of bits required to send these messages or, equivalently, the number of bits needed to indicate to a receiver which of these messages was transmitted would be

\displaystyle{}N_{\rm bits}=\log_2\left[\frac{n!}{\left(p_0n\right)!\left(p_1n\right)!\left(p_2n\right)!\ldots\left(p_{J-1}n\right)!}\right]


\displaystyle{}N_{\rm bits}\ln 2=\ln n!-\sum_{j=0}^{J-1}\ln\left[\left(p_jn\right)!\right]. \ \ \ \ \

Using the approximation {\ln n! =n\ln n - n} [3] shows

\displaystyle{}N_{\rm bits}\ln 2=n\ln n-n-\sum_{j=0}^{J-1}\left[p_j n\ln\left(p_j n\right)-p_j n\right], \ \ \ \ \

\displaystyle=n\ln n-n-n\sum_{j=0}^{J-1}\left[p_j\ln p_j + p_j\ln n-p_j\right], \ \ \ \ \

\displaystyle=-n\sum_{j=0}^{J-1}p_j\ln p_j.\ \ \ \ \


\displaystyle{}N_{\rm bits}=-n\sum_{j=0}^{J-1}p_j\log_2 p_j=nH\left({\cal X}\right), \ \ \ \ \

where {H\left({\cal X}\right)} is the entropy of the alphabet.

Of course more bits can always be used to code all of these different messages but this presents the minimum needed when {n} is very large. The minimum required average number of bits per transmitted symbol is denoted {\bar L_{\rm min}} and thus satisfies

\displaystyle{}\bar L_{\rm min}=\frac{N_{\rm bits}}{n}=H\left({\cal X}\right). \ \ \ \ \

This result is the source coding theorem [1, 2]. Of course we have not shown that uniquely decodable codes exist that operate on one symbol at a time and can obtain this limit. However, we have shown that this limit exists.

I have provided [4] a more detailed write-up introducing information theory available by clicking below.


[1] S. Haykin, Digital Communications, Wiley, N.Y. (1988), Chapter 2.

[2] C.E. Shannon, A Mathematical Theory of Communication, Bell System Technical Journal (1948).

[3] M. Abramowitz and I. Stegun, Handbook of Mathematical Functions, Dover, N.Y. (1965), pp. 255 – 257.

[4] H.L. Rappaport, Notes on Information Theory I  and the Source Coding Theorem, 7G Communications, 7GCTN01 (2014); infoI