MIMO Log-Det Formula for Channel Capacity with Real Input / Output Signals

November 5, 2014

    Consider a MIMO (multiple input multiple output) system of {N_t} transmitters and {N_r} receivers and assume

  \displaystyle {\bf R} = {\bf H}_0{\bf S}+ {\bf Z}, \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

where {{\bf R}}, {{\bf S}} and {{\bf Z}} are column vectors of real random variables representing the received signals, transmitted signals and noise, and {{\bf H}_0} is the real {N_r\times N_t} channel matrix.

    The capacity of the MIMO channel is given by [1]

\displaystyle C_{\rm MIMO} = \max_{f_{\bf S}\left({\bf s}\right)}I\left({\bf S}, {\bf R}\right) =  \max_{f_{\bf S}\left({\bf s}\right)}\left[h\left({\bf R}\right)-h\left({\bf R}|{\bf S}\right)\right], \ \ \ \ \ \ \ \ (2)

where {f_{\bf S}\left({\bf s}\right)} is the p.d.f. of the transmitted signal vector {{\bf S}}. It turns out [1, 2] that {h\left({\bf R}|{\bf S}\right) = h\left({\bf Z}\right)} where {h\left({\bf Z}\right)} is the differential entropy of the noise. For jointly normal noise it satisfies

\displaystyle h\left({\bf Z}\right) = \frac{1}{2}\log_2\left[\left(2\pi e\right)^{N_r}\left|{\bf V}_{\bf Z}\right|\right]. \ \ \ \ \

Here {{\bf V}_{\bf Z}} is the covariance matrix of the noise, taken below to be {\sigma_Z^2{\bf I}_{N_r}}. Thus Eq. (2) becomes

\displaystyle C_{\rm MIMO} =  \max_{f_{\bf S}\left({\bf s}\right)}\left[h\left({\bf R}\right)-h\left({\bf Z}\right)\right]  = \max_{f_{\bf S}\left({\bf s}\right)}\left[h\left({\bf R}\right)\right]-h\left({\bf Z}\right), \ \ \ \ \ (3)

since the entropy of {{\bf Z}} does not depend on the probability density function of the input variable {{\bf S}}.

    In the SISO (single input single output) problem the mutual information was maximized while holding the variance of the output p.d.f. constant [3]. In the MIMO problem, the mutual information is maximized while holding the covariance matrix {{\bf V}_{\bf R}} of the output signal vector {{\bf R}} constant and setting the mean of {{\bf R}} to zero. The method for finding the maximum is outlined in the references [2, 3, 4].

    As has been shown [1, 2, 4] {h\left({\bf R}\right)} is maximized when {f_{\bf S}} is the p.d.f. of jointly-normal random variables. Using the jointly-normal p.d.f. for the maximum indicated in Eq. (3) produces

\displaystyle C_{\rm MIMO} = h_{\rm norm}\left({\bf R}\right)-h\left({\bf Z}\right), \ \ \ \ \

\displaystyle = \frac{1}{2}\log_2\left|{\bf V}_{\bf R}\right|-\frac{1}{2}\log_2\left|{\bf V}_{\bf Z}\right| = \frac{1}{2}\log_2\left|{\bf V}_{\bf R}\right|-\frac{1}{2}\log_2\left|\sigma_{Z}^2{\bf I}_{N_r}\right|. \ \ \ \ \ (4)

Next, the covariance matrix {{\bf V}_{\bf R}} of the received signal {{\bf R}} is evaluated in terms of the covariance matrix {{\bf V}_{\bf S}} of the input signal {{\bf S}} and the covariance matrix {{\bf V}_{\bf Z}} of the noise {{\bf Z}} using the relationship of Eq. (1). Thus

\displaystyle {\bf V}_{\bf R} = E\left({\bf R}{\bf R}^T\right) = E\left[\left({\bf H}_0{\bf S}+{\bf Z}\right)  \left({\bf H}_0{\bf S}+{\bf Z}\right)^T\right], \ \ \ \ \

\displaystyle = E\left[{\bf H}_0{\bf S}\left({\bf H}_0{\bf S}\right)^T+{\bf H}_0{\bf S}{\bf Z}^T + {\bf Z}  \left({\bf H}_0{\bf S}\right)^T+{\bf Z}{\bf Z}^T\right], \ \ \ \ \

\displaystyle = E\left[{\bf H}_0{\bf S}{\bf S}^T{\bf H}_0^T+{\bf H}_0{\bf S}{\bf Z}^T+{\bf Z}{\bf S}^T{\bf H}_0^T  +{\bf Z}{\bf Z}^T\right], \ \ \ \ \

\displaystyle = {\bf H}_0{\bf V}_{\bf S}{\bf H}_0^T + {\bf H}_0E\left({\bf S}{\bf Z}^T\right)  + E\left({\bf Z}{\bf S}^T\right){\bf H}_0^T + {\bf V}_{\bf Z}, \ \ \ \ \ \ \ (5)


so that

\displaystyle {\bf V}_{\bf R} = {\bf H}_0{\bf V}_{\bf S}{\bf H}_0^T + {\bf V}_{\bf Z}  = {\bf H}_0{\bf V}_{\bf S}{\bf H}_0^T +\sigma_Z^2{\bf I}_{N_r} , \ \ \ \ \ \ \ \ \ (6)

where two terms have been dropped from Eq. (5) because the noise and the transmitted signal are uncorrelated and the noise is assumed to have zero mean. Using Eq. (6) in Eq. (4) shows

\displaystyle C_{\rm MIMO} = \frac{1}{2}\log_2\left|{\bf H}_0{\bf V}_{\bf S}{\bf H}_0^T+\sigma_Z^2{\bf I}_{N_r}\right| -\frac{1}{2}\log_2\left|\sigma_Z^2{\bf I}_{N_r}\right|, \ \ \ \ \

\displaystyle = \frac{1}{2}\log_2\left[\left|{\bf H}_0{\bf V}_{\bf S}{\bf H}_0^T + \sigma_Z^2{\bf I}_{N_r}\right|\left|  \sigma_Z^2{\bf I}_{N_r}\right|^{-1}\right], \ \ \ \ \

\displaystyle = \frac{1}{2}\log_2\left|{\bf I}_{N_r}+\frac{1}{\sigma_Z^2}{\bf H}_0{\bf V}_{\bf S}{\bf H}_0^T\right|, \ \ \ \ \

the capacity per channel use or capacity in bits per sample. The capacity per unit time (or bits/second) is therefore [3]

\displaystyle C_{t{\rm MIMO}} = B\log_2\left|{\bf I}_{N_r}+\frac{1}{\sigma_Z^2}{\bf H}_0{\bf V}_{\bf S}{\bf H}_0^T  \right|, \ \ \ \ \

where {B} is the signal bandwidth and {2B} is the sampling rate.
In the SISO case with unit channel gain ({{\bf H}_0 = 1}), this reduces to

\displaystyle C_t = B\log_2\left(1+\frac{\sigma_S^2}{\sigma_Z^2}\right) = B\log_2\left(1+\frac{P_S}{P_Z}\right), \ \ \ \ \

where {\sigma_S^2} is the variance of the input signal and {P_S/P_Z} is the ratio of the input signal power to noise power. Thus the result for {C_t} in the MIMO case reduces to {C_t} for the SISO case [3] as expected.
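As a numerical illustration, the log-det formula can be evaluated directly. The Python sketch below is not taken from [2]; the function names, the example channel matrix and the plain Gaussian-elimination determinant are assumptions chosen only to keep the example self-contained. It evaluates the capacity in bits per sample and checks that a {1\times 1} channel with unit gain reproduces the SISO expression.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def mimo_capacity_bits_per_sample(h0, v_s, noise_var):
    """(1/2) log2 | I + H0 V_S H0^T / sigma_Z^2 | for real signals."""
    n_r = len(h0)
    hvh = matmul(matmul(h0, v_s), transpose(h0))
    m = [[(1.0 if i == j else 0.0) + hvh[i][j] / noise_var
          for j in range(n_r)] for i in range(n_r)]
    return 0.5 * math.log2(determinant(m))


# Hypothetical 2x2 channel with uncorrelated unit-variance inputs.
h0 = [[1.0, 0.5], [0.2, 1.0]]
v_s = [[1.0, 0.0], [0.0, 1.0]]
print(mimo_capacity_bits_per_sample(h0, v_s, noise_var=0.1))

# SISO check: H0 = 1, sigma_S^2 = 3, sigma_Z^2 = 1 gives (1/2) log2(4) = 1 bit.
print(mimo_capacity_bits_per_sample([[1.0]], [[3.0]], noise_var=1.0))
```

Multiplying the returned value by the sampling rate {2B} gives the capacity in bits/second. For matrices of realistic size a numerical library routine would normally replace the hand-written determinant.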

    A more detailed derivation is given in [2] and is available by clicking below.


[1] J. R. Hampton, Introduction to MIMO Communications, Cambridge University Press, N.Y. (2014).

[2] H. L. Rappaport, “Derivation of MIMO Log-Det Formula for Channel Capacity with Real Input / Output Signals,” 7G Communications, 7GCTN05, November 5, 2014.

[3] H. L. Rappaport, “Notes on Information Theory II and the Geometric Interpretation of the Shannon Channel Capacity,” 7G Communications, 7GCTN02, October 2014.

[4] I. E. Telatar, “Capacity of Multi-Antenna Gaussian Channels,” Euro. Trans. Telecommun., 10 (6), (1999), pp. 585 – 595.


Differential Entropy of Multivariate Distributions

October 6, 2014

The following computations are set forth because of their relevance to recent advanced multi-antenna techniques [1]. The presentation follows that given by Cover and Thomas [2]. Details on multivariate distributions can be found in Hogg and Craig [3] and in detailed computations given below [4, 5]. The joint differential entropy of {n} random variables satisfies

\displaystyle h\left({\bf x}\right) = - \int f\left({\bf x}\right)\log_2 f\left({\bf x}\right)\,d^nx. \ \ \ \ \

The p.d.f. (probability density function) of n jointly normal real random variables is given by

\displaystyle f\left({\bf x}\right)=\frac{1}{\sqrt{\left(2\pi\right)^n\left|{\bf V}\right|}}\exp\left[-\frac{1}{2}\left({\bf x}-{\mu}\right)^T{\bf V}^{-1}\left({\bf x}-\mu\right)\right],\ \ \ \ \

where {{\bf x}} and {\mu} are length n column vectors, {{\bf V}} is the n\times n covariance matrix and {\left|{\bf V}\right|} is its determinant. In this case,

\displaystyle h\left({\bf x}\right) = -\frac{1}{\ln 2}\int f\left({\bf x}\right)\left\{-\frac{1}{2}\ln\left[\left(2\pi\right)^n \left|{\bf V}\right|\,\right]-\frac{1}{2}\left[\left({\bf x}-\mu\right)^T{\bf V}^{-1}\left({\bf x}-\mu\right)\right] \right\}\,d^nx, \ \ \ \ \

\displaystyle = \frac{1}{2}\log_2\left[\left(2\pi\right)^n\left|{\bf V}\right|\,\right]+\frac{1}{2\ln 2}E \left[\left({\bf x}-\mu\right)^T{\bf V}^{-1}\left({\bf x}-\mu\right)\right]. \ \ \ \ \

\displaystyle = \frac{1}{2}\log_2\left[\left(2\pi\right)^n\left|{\bf V}\right|\right]+\frac{1}{2\ln 2}E\left[\sum_{i,j=1}^n \left(x_i-\mu_i\right)\left({\bf V}^{-1}\right)_{ij}\left(x_j-\mu_j\right)\right]. \ \ \ \ \

Since the expectation of a sum is the sum of the expectations,

\displaystyle h\left({\bf x}\right) = \frac{1}{2}\log_2\left[\left(2\pi\right)^n\left|{\bf V}\right|\right]+\frac{1}{2\ln 2}\sum_{i,j=1}^n E\left[\left(x_i-\mu_i\right)\left({\bf V}^{-1}\right)_{ij}\left(x_j-\mu_j\right)\right], \ \ \ \ \


or, taking the constant elements of {{\bf V}^{-1}} outside the expectation,

\displaystyle h\left({\bf x}\right) = \frac{1}{2}\log_2\left[\left(2\pi\right)^n\left|{\bf V}\right|\right]+\frac{1}{2\ln 2}\sum_{i,j=1}^n \left({\bf V}^{-1}\right)_{ij}E\left[\left(x_i-\mu_i\right)\left(x_j-\mu_j\right)\right]. \ \ \ \ \

Since

\displaystyle E\left[\left({\bf x}-\mu\right)\left({\bf x}^T-{\mu}^T\right)\right]={\bf V},\ \ \ \ \

this becomes

\displaystyle h\left({\bf x}\right) = \frac{1}{2}\log_2\left[\left(2\pi\right)^n\left|{\bf V}\right|\right]+\frac{1}{2\ln 2}\sum_{i,j=1}^n \left({\bf V}^{-1}\right)_{ij}\left({\bf V}\right)_{ij}.\ \ \ \ \

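But the covariance matrix and its inverse are necessarily real symmetric matrices, so {\sum_{i}\left({\bf V}^{-1}\right)_{ij}\left({\bf V}\right)_{ij} = \sum_{i}\left({\bf V}^{-1}\right)_{ji}\left({\bf V}\right)_{ij} = \left({\bf V}^{-1}{\bf V}\right)_{jj} = \delta_{jj}} and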

\displaystyle h\left({\bf x}\right)= \frac{1}{2}\log_2\left[\left(2\pi\right)^n\left|{\bf V}\right|\,\right]+\frac{1}{2\ln 2}\sum_{j=1}^n\delta_{jj}, \ \ \ \ \

\displaystyle = \frac{1}{2}\log_2\left[\left(2\pi\right)^n\left|{\bf V}\right|\,\right]+\frac{1}{2}\log_2 e^n, \ \ \ \ \

\displaystyle = \frac{1}{2}\log_2\left[\left(2\pi e\right)^n\left|{\bf V}\right|\,\right], \ \ \ \ \

the desired result. Additional related derivations are provided in [4,5] and are available by clicking below.
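The closed-form result is easy to check numerically. The Python sketch below is an illustration only; the function names and the example covariance matrices are assumptions, not part of [4, 5]. It evaluates {\frac{1}{2}\log_2\left[\left(2\pi e\right)^n\left|{\bf V}\right|\right]} and shows that for a diagonal covariance matrix the joint entropy equals the sum of the single-variable entropies, while correlation between the variables lowers it.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def gaussian_entropy_bits(cov):
    """h(x) = (1/2) log2[(2 pi e)^n |V|] for n jointly normal variables."""
    n = len(cov)
    return 0.5 * (n * math.log2(2.0 * math.pi * math.e)
                  + math.log2(determinant(cov)))


# Hypothetical covariance matrices with the same variances (2 and 8).
independent = [[2.0, 0.0], [0.0, 8.0]]
correlated = [[2.0, 1.0], [1.0, 8.0]]

print(gaussian_entropy_bits(independent))
print(gaussian_entropy_bits([[2.0]]) + gaussian_entropy_bits([[8.0]]))
print(gaussian_entropy_bits(correlated))
```

The first two printed values agree because {\left|{\bf V}\right|} factors into the product of the variances when {{\bf V}} is diagonal; the third is smaller because the off-diagonal terms reduce the determinant.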


[1] M. Debbah in A. Sibille, C. Oestges, A. Zanella, (editors) MIMO From Theory to Implementation, Academic Press, Amsterdam (2011), Chap. 1.

[2] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, N.Y. (1991), p. 230.

[3] R. V. Hogg and A.T. Craig, Introduction to Mathematical Statistics, Macmillan, N.Y. (1978), Chap. 12.

[4] H. L. Rappaport, Normal and Bivariate Normal Distributions and Moment-Generating Functions, 7G Communications, 7GCTN03, September (2014).

[5] H. L. Rappaport, Multivariate Distributions and Associated Differential Entropy of Jointly Normal Random Variables, 7G Communications, 7GCTN04, October (2014).

Geometric Interpretation of the Channel Capacity Theorem

October 6, 2014

Papoulis [1] shows that a band-limited signal {x\left(t\right)} may be written

\displaystyle{}x\left(t\right)=\sum_{j=-\infty}^\infty x_jS_j\left(t\right)=\sum_{j = - \infty}^\infty x_j\,\frac{\sin\left(2\pi Bt-j\pi\right)}{2\pi Bt - j\pi},\ \ \ \ \ (1)

where {x_j} is the value of {x\left(t\right)} at the {j^{\rm th}} sample time and {B} is the bandwidth. The signal spectrum is assumed to vanish for frequencies outside the domain {\left[-B,B\right]}. Reza [2] considers a signal of this type that is approximately limited to a time interval of duration {T}, i.e., the signal approximately vanishes for times {t} outside the domain {\left[-T/2,T/2\right]}. The duration of each of the functions {S_j\left(t\right)} is on the order of {1/B} and the time between samples is {1/2B}. The duration of the signal {x\left(t\right)} is thus given approximately as (number of samples + 1)/2B. Now {BT\gg1} is assumed throughout these calculations so

\displaystyle x\left(t\right)\approx\sum_{j = - BT}^{BT} x_j\,\frac{\sin\left(2\pi Bt-j\pi\right)}{2\pi Bt -j\pi}.

Next, the average power in the signal {P_x} is computed

\displaystyle P_x = \frac{1}{T}\int_{-T/2}^{T/2}\left[x\left(t\right)\right]^2\,dt\approx\frac{1}{T}\int_{-\infty}^{\infty}\left[x\left(t\right)\right]^2\,dt

\displaystyle =\frac{1}{2BT}\sum_{j=-BT}^{BT} x_j^2\ \ \ \ \ \ \ \ \ \ (2)

as shown in detail in [3]. It turns out that the functions S_j\left(t\right) in Eq. (1) form an orthogonal set [2, 3]. Thus Reza argues that the sum in Eq. (2) can be viewed as the norm squared of a vector in a {2BT} dimensional vector space. The vector coordinates are given by the {x_j} and {x_0 = 0} is assumed. If the length of the vector is {d_x} then from Eq. (2)

\displaystyle d_x=\sqrt{2BTP_x}

Thus in the Gaussian noise channel with input {X}, output {Y} and noise {N} satisfying

\displaystyle Y=X+N, \ \ \ \ \

the input signal is represented by a point a distance d_x from the origin in the 2BT dimensional space. The output signal is represented by a point a distance

\displaystyle d_y=\sqrt{2BT\left(P_x+P_n\right)}, \ \ \ \ \

from the origin, given that the input signal and the noise are uncorrelated. The noise is represented by a point a distance

\displaystyle d_n=\sqrt{2BTP_n}, \ \ \ \ \

from the origin.

The requirement for transmission of signals without error is that the allowable signal points in the {2BT} dimensional space must be separated by a distance of at least twice the length of the noise vector. Each of the received signals is represented by a point on a sphere with radius {d_y} in {2BT} dimensional space.

The question is now how many distinct signals (points on the sphere) can be allowed while keeping the separation between the points at least {2d_n}? Enforcing this requirement permits decoding the signal without ambiguity. Alternatively, one can ask how many non-overlapping noise spheres can be embedded in the surface of the output signal’s sphere? Each noise sphere has radius {d_n} and has its center on the surface of the output signal’s sphere. Reza argues this problem is equivalent to asking how many spheres of radius {d_n} can be placed within the sphere of radius {d_y} because for {2BT} very large, i.e., in a very high dimensional space, most of the volume of a sphere is close to its surface.

According to these prescriptions the number of allowed signals {M} is given by

\displaystyle M\approx\frac{\mbox{Volume of sphere with radius }d_y}{\mbox{Volume of sphere with radius }d_n}\ \ \ \

where the volumes are to be computed in 2BT dimensional space. Since the volume of a sphere in {p} dimensional space is proportional to {r^p} where r is the sphere’s radius,

\displaystyle M\approx\left(\frac{d_y}{d_n}\right)^{2BT}=\left(\frac{P_x+P_n}{P_n}\right)^{BT}.

The number of bits sent by these {M} allowed signals is

\displaystyle \mbox{number of bits}=\log_2 M=BT\log_2\left(1+\frac{P_x}{P_n}\right),

and so the channel capacity {C_t} in bits/s is

\displaystyle C_t = \frac{1}{T}\log_2 M=B\log_2\left(1+\frac{P_x}{P_n}\right),

which is Shannon’s [4] famous result. A more detailed derivation is provided in [3] and is available by clicking below.
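The sphere-counting argument can be followed numerically. The Python sketch below is only an illustration; the function names and the example values of {B}, {T}, {P_x} and {P_n} are assumptions. It works with {\log_2 M} rather than {M} itself, since {M} overflows floating point for any realistic {BT}, and compares {\left(1/T\right)\log_2 M} with {B\log_2\left(1+P_x/P_n\right)}.

```python
import math


def sphere_packing_bits(bandwidth_hz, duration_s, signal_power, noise_power):
    """log2 of M = (d_y / d_n)^(2BT), computed without forming M."""
    dims = 2.0 * bandwidth_hz * duration_s  # dimension of the signal space
    d_y = math.sqrt(dims * (signal_power + noise_power))  # output radius
    d_n = math.sqrt(dims * noise_power)  # noise radius
    return dims * math.log2(d_y / d_n)


def shannon_capacity(bandwidth_hz, signal_power, noise_power):
    """C_t = B log2(1 + P_x / P_n) in bits/s."""
    return bandwidth_hz * math.log2(1.0 + signal_power / noise_power)


# Hypothetical channel: 3 kHz bandwidth, 2 s duration, P_x / P_n = 15.
B, T, P_x, P_n = 3000.0, 2.0, 15.0, 1.0
bits = sphere_packing_bits(B, T, P_x, P_n)
print(bits / T)                      # rate from the sphere count
print(shannon_capacity(B, P_x, P_n))  # Shannon's formula
```

The two printed rates agree up to rounding, and the duration {T} cancels, as it does in the derivation above.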


[1] A. Papoulis, Probability, Random Variables and Stochastic Processes, McGraw-Hill, N.Y. (1965), p. 176.

[2] F. M. Reza, An Introduction to Information Theory, McGraw-Hill, N.Y. (1961), pp. 318 – 320.

[3] H. L. Rappaport, Notes on Information Theory II and the Geometric Interpretation of the Shannon Channel Capacity, 7G Communications, 7GCTN02 (2014).

[4] C.E. Shannon, A Mathematical Theory of Communication, Bell System Technical Journal (1948).

Source Coding Theorem

October 3, 2014

One way to understand the source coding theorem [1, 2] is to first consider a general discrete source with an output alphabet of {J} symbols. It would require {\log_2 J} bits to transmit a given symbol. Similarly, for a block of n sequential ordered elements of the alphabet there would be {J^n} possible messages that can be transmitted in one block. It would thus require {\log_2 J^n= n\log_2 J} bits to send a message of {n} symbols.

Next specialize to the case of a discrete memoryless source also with an output alphabet of {J} symbols. Let the {i^{\rm th}} member of the alphabet be denoted {x_i} and let the probability of occurrence of {x_i} be denoted {p_i}. Now consider a block of {n} symbols where {n} is very large. Let the expected number of occurrences of {x_i} be denoted {n_i}. For {n} very large we have

\displaystyle n_i = p_in\qquad\qquad\qquad\qquad

That is to say for {n} very large the number of occurrences of each of the {x_i} is known and the number of likely messages is not {J^n} but instead the number of distinct permutations of a block of symbols with

\displaystyle p_0n \mbox{ occurrences of } x_0,

\displaystyle p_1n \mbox{ occurrences of } x_1,

\displaystyle p_2n \mbox{ occurrences of } x_2,

\displaystyle ~~~~~~~~~\vdots

\displaystyle p_{J-1}n \mbox{ occurrences of } x_{J-1}.

So the number of possible messages that can be sent in one block of {n} symbols is

\displaystyle \mbox{Num. of messages} = \frac{n!}{\left(p_0n\right)!\left(p_1n\right)!\left(p_2n\right)!\ldots\left(p_{J-1}n\right)!}    .\ \ \ \ \

Therefore, the number of bits required to send these messages or, equivalently, the number of bits needed to indicate to a receiver which of these messages was transmitted would be

\displaystyle{}N_{\rm bits}=\log_2\left[\frac{n!}{\left(p_0n\right)!\left(p_1n\right)!\left(p_2n\right)!\ldots\left(p_{J-1}n\right)!}\right]


or, converting to natural logarithms,

\displaystyle{}N_{\rm bits}\ln 2=\ln n!-\sum_{j=0}^{J-1}\ln\left[\left(p_jn\right)!\right]. \ \ \ \ \

Using Stirling's approximation {\ln n! \approx n\ln n - n} [3] shows

\displaystyle{}N_{\rm bits}\ln 2=n\ln n-n-\sum_{j=0}^{J-1}\left[p_j n\ln\left(p_j n\right)-p_j n\right], \ \ \ \ \

\displaystyle=n\ln n-n-n\sum_{j=0}^{J-1}\left[p_j\ln p_j + p_j\ln n-p_j\right], \ \ \ \ \

\displaystyle=-n\sum_{j=0}^{J-1}p_j\ln p_j.\ \ \ \ \


Thus

\displaystyle{}N_{\rm bits}=-n\sum_{j=0}^{J-1}p_j\log_2 p_j=nH\left({\cal X}\right), \ \ \ \ \

where {H\left({\cal X}\right)} is the entropy of the alphabet.

Of course more bits can always be used to code all of these different messages but this represents the minimum needed when {n} is very large. The minimum required average number of bits per transmitted symbol is denoted {\bar L_{\rm min}} and thus satisfies

\displaystyle{}\bar L_{\rm min}=\frac{N_{\rm bits}}{n}=H\left({\cal X}\right). \ \ \ \ \

This result is the source coding theorem [1, 2]. Of course we have not shown that uniquely decodable codes exist that operate on one symbol at a time and can obtain this limit. However, we have shown that this limit exists.
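The counting argument can be checked without Stirling's approximation by evaluating the logarithm of the multinomial coefficient exactly. The Python sketch below is an illustration only; the function names and the three-symbol example source are assumptions. It uses `math.lgamma` (with {\ln n! = \ln\Gamma\left(n+1\right)}) to avoid forming the huge factorials, and prints the number of bits per symbol for increasing block length {n} next to the entropy {H\left({\cal X}\right)}.

```python
import math


def log2_multinomial(counts):
    """log2 of n! / (c_0! c_1! ... ), using ln k! = lgamma(k + 1)."""
    n = sum(counts)
    ln_value = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return ln_value / math.log(2.0)


def entropy_bits(probabilities):
    """H = -sum p_j log2 p_j in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)


def bits_per_symbol(probabilities, n):
    """N_bits / n for a block in which x_j occurs p_j * n times."""
    counts = [round(p * n) for p in probabilities]
    return log2_multinomial(counts) / sum(counts)


# Hypothetical source with J = 3 symbols; n is chosen so p_j * n is an integer.
p = [0.5, 0.25, 0.25]
print("H =", entropy_bits(p))
for n in (40, 400, 4000, 40000):
    print(n, bits_per_symbol(p, n))
```

The per-symbol values stay below {H\left({\cal X}\right)} and approach it as {n} grows; the remaining gap comes from the terms neglected in {\ln n! \approx n\ln n - n}.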

I have provided [4] a more detailed write-up introducing information theory available by clicking below.


[1] S. Haykin, Digital Communications, Wiley, N.Y. (1988), Chapter 2.

[2] C.E. Shannon, A Mathematical Theory of Communication, Bell System Technical Journal (1948).

[3] M. Abramowitz and I. Stegun, Handbook of Mathematical Functions, Dover, N.Y. (1965), pp. 255 – 257.

[4] H. L. Rappaport, Notes on Information Theory I and the Source Coding Theorem, 7G Communications, 7GCTN01 (2014).

LTE Radio Access Network Protocol Architecture

September 24, 2014

On the user plane side from the bottom up the layers are [1]

PHY, MAC, RLC, PDCP

On the control plane side from the bottom up the layers are

PHY, MAC, RLC, PDCP, RRC, NAS



PHY is the Physical Layer
MAC is Medium Access Control
RLC is the Radio Link Control
PDCP is the Packet Data Convergence Protocol
RRC is the Radio Resource Control
NAS is the Non-Access Stratum


[1] E. Dahlman, S. Parkvall, and J. Sköld, 4G LTE/LTE-Advanced for Mobile Broadband, Academic Press, Amsterdam (2011), p. 112.

Circuit Theory

February 15, 2013

Moving from the full form of Maxwell’s Equations to the simpler assumptions appropriate for low-frequency lumped element network theory reduces analysis and design of electrical systems to the task of writing loop and/or nodal equations describing an electric circuit. I teach introductory network theory in my Physics classes.

More advanced linear circuit methods such as Laplace transform techniques permit modeling lumped element systems, including the effects of initial conditions, by introducing Laplace transformed circuit elements into a circuit’s basic description.
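As a small illustration of the transformed-element approach, the Python sketch below treats a series RLC circuit with zero initial conditions and takes the output across the capacitor. The circuit topology, the function name and the component values are all assumptions made for this example and need not match the circuit shown in this post. Each element is replaced by its s-domain impedance ({R}, {sL}, {1/sC}), the voltage divider is written down, and the result is evaluated at {s = j\omega}.

```python
import math


def series_rlc_gain(omega, resistance, inductance, capacitance):
    """V_C / V_in for a series RLC circuit, evaluated at s = j*omega."""
    s = 1j * omega
    z_r = resistance              # resistor impedance
    z_l = s * inductance          # inductor impedance s*L
    z_c = 1.0 / (s * capacitance)  # capacitor impedance 1/(s*C)
    return z_c / (z_r + z_l + z_c)  # voltage divider


# Hypothetical component values: 10 ohm, 1 mH, 1 uF.
R, L, C = 10.0, 1e-3, 1e-6
omega_0 = 1.0 / math.sqrt(L * C)  # resonant angular frequency
quality = math.sqrt(L / C) / R    # quality factor of the series circuit

for omega in (0.1 * omega_0, omega_0, 10.0 * omega_0):
    print(omega, abs(series_rlc_gain(omega, R, L, C)))
```

At {\omega_0 = 1/\sqrt{LC}} the inductor and capacitor impedances cancel, so the magnitude of the gain equals the quality factor {\sqrt{L/C}/R}; well below resonance the gain is close to one and well above it falls off.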


Here is a basic RLC Resonance circuit:


And here is the frequency response




Maxwell’s Equations

February 6, 2013

The differential form….

\nabla\times{\bf E}=-\displaystyle\frac{\partial{\bf B}}{\partial t}
\nabla\times{\bf B}=\mu_0{\bf J}+\displaystyle\frac{1}{c^2}\frac{\partial{\bf E}}{\partial t}
\nabla\cdot{\bf E} = \displaystyle\frac{\rho}{\epsilon_0}
\nabla\cdot{\bf B} = 0

These are my “favorite” equations. These equations serve as the cornerstone of all of electrical engineering. The result of the work of many people over very many years and published in their modern form by James Clerk Maxwell (1831-1879), these equations played a crucial role in the history of physics. Strictly speaking it may be said that Maxwell’s Equations are “wrong” and have been replaced by the laws of quantum electrodynamics which give an even more fundamental description of the basic interaction of electrical forces with matter. Even so, Maxwell’s Equations are used to describe all classical electrical phenomena commonly appearing in engineering. In the form appearing above the equations are called the vacuum Maxwell Equations which means that all charge in a system is treated as free charge. This formulation is particularly useful in Plasma Physics.