Source Coding Theorem

One way to understand the source coding theorem [1, 2] is to first consider a general discrete source with an output alphabet of {J} symbols. Transmitting a given symbol requires {\log_2 J} bits. Similarly, a block of {n} ordered symbols from the alphabet can take on {J^n} possible values, so it requires {\log_2 J^n= n\log_2 J} bits to send a message of {n} symbols.
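This counting can be checked numerically. The following is a small Python sketch; the alphabet size and block length are arbitrary example values, not anything fixed by the argument:

```python
import math

J = 4   # alphabet size (example value)
n = 10  # block length (example value)

# Bits to index one of J symbols (a whole-bit code would take the ceiling)
bits_per_symbol = math.log2(J)

# A block of n symbols has J**n possible messages,
# so indexing one of them takes log2(J**n) = n * log2(J) bits.
bits_per_block = math.log2(J ** n)

print(bits_per_symbol)  # 2.0
print(bits_per_block)   # 20.0
```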

Next specialize to the case of a discrete memoryless source, also with an output alphabet of {J} symbols. Let the {i^{th}} member of the alphabet be denoted {x_i} and let the probability of occurrence of {x_i} be denoted {p_i}. Now consider a block of {n} symbols where {n} is very large, and let the expected number of occurrences of {x_i} be denoted {n_i}. For {n} very large we have

\displaystyle n_i = p_in\qquad\qquad\qquad\qquad

That is to say, for {n} very large the number of occurrences of each of the {x_i} is essentially known, and the number of likely messages is not {J^n} but instead the number of distinct permutations of a block of symbols with

\displaystyle p_0n \mbox{ occurrences of } x_0,

\displaystyle p_1n \mbox{ occurrences of } x_1,

\displaystyle p_2n \mbox{ occurrences of } x_2,

\displaystyle ~~~~~~~~~\vdots

\displaystyle p_{J-1}n \mbox{ occurrences of } x_{J-1}.

So the number of possible messages that can be sent in one block of {n} symbols is

\displaystyle \mbox{Num. of messages} = \frac{n!}{\left(p_0n\right)!\left(p_1n\right)!\left(p_2n\right)!\ldots\left(p_{J-1}n\right)!}    .\ \ \ \ \
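This multinomial count can be computed directly for a small example. The distribution and block length below are assumed toy values chosen so that each {p_j n} is an integer:

```python
import math

# Example distribution over a J = 3 symbol alphabet (assumed values)
p = [0.5, 0.25, 0.25]
n = 16  # block length chosen so each p_j * n is an integer

counts = [round(pj * n) for pj in p]  # expected occurrences p_j * n

# Number of distinct permutations: n! / ((p_0 n)! (p_1 n)! ... (p_{J-1} n)!)
num_messages = math.factorial(n)
for c in counts:
    num_messages //= math.factorial(c)

total_messages = len(p) ** n  # J**n messages without the typicality restriction

print(num_messages)    # 900900
print(total_messages)  # 43046721
```

Even at this modest block length, the typical-message count is already far smaller than {J^n}.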

Therefore, the number of bits required to send these messages or, equivalently, the number of bits needed to indicate to a receiver which of these messages was transmitted, is

\displaystyle{}N_{\rm bits}=\log_2\left[\frac{n!}{\left(p_0n\right)!\left(p_1n\right)!\left(p_2n\right)!\ldots\left(p_{J-1}n\right)!}\right],

or, taking the natural logarithm of both sides,

\displaystyle{}N_{\rm bits}\ln 2=\ln n!-\sum_{j=0}^{J-1}\ln\left[\left(p_jn\right)!\right]. \ \ \ \ \

Using Stirling's approximation {\ln n! \approx n\ln n - n} [3] gives

\displaystyle{}N_{\rm bits}\ln 2=n\ln n-n-\sum_{j=0}^{J-1}\left[p_j n\ln\left(p_j n\right)-p_j n\right], \ \ \ \ \

\displaystyle=n\ln n-n-n\sum_{j=0}^{J-1}\left[p_j\ln p_j + p_j\ln n-p_j\right], \ \ \ \ \

\displaystyle=-n\sum_{j=0}^{J-1}p_j\ln p_j,\ \ \ \ \

where the last step uses {\sum_{j=0}^{J-1}p_j=1}.


Dividing by {\ln 2} to convert to base-2 logarithms then gives

\displaystyle{}N_{\rm bits}=-n\sum_{j=0}^{J-1}p_j\log_2 p_j=nH\left({\cal X}\right), \ \ \ \ \

where {H\left({\cal X}\right)} is the entropy of the alphabet.
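The Stirling step used above is easy to verify numerically. The sketch below compares {\ln n!} with {n\ln n - n} for a few block lengths (the specific values of {n} are arbitrary):

```python
import math

# Check Stirling's approximation ln(n!) ~ n ln n - n for growing n.
rel_errors = []
for n in (10, 100, 1000):
    exact = math.lgamma(n + 1)        # ln(n!) computed without overflow
    approx = n * math.log(n) - n
    rel_errors.append(abs(exact - approx) / exact)
    print(n, rel_errors[-1])          # relative error shrinks as n grows
```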

Of course more bits can always be used to code all of these different messages, but this represents the minimum needed when {n} is very large. The minimum required average number of bits per transmitted symbol is denoted {\bar L_{\rm min}} and thus satisfies

\displaystyle{}\bar L_{\rm min}=\frac{N_{\rm bits}}{n}=H\left({\cal X}\right). \ \ \ \ \
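The convergence of the exact per-symbol bit count to the entropy can be demonstrated numerically. The distribution below is the same assumed toy example as before; the exact count uses the multinomial formula rather than the Stirling approximation:

```python
import math

p = [0.5, 0.25, 0.25]  # example source distribution (assumed toy values)

# Entropy H(X) in bits per symbol; here H = 1.5
H = -sum(pj * math.log2(pj) for pj in p)

# Exact minimum bits per symbol from the multinomial count, for growing n
rates = []
for n in (16, 160, 1600):
    counts = [round(pj * n) for pj in p]
    log2_messages = (math.lgamma(n + 1)
                     - sum(math.lgamma(c + 1) for c in counts)) / math.log(2)
    rates.append(log2_messages / n)
    print(n, rates[-1])  # approaches H = 1.5 bits/symbol as n grows
```

The exact rate always sits below {H\left({\cal X}\right)} and climbs toward it as the block length grows.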

This result is the source coding theorem [1, 2]. We have not shown that uniquely decodable codes exist that operate on one symbol at a time and achieve this limit; we have only shown that the limit exists.

I have provided a more detailed write-up introducing information theory in [4].


[1] S. Haykin, Digital Communications, Wiley, N.Y. (1988), Chapter 2.

[2] C.E. Shannon, A Mathematical Theory of Communication, Bell System Technical Journal (1948).

[3] M. Abramowitz and I. Stegun, Handbook of Mathematical Functions, Dover, N.Y. (1965), pp. 255 – 257.

[4] H.L. Rappaport, Notes on Information Theory I and the Source Coding Theorem, 7G Communications, 7GCTN01 (2014).


