A Rare Individual
Science seeks the basic laws of nature. Mathematics searches for new theorems to build upon the old. Engineering builds systems to solve human needs. The three disciplines are interdependent but distinct. Very rarely does one individual simultaneously make central contributions to all three. Claude Shannon was one such individual: more than 70 years ago, he single-handedly laid the foundation for an entirely new discipline (Information Theory) that would revolutionize the communication infrastructure at the heart of the modern information age.

Shannon was born in Gaylord, Michigan, in 1916, the son of a local businessman and a teacher. After graduating from the University of Michigan with degrees in electrical engineering and mathematics, he wrote a master’s thesis at the Massachusetts Institute of Technology (MIT) that applied a mathematical discipline called Boolean algebra to the analysis and synthesis of switching circuits. It was a transformative work, turning circuit design from an art into a science.
Claude Shannon’s Masterpiece
Before Shannon, the problem of communication was framed as how to reconstruct a received signal, distorted by a physical medium such as a cable, back into the original as accurately as possible. Shannon’s genius was in realizing that the key to communication is uncertainty. After all, if you knew ahead of time what I would say to you in this column, what would be the point of writing it?
In a 1939 letter to his mentor, Vannevar Bush, Shannon outlined the germ of his idea, and he worked on it for a decade before solving it in a paper simply titled “A Mathematical Theory of Communication”. Published in 1948 in the Bell System Technical Journal, it is one of the greatest scientific papers ever written.
A Beautiful Mind
The heart of Shannon’s theory is a simple but very general model of communication: a transmitter encodes information into a signal, which is corrupted by noise and then decoded by the receiver. Shannon’s model incorporates two key insights: (a) isolating the information and noise sources from the communication system to be designed, and (b) modeling both of these sources probabilistically. He imagined the information source generating one of many possible messages, each sent with a certain probability, and a receiver decoding it. The distortions due to noise add further randomness for the receiver to disentangle. The block diagram in Shannon’s paper, tracing a message from its source through a transmitter, a noisy channel and a receiver to its destination, captures the essence of this probabilistic model of communication.

Shannon’s framing shifted the signal transmission problem from a physical one to an abstract one, allowing him to model the uncertainty using probability. The novelty of his thinking surprised the communication engineers of the day, to say the least.
Information in Bits
Shannon’s solution to the problem came in three parts. Playing a central role in all three is the concept of an information “bit,” used by Shannon as the basic unit of uncertainty. Shorthand for “binary digit,” a bit can be either a 1 or a 0, and Shannon’s paper was the first to use the word.
Being an applied mathematician, Shannon came up with a formula for the minimum number of bits per second needed to represent the information, a number he called its entropy rate, H. It is this number which quantifies the uncertainty involved in determining which message the source will generate. The lower the entropy rate, the less the uncertainty, and thus the easier it is to compress the message into something shorter. For example, texting at the rate of 100 English letters per minute means sending one out of 26^100 possible messages every minute, each represented by a sequence of 100 letters. One could encode all these possibilities into 470 bits, since 2^470 ≈ 26^100. If the sequences were equally likely, then Shannon’s formula would say that the entropy rate is indeed 470 bits per minute. In reality, some sequences are much more likely than others, and the entropy rate is much lower, allowing for greater compression.
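To make the arithmetic concrete, here is a minimal Python sketch (not from the original article) that reproduces the 470-bit figure and then applies Shannon’s entropy formula, H = −Σ p·log2(p), to a skewed letter distribution. The specific probabilities below are made up purely for illustration; they are not real English letter frequencies.

```python
import math

# Uniform case: 100 letters per minute, 26 equally likely choices per letter.
# Bits needed = log2(26**100) = 100 * log2(26), about 470 bits per minute.
uniform_bits = 100 * math.log2(26)
print(f"Equally likely letters: about {uniform_bits:.0f} bits per minute")

# Shannon's entropy formula, H = -sum(p * log2(p)), in bits per letter.
def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A made-up skewed distribution over 26 letters (illustration only):
# nine common letters carry most of the probability, the rest are rare.
skewed = [0.12, 0.10, 0.09, 0.08, 0.07, 0.06, 0.05, 0.05, 0.04] + [0.02] * 17
assert abs(sum(skewed) - 1.0) < 1e-9
print(f"Skewed letters: about {100 * entropy(skewed):.0f} bits per minute")
```

With this particular skewed distribution the entropy rate drops to roughly 435 bits per minute, illustrating the point that less uncertainty means more room for compression.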
Second, he provided a formula for the maximum number of bits per second that can be reliably communicated in the face of noise, which he called the system’s capacity, C. This is the maximum rate at which the receiver can resolve the message’s uncertainty, effectively making it the speed limit for communication.
Finally, he showed that reliable communication of the information from the source in the face of noise is possible if and only if H < C. Think of information as water: If the flow rate is less than the capacity of the pipe, then the stream gets through reliably.
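Shannon’s capacity formula depends on the channel model. As one standard textbook illustration (a binary symmetric channel that flips each transmitted bit with probability p, a model not spelled out in this post), here is a short Python sketch that computes C and checks the H < C condition. The flip probability, signalling rate and entropy rate below are assumed numbers chosen only for illustration.

```python
import math

def binary_entropy(p):
    """H2(p) = -p*log2(p) - (1-p)*log2(1-p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Binary symmetric channel: each transmitted bit is flipped with probability p.
# Its capacity is C = 1 - H2(p) bits per channel use.
p = 0.1                   # assumed flip probability
uses_per_minute = 1000    # assumed signalling rate
C = (1 - binary_entropy(p)) * uses_per_minute   # bits per minute

H = 435                   # assumed source entropy rate, bits per minute
print(f"Capacity C = {C:.0f} bits/minute, entropy rate H = {H} bits/minute")
print("Reliable communication is possible." if H < C else "Impossible: H exceeds C.")
```

Here C works out to roughly 531 bits per minute, so a source with H = 435 bits per minute fits through the pipe; raise the flip probability or the entropy rate enough and the condition H < C fails.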
One implication of Shannon’s theory is that whatever the nature of the information — be it a Shakespeare sonnet, a recording of Beethoven’s Fifth Symphony or a Kurosawa movie — it is always most efficient to encode it into bits before transmitting it. In a radio system, for example, even though both the initial sound and the electromagnetic signal sent over the air are analog wave forms, Shannon’s theorems imply that it is optimal to first digitize the sound wave into bits, and then map those bits into the electromagnetic wave. This surprising result is a cornerstone of the modern digital information age, where the bit reigns supreme as the universal currency of information.
Shannon’s theory was so abstract and so far ahead of its time that it looked like anything but the stuff of engineering. But such was his genius: he had the capacity (pun intended) to see the landscape much farther than anyone else ever had. His theory has become the cornerstone of all modern-day communication systems, on the ground, under the sea and over the air, from the internet and Wi-Fi to the 5G standard currently being rolled out.
This post is adapted from a Quanta Magazine write-up by David Tse, who is the Thomas Kailath and Guanghan Xu Professor in the School of Engineering at Stanford University and the inventor of the proportional-fair scheduling algorithm, used in all modern-day cellular systems. His research interests are in information theory, blockchains and machine learning.