What is the difference between the lighthouse and its light? What Claude Shannon nailed down with his mathematical theory of information was that each bit of a message is the answer to a question, a reduction of the total uncertainty of the information system. The anxiety of a writer in front of the blank page is not that it is free of information, but that the blank page contains every question the language could possibly ask, and with each word the writer begins to whittle them down, until the page full of questions is reduced to a string of answers. Each spoken word answers all but one of the possible questions with a no, and that one word with a yes. As a sentence evolves, the number of possible word choices dwindles or rises depending on what has come before.

You may protest that language is more nuanced than that, but you are probably thinking of the meaning you take from the message, not the information transmitted. In essence, the Shannon measure of information is the number of yes/no questions you would need to ask in order to accurately reproduce any given message. Shannon called it entropy because it is so similar to the entropy of a physical system, which is the logarithm of the number of microstates that could make up the observed macrostate. But what I mean to discuss here is the essential difference between them: a yes/no question has only two possible outcomes, while physical entropy is fluid and infinitesimally variable. This is the difference between the light and its house, the answer to a single question versus the massive conglomeration of microscopic particles stacked against the motion of the wind and waves. Physical entropy is nuanced like the meaning you take from the words you hear. There is no measure of meaning.
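The yes/no-question count can be sketched in a few lines of Python (my illustration, not anything Shannon wrote): the Shannon entropy of a message, H = −Σ p·log₂(p) over the symbol frequencies p, is the average number of yes/no questions per symbol needed to reproduce it.

```python
import math

def shannon_entropy(message: str) -> float:
    """Average number of yes/no questions per symbol needed to
    reproduce the message: H = -sum(p * log2(p)) over symbol
    frequencies p."""
    counts = {}
    for ch in message:
        counts[ch] = counts.get(ch, 0) + 1
    n = len(message)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

# A message with only one possible symbol answers no questions at all:
print(shannon_entropy("aaaa"))  # 0.0 bits per symbol
# Four equally likely symbols take two yes/no questions each:
print(shannon_entropy("abcd"))  # 2.0 bits per symbol
```

A page of uniformly random symbols would score highest of all, which is exactly the sense in which the measure ignores meaning.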

It is important to understand the difference between entropy and information because they define the difference between the physical universe on the one hand and the virtual universe of memory, dreams, language, books and the interconnected network of information technologies on the other hand.

You don’t have to understand the natural logarithm or its base *e* to imagine the difference between physical entropy and the Shannon measure of information. Let us say you drop a cup of water on the floor: physical entropy looks like the probability that any given place on the floor will be wet. Places near the landing site are more likely to be wet than places further away, but the actual distribution is continuously variable, with an infinite number of possible combinations of wet and dry places. Now imagine flipping a coin; the measure of information is the probability of one side or the other being up. This result is also unpredictable, but it only has two possible outcomes (assuming we don’t care about the physical entropy of the coin, i.e., where it ended up). A flipped coin can make a decision or answer a question; a puddle of water can’t. And yet, whether the coin is heads up or down tells you nothing true about the coin itself, whereas the distribution of the puddle is the infinitesimally true position of the water.

Why not simply flip more coins, adding bits down to the *n*th? The puddle’s boundary exists infinitesimally at all scales, all the way down to the subatomic probability density distribution, but the coin toss is meaningless if the coin sits on its edge. “Maybe” transmits no information.
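A toy binary search makes the point concrete (the one-metre floor and the choice of edge position are illustrative assumptions of mine): each yes/no question halves the interval around the puddle’s edge, and no finite number of bits ever lands on the real-valued position exactly.

```python
import math

# Hypothetical illustration: locate a puddle's edge on a 1-metre floor.
# Each yes/no question ("is it in the left half?") halves the interval,
# so n questions pin the edge down to within 1/2**n metres, but a
# real-valued position is never reached exactly.
edge = 1 / math.pi  # an "infinitesimally true" position, ~0.3183 m

low, high = 0.0, 1.0
for _ in range(30):
    mid = (low + high) / 2
    if edge < mid:
        high = mid  # answer: yes, the left half
    else:
        low = mid   # answer: no, the right half

print(high - low)          # after 30 bits the interval is ~9.3e-10 m
print(low <= edge < high)  # True: bracketed, but never pinned exactly
```

Thirty questions narrow the edge to within a nanometre, and a thirty-first would only halve the interval again; the puddle always has one more decimal place than the coins do.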

The difference between the binary logarithm of information and the natural logarithm base *e* (2.718…) of physical entropy seems quite small, but the shape of the curves they produce is different. In particular, the slope of the natural logarithm where its curve crosses zero (at x = 1) is exactly 1. That is to say, it is perfectly balanced between up and over. The log base 2 has a different slope here (1/ln 2, about 1.44) and everywhere else. Because the slope determines the evolution of the system from one moment to the next, the difference between the slope of entropy and the slope of the measure of information cannot be emphasised enough.
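Both logarithm curves pass through zero at x = 1, and their slopes there are easy to check numerically (a quick sketch; the central-difference step size is an arbitrary choice):

```python
import math

def slope(f, x, h=1e-8):
    """Central-difference estimate of the derivative f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Both logarithm curves cross zero at x = 1, but with different slopes:
print(slope(math.log, 1.0))   # natural log: slope 1.0
print(slope(math.log2, 1.0))  # binary log: slope 1/ln(2), about 1.4427
```

At every x the two slopes differ by the same constant factor 1/ln 2, so the curves never agree on how the next moment unfolds.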

Physical entropy is measured with the natural logarithm, whose base is *e*, which you may recall as the limit of compound interest (*e* is a strange and beautiful idea, never fully written down as a number, that shows up almost everywhere). Information uses the log base 2 (binary log) because information in transmission can only be the answer to a string of yes/no questions, or bits with only two possibilities (on/off, yes/no, open/closed), and it can only be transmitted in whole units. Even vision and hearing are based on inputs from individual nerve cells capable of only one yes/no answer to the question: were you triggered? So, while physical entropy can increase infinitesimally through the evolution of the system, information can only increase in bits – whole quantised units. No matter how many bits you use to describe a system, that last bit will miss the infinitesimal variability of true entropy.
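The compound-interest limit can be watched converging in a few lines (the compounding frequencies below are arbitrary examples): (1 + 1/n)ⁿ approaches *e* as the interest is compounded ever more often.

```python
import math

# e as the limit of compound interest: (1 + 1/n)**n approaches e
# as the interest is compounded ever more often.
for n in (1, 12, 365, 1_000_000):
    print(n, (1 + 1 / n) ** n)
print("e =", math.e)  # 2.718281828459045
```

Compounded yearly, a unit of principal doubles; compounded continuously, it reaches *e*, and no finite n ever quite gets there.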

The difference between the measure of information and the entropy of matter in space has a physical analog in the difference between photons and massive particles like electrons and protons. A photon, like a yes/no question, is always either emitted or not emitted by the transmitter and it is always either absorbed or not absorbed by the receiver. A photon can only be a binary phenomenon, and its existence is measurable without uncertainty. It is analogous to a unit of information that is either transmitted or not, and upon receipt it ceases to exist. Electrons don’t carry around a bag of photons that they fill and empty. The photon only exists in transit, so it cannot be captured or held for observation.

On the other hand, a massive particle has infinitesimally variable freedom. Its momentum and location can be approximated through the application of constraints, but in order to know its location you must apply constraints that make its momentum uncertain, and vice-versa. Its momentum and location are real, but Heisenberg uncertainty means that you can never know both of them simultaneously. So you can capture and observe certain aspects of a massive particle, but you cannot know for certain what you’ve captured without changing it into something else.

We usually deal with such large agglomerations of massive particles that the average location and momentum provide a really good “picture” of the group, whether it’s a bowling ball or a cloud of steam, and we tend to think of this average as both ontologically real and epistemologically knowable at the same time. But there is a difference between this mashup of averages of location, momentum, and visible light that makes up our understanding of the bowling ball on the one hand, and, on the other, the two distinct concepts of physical entropy (momentum and location) and the Shannon measure of information in transmission (electromagnetism).

It is tempting to ask why information entropy doesn’t just use *e* as well, for consistency, but it can’t. An infinitesimal can’t exist in information. The smallest unit of information that can be transmitted is one bit or one photon. No fractions. No incremental adjustments. Anything less than one bit is simply unclear. Information has no slope at zero and no integration from zero to one. But the difference between physical and informational entropy extends through to infinity. The slope is always different. What this means is that no information system can maintain an accurate model of physical entropy for more than a moment without corrections. We can communicate the concept of entropy with information, but not model it. This is why nobody believed the climate models until the ice actually started melting off the poles, and why computer animation is clunky unless it’s based on motion-capture from real people. Energy can equally take the form of momentum or volume, and entropy is not biased one way or the other, which is why fluid pressure drops when the particles accelerate (over an airplane wing, for example): the acceleration takes energy away from the volume, thereby reducing the pressure. Moreover, while *e* can be approximated to a large number of digits, it can never be calculated precisely. So, you could use all of the power in a supercomputer calculating *e* for a few parts of your physical model and still not have it right.

You might think you could just plug *e* into the information systems and chug away, but you have to remember that *e* is an irrational number. For a continuously evolving system, it is not enough to insert the symbol *e* and then solve for it at the end of the process. Each evolution, that is, every part at every moment, needs *e* to be fully resolved for the calculation of its entropy. But *e* has an infinite number of digits. You can cut corners and only resolve *e* down to, say, ten digits, but this introduces an error, which grows with every moment. No matter how fast your processors and how big your memory, you cannot calculate the evolution of reality. This is why it is so important to listen to people with whom you disagree, even if they are curmudgeons.
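A toy iteration shows the drift (my illustration, not a physical model): resolve *e* to ten digits, let each “moment” multiply the state by *e*, and the tiny truncation error compounds at every step.

```python
import math

E_TRUNCATED = 2.718281828  # e resolved to only ten digits

# A toy evolving system (an illustration, not a physical model):
# each moment multiplies the state by e. Resolving e to ten digits
# introduces a tiny error that compounds at every step.
for steps in (1, 1_000, 1_000_000):
    relative_error = 1 - (E_TRUNCATED / math.e) ** steps
    print(steps, relative_error)
```

The relative error only ever grows; adding more digits shrinks the per-step error but never stops the compounding, because the digits of *e* never run out.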

The point is not that science is at a dead end, but that science is never finished. No matter how objective and perfect the axioms and algorithms, the information that can be transmitted deviates continuously from reality. Every information system, whether artificial or human, is in a constant process of going mad. This is why the debacle of Microsoft’s Tay AI chatbot was so predictable, and why social media hasn’t engendered an informed public in the Middle East or in the United States. Ordinary madness requires continuous correction, while pathological madness can only be treated as a natural affliction of reality. The best laid plans of mice and men have a different shape and evolution from the reality they confront.