You are very poor and your only hope is to sell your cat. You put it in a box and get ready to go to the market. But you start thinking. What if the cat dies in the box? I can't sell a dead cat. I will starve. I may die of starvation. But if it is still alive when I get there, what if I don't get enough money to survive? Your brain is effectively considering two different universes. You can't decide between them so you consider both. From your perspective, you are living in two different universes at the same time.

We say that your brain is the system, and the system has only two states: cat dead, or cat alive. We call this situation an uncertainty, because you are not certain which way it is going to be. So you need an answer to your uncertainty. You choose a behavior that is independent of the uncertainty. We say that your behavior is invariant under a transformation from one state to the other. You take the box to the market and open it. The cat is alive.

As a result of your invariant behavior, you have now acquired new information: the cat has survived. Your two universes collapse into one. But you start thinking again. If my cat looked better I would get more money. Again you have two universes, it either looks well or it doesn't. You seek an answer. You need an invariant behavior. You take the cat out of the box and put it in the light. Now you acquire more info: the cat doesn't look well. Again, you are left with only one universe. And so on.

In Physics, we describe a system by means of a set of variables, and we say that each possible combination of values of the variables is a state of the system. The variables can be Boolean, or integer-valued, or anything appropriate. Then, we specify an initial state, and a dynamics for the system. A dynamics is a rule or set of rules that specifies how the system transitions from one state to another. And the rules account for the uncertainty. Say the system is in state A and it can transition to state B, or C, or D. We don't know which one it will be. And if the system transitions to B, then it can transition from there to X, or to Y, or to Z; again, we don't know which. The dynamics is not causal because the state and the rule do not add up to a single, definite transition. Yet the response behavior, an algorithm, should be causal so it can be executed. Where does the algorithm come from?

The algorithm comes from what we know, from the information we have. So here is what we do know. We know that states B, C, or D cannot exist unless state A has existed before. So we say that A precedes B, C, and D. We also know that X, Y, or Z can exist only if B has existed. And this is the precise point where causal sets come in. We formalize our knowledge by writing:

A≺B, A≺C, A≺D, B≺X, B≺Y, B≺Z.

which, together with the set of states, {A, B, C, D, X, Y, Z} in this case, is known as a causal set (read the '≺' sign as "precedes"). We can write a computer program for this. It would look as follows:
if(A) then
    if(B) then
        if(X) then ...
        else if(Y) then ...
        else if(Z) then ...
    else if(C) then ...
    else if(D) then ...
which means that if A has existed then either B, C, or D can exist, and in the case where B has existed then either X, Y, or Z can exist, and so on until the program stops because there are no more state transitions left. But what have we achieved by writing the program? Nothing. The uncertainty is still there, only now it has been transferred to the data. In order to run the program we have to supply all the uninitialized variables as data; for example, we could specify A, B, and X, or A and C, and so on. In other words, we have to specify the exact sequence of state transitions. Or rather guess it, since we have no way of knowing it.
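
To make the point about the data concrete, here is a minimal Python sketch that stores the causal set above and lists every sequence of transitions it allows. The dictionary layout and the names SUCCESSORS and transition_sequences are illustrative assumptions, not part of any established library. The sketch only enumerates the alternatives; it does nothing to choose among them.

```python
# Causal set from the text: each key precedes every state in its list.
# The dict layout and the function name are illustrative assumptions.
SUCCESSORS = {
    "A": ["B", "C", "D"],
    "B": ["X", "Y", "Z"],
}


def transition_sequences(state, successors):
    """Return every maximal chain of transitions that starts at `state`."""
    next_states = successors.get(state, [])
    if not next_states:
        # No further transitions are possible: the chain ends here.
        return [[state]]
    sequences = []
    for nxt in next_states:
        for tail in transition_sequences(nxt, successors):
            sequences.append([state] + tail)
    return sequences


sequences = transition_sequences("A", SUCCESSORS)
for seq in sequences:
    print(" -> ".join(seq))
```

For this small causal set the sketch lists five sequences, and each is equally admissible. Running the original program amounts to picking one of them by hand.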

There are many possible sequences of execution that satisfy the constraints in the causal set, and no apparent reason to prefer one over another. But our brains arrive at a unique solution, every time, when in possession of certain information. For example, if I want to travel from Houston to Dallas I can fly first to San Francisco and from there to Dallas, or I can fly directly from Houston to Dallas. And brains are very consistent. Every person would choose the second alternative, unless they had some other reason to go to San Francisco first, in which case they would be using additional information. So how do our brains make that unique selection? Obviously, we are missing something here. How does the brain do it?


The brain doesn't do "it". It does something else, and "it" follows as a result. The brain must satisfy its never-ending hunger for energy. Information carries energy. Yes, information itself. In March 2012, the amount of heat generated by erasing one bit of information was actually measured, thus confirming Landauer's 50-year-old prediction; see Berut (2012). When the brain learns something, that is, when it receives information, it supplies energy to its memory so it can store that information. As it stores, it immediately recovers any energy it can from the stored information, and uses this energy to store more information.
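
For a sense of scale, Landauer's bound puts the minimum heat released by erasing one bit at k_B T ln 2, where k_B is the Boltzmann constant and T is the temperature. The short Python sketch below evaluates that expression. The temperature of 300 K is an assumed room temperature, not a value taken from the experiment, and the function name is hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value of the Boltzmann constant


def landauer_limit_joules(temperature_kelvin):
    """Minimum heat released by erasing one bit at the given temperature."""
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


# 300 K is an assumed room temperature, not a value from the text.
energy_per_bit = landauer_limit_joules(300.0)
print(f"{energy_per_bit:.3e} J per erased bit")
```

At that assumed temperature the bound comes to a little under 3 × 10^-21 joules per bit, which is tiny but not zero.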

And here is some Physics. When energy is extracted from information, then entropy is also extracted. This is the Second Law of Thermodynamics. But entropy is the measure of uncertainty in the information. When energy is removed from the system, the number of state transitions available to the system is also reduced. This is because the system has less energy and higher-energy states are no longer accessible to it. Fewer states mean less uncertainty, and less entropy, which is exactly what the Second Law predicts.
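
The link between accessible states and uncertainty can be illustrated with a toy calculation. The Python sketch below assumes a made-up system with eight states, invented energy levels in arbitrary units, and equal probability for every accessible state. None of these numbers come from the text. It counts the states that fit under a given energy budget and reports the Shannon entropy of a uniform choice among them.

```python
import math

# Hypothetical energy levels (arbitrary units) for a toy eight-state system.
STATE_ENERGIES = {
    "s0": 0, "s1": 1, "s2": 2, "s3": 3,
    "s4": 5, "s5": 6, "s6": 7, "s7": 9,
}


def accessible_states(energies, budget):
    """States whose energy does not exceed the available budget."""
    return [name for name, energy in energies.items() if energy <= budget]


def uniform_entropy_bits(n_states):
    """Shannon entropy, in bits, of a uniform choice among n_states."""
    return math.log2(n_states)


# Lowering the budget removes high-energy states and, with them, entropy.
for budget in (9, 3, 1):
    states = accessible_states(STATE_ENERGIES, budget)
    bits = uniform_entropy_bits(len(states))
    print(f"budget={budget}: {len(states)} states, {bits:.1f} bits")
```

Under these assumptions, each cut in the energy budget halves the number of accessible states and removes one bit of uncertainty.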

In addition, when the removal of energy blocks the system from accessing high-energy states, the state space shrinks, and the dynamics of the system is compressed into that small space. This compressed space is known as an attractor, and we say that the system has converged to an attractor. As the state space is now so small, the system becomes more stable, and the attractors become observable and easily identifiable. They are the patterns or regularities in information that our brains create all the time. And, as if all that were not enough, the transformation induced in the information by the removal of entropy is behavior-preserving. It is known in Computer Science as refactoring. The result is an algorithm, and the algorithm is causal. In the brain, by the simple act of conserving energy, the entropy and uncertainty are removed from the information, and the result is a unique, invariant behavior.

In Computer Science and Artificial Intelligence, the energy consumption of computers has been a focus of attention for decades. But few seem to have noticed that the brain goes beyond that point, and also reduces the energy consumption of the information itself, not just the machine. Removing energy from information also removes entropy, and causes the information to self-organize into invariants. That's what the brain does, all the time: it creates invariants, or invariant representations of the information it has acquired. This is the answer to Hofstadter's challenge, the 100,000,000 dots of light on your retina that become one single word, "mother." We use the invariant representations for everything: to think, to communicate, to create more invariant representations. Every word I write here is an invariant representation in my mind. A language is an invariant representation. There can be no intelligence without invariant representations.

It generally seems to me, but I can't promise, that little will be left to explain about intelligence once the invariant representations are understood. There is a countably infinite number of causal sets, there is a countably infinite number of invariants, and there is a bijective correspondence between causal sets and invariant representations. What is left? Of course, understanding the implementation details of the brain is another matter.

In Computer Programming and Artificial Intelligence, information is processed while leaving all the entropy in it. Alternatively, the information may be passed to humans so as to use the human brain to remove the entropy and create the invariant representations, which are then fed back to computers. This practice perpetuates the familiar man-machine interdependency in both fields. We all remember the quest for the perpetual motion machine, which stopped only when the energy-entropy interplay was understood in thermodynamic machines. Are we not pursuing a "perpetual certainty" information machine?