Causal Inference 101

Subjects included: causal information, the origin of algorithms, adaptive behaviors, preset goals, and maximum and minimum entropy methods.

I'll try to explain the entropy-intelligence link, in the light of my theory of causality, in a simple way. Imagine an empty "brain". I know about bootstrapping, DNA, etc., but let's leave that out for the moment; we need to understand how entropy works before we can argue about bootstrapping. So we have an empty brain with no information in it, more like a substrate, that can do two things: store information received from sensors in an autobiographical memory, and remove entropy from it. This substrate is also known as the "host" in the host-guest model of the brain. Note that the term "sensor" as a source of information is very general: a sensor can be just about anything. And I have chosen the brain because the brain is the only known example of an intelligent system.

Information coming from sensors is causal. It consists of ordered (cause, effect) pairs: the cause is the signal that activated the sensor, and the effect is the signal the sensor outputs to indicate detection. The elements in the pairs correspond to "neurons", and the relation in each pair to a dendritic connection. The pairs chain together whenever they share a common element; for example, (a, b) and (b, c) form a chain of two pairs. More generally, the pairs form an acyclic digraph, a directed graph that has no loops. A path in the graph formed by chained pairs is known as a trajectory, and in a big graph there can be an enormously large number of trajectories. The graph, or rather the collection of distinct elements together with the collection of ordered pairs, is known as a causal set.
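
The structure described above can be sketched in a few lines of code. This is only an illustration under my own assumptions: the pair values and the representation (an adjacency list) are invented for the example, not part of the theory itself.

```python
# A minimal sketch of a causal set as an acyclic digraph.
# Each (cause, effect) pair is an edge; chaining happens automatically
# whenever two pairs share an element. Element names are illustrative.
from collections import defaultdict

pairs = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c")]

graph = defaultdict(list)
for cause, effect in pairs:
    graph[cause].append(effect)

def trajectories(node, path=()):
    """Enumerate all maximal paths (trajectories) starting at `node`."""
    path = path + (node,)
    if node not in graph:          # no outgoing pairs: the path is complete
        yield path
        return
    for nxt in graph[node]:
        yield from trajectories(nxt, path)

print(sorted(trajectories("a")))
# Each tuple is one trajectory through the causal set:
# [('a', 'b', 'c'), ('a', 'd', 'c')]
```

Even with four pairs there are already two trajectories to the same element; the number of trajectories grows combinatorially with the size of the graph, which is the point made above.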

Causal pairs are executable. Every causal set is an algorithm, a computer program, and every trajectory in the causal set is a possible execution path. If we write a statement such as A = f(B, C), we are saying that we need the values of B and C before A can be calculated. That is a causal relation. We are not saying how A is calculated; we are only expressing the causal relationships. In this case there are two pairs, (B, A) and (C, A), meaning that both B and C must exist in order for A to exist. In the host-guest model, this algorithm is the guest.
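
To make "executable" concrete, here is a hedged sketch of how the pairs (B, A) and (C, A) can drive a computation. The function attached to A is an invented assumption; the causal set itself only records *that* A depends on B and C, not *how* it is computed.

```python
# The pair (B, A) means B must have a value before A can be computed.
# We run the nodes in an order consistent with the causal pairs.
from graphlib import TopologicalSorter

deps = {"A": {"B", "C"}, "B": set(), "C": set()}   # from pairs (B, A), (C, A)

values = {"B": 2, "C": 3}                  # given inputs (illustrative)
rules = {"A": lambda v: v["B"] + v["C"]}   # an assumed f for A = f(B, C)

for node in TopologicalSorter(deps).static_order():
    if node in rules:
        values[node] = rules[node](values)

print(values["A"])  # 5
```

The topological order guarantees that B and C exist before A is evaluated, which is exactly the causal constraint the two pairs express.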

Now the brain has a causal set obtained by chaining the pairs that have been arriving from the sensors in all possible ways. The causal set is an executable algorithm, and every trajectory in it is an execution path. Of course, every causal set has a collection of "effects" that have no known causes (they have causes, but the causes are outside the brain and are unknown). These are considered given, and they are the inputs of the causal set. Every causal set also has a collection of "causes" with unknown or non-existing effects. These are the outputs.
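
In graph terms, the inputs are the sources of the digraph and the outputs are its sinks, which makes them easy to compute. A small sketch with illustrative pairs:

```python
# Inputs: elements that appear only as causes (no pair produces them).
# Outputs: elements that appear only as effects (no pair consumes them).
pairs = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c")]

causes = {c for c, _ in pairs}
effects = {e for _, e in pairs}

inputs = causes - effects    # sources of the causal set
outputs = effects - causes   # sinks of the causal set

print(inputs, outputs)  # {'a'} {'c'}
```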

This works, and nothing else is necessary. You have a whole program: you can enter input data to initialize all the inputs, or some of them, and execute the chains that start there and reach some or all of the outputs. This is how algorithms originate. Where else could they have originated? Recall that the brain was assumed to be initially empty; there is no source for the algorithms other than the causal input itself. I have argued this conclusion in more detail in my JAGI paper and will not repeat that here. The important part is that algorithms, or behaviors, come from causal pairs acquired by sensors from the environment and chained together. They are not "created" in some magical sense or by some secret process in the human brain; they come from observation of the world. More information can keep arriving from the sensors in a constant flow, and it will be stored and chained into a larger and larger program with a growing number of execution paths. In the real brain, this guest process is conscious.

With that kind of organization, you already have many goals. They are the subsets of outputs that are causally accessible from the inputs. They are all preset by the information coming from outside, and, in this sense, they constitute what we usually call adaptive behavior. They are the information, only better organized. They represent possible behaviors of the system, and they appear to be adaptive because they respond to external stimuli. There are behaviors at this point, but there is still no semantics, no meaning. For that, we need entropy.
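
"Causally accessible from the inputs" is ordinary graph reachability, so the preset goals can be sketched directly. The pairs are again illustrative:

```python
# Goals: the outputs (sinks) reachable from a chosen input (source).
from collections import defaultdict

pairs = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "e")]
graph = defaultdict(list)
for cause, effect in pairs:
    graph[cause].append(effect)

causes = {c for c, _ in pairs}
effects = {e for _, e in pairs}
outputs = effects - causes            # the sinks: {'c', 'e'}

def reachable(start):
    """All elements causally accessible from `start` (depth-first)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen

goals = outputs & reachable("a")      # goals accessible from input 'a'
print(goals)  # {'c', 'e'}
```

Every subset of these reachable outputs is a preset goal in the sense above; nothing beyond the acquired pairs was needed to define them.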

Now it gets a little more sophisticated. I hope you know how to code in some language, at least in general terms. So imagine yourself coding the program in the brain as-is. You can't code each trajectory by itself; the result would be gigantic, and two trajectories can differ by as little as a few pairs. So you have to reuse portions of trajectories to save space and connect them with many IFs and GOTOs. What will you get? Spaghetti code. The brain has spaghetti code: the kind of code that is correct and works properly but that only its author can understand. This code is highly disorganized and cannot easily be modularized, built upon, maintained, or integrated with other products. I don't know very much about brain pathologies, but I can imagine that a person who learns by heart but is unable to draw conclusions or use the "knowledge" intelligently may have this kind of brain. Maybe this situation is related to autism. I just can't say; I am only suggesting it be looked into.

What would you do if your boss asked you to make the code "better", more "understandable", but still correct? You would create classes, objects, inheritance relationships, hierarchies of classes, and methods that use the objects. This is exactly what entropy does for you. Recall that the host is assumed to be able to extract entropy from the information, and hence make it less uncertain. When entropy is removed, the code gets refactored: similar elements of information become associated (binding!), and similar functionalities coalesce and form the classes, the objects, and the methods. In the brain, this phenomenon is observed as the formation of neural cliques made of neurons and neural cliques made of other neural cliques, and, at a global level, as the overall partition of the human brain into functionally specialized parts. In the mind, as Joaquín Fuster explains, cognits arise made of elements of information, and cognits made of other cognits, all of them interconnected in a complex network that associates the elements at all levels. In the actual brain, this process of extracting entropy and causing the code to self-organize is entirely unconscious. But the result, once the process has completed, is a behavior, an algorithm, which is delivered to our cognition at the moment it completes and often causes surprise. This is what we acknowledge when we say "I had an idea." The new code is understandable; it has meaning, it has semantics, all of which has been created by the entropic process known as causal inference.

The two processes, the conscious one and the unconscious one, run concurrently. The unconscious process is of a thermodynamic nature, while the conscious one is algorithmic and constitutes our behaviors. Both happen "in place": the whole substrate, or host, or brain, runs both processes at the same time and in the same place, as neurons adjust their connections toward minimum entropy without affecting the information kept in memory. Information acquired by interaction with the environment is "hot" and highly uncertain. Yet it is algorithmic in nature and admits an input and an output. As heat and energy are removed from the information, algorithmic paths known as trajectories form in large numbers, connecting the inputs to the outputs. The outputs are the preset goals; they represent all the goals that can be achieved with the currently available information. There are frequently many trajectories leading to the same goal, and to each trajectory there corresponds a value of the action, as determined by the action functional. Some trajectories have high action, others least action. The entropy of the system is very high because of all the uncertainty associated with the multiplicity of trajectories; this is known as the combinatorial explosion. The removal of entropy eliminates the high-action trajectories associated with each goal and leaves only the least-action trajectories for that goal. The goal is not affected; only the trajectories that lead to it are.
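
The pruning step can be sketched mechanically. The action values below are invented for illustration; nothing here computes an actual action functional, it only shows that keeping the least-action trajectory per goal removes the multiplicity without touching the goals themselves.

```python
# Many trajectories reach the same goal, each with an action value
# (the numbers are illustrative assumptions). Entropy removal keeps
# only the least-action trajectory per goal.
candidates = [
    (("a", "b", "c"), 5.0),   # (trajectory, action)
    (("a", "d", "c"), 2.0),
    (("a", "e"),      1.0),
    (("a", "b", "e"), 4.0),
]

least_action = {}
for path, action in candidates:
    goal = path[-1]           # a trajectory's goal is its final element
    if goal not in least_action or action < least_action[goal][1]:
        least_action[goal] = (path, action)

print(least_action)
# {'c': (('a', 'd', 'c'), 2.0), 'e': (('a', 'e'), 1.0)}
```

Four trajectories collapse to two, one per goal; the set of goals {c, e} is unchanged, which is the claim made above.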

Incoming information is "hot", high entropy, but the rest of the brain is already organized, "cold", certain. The incoming information gets organized and integrated locally, very fast, causing very little perturbation to the rest of the memory. All goals are preserved, and more goals keep being created as more and more information is acquired. The result is a brain that is constantly well organized, has no uncertainty, and constantly knows all the possible behaviors and goals corresponding to its history of information. There is no need for a high-entropy search for goals, as some propose; high entropy occurs only locally, at the point or points where incoming information is acquired.