Counterfactual Simulation: Presumed Similar Until Proven Different

Part Of: Sociality sequence
Content Summary: 1400 words, 14 min read

False Belief Blindness

Consider the Sally-Anne test:

A child sits in a room, watching Sally and Anne. The room contains two objects: a basket and a box.

Sally has a marble. Sally puts the marble in the basket and leaves the room. While she is gone, Anne moves the marble to the box.

Sally comes back, wanting to play with her marble. Where will she look for it?

To answer the question correctly, it is not enough to model Sally and Anne as having desires and beliefs (to deploy the intentional stance). The child must also be able to differentiate her own knowledge from the knowledge of others (the child is correct, but Sally is wrong). This is an instance of cognitive decoupling: building firewalls around the beliefs/desires of individual agents.

It turns out that most three-year-olds get the answer wrong, but most five-year-olds get it right (autistic children being a notable exception). What happens at age four?

Why Belief Inference Is Blind …

At the end of the test, the child believes that the marble is in the box. Sally believes that the marble is in the basket. Their respective minds might look something like this:

ToM- Sally-Anne v1

In Awakening To A Social World, we learned that, at twelve months, children begin to think about other people as having beliefs and goals:

ToM- Sally-Anne v2

If the above picture were how your brain works, it would be hard to explain why a child would ever be tempted to conclude that Sally thinks the marble is in the box.

But this is not how your brain encodes second-order beliefs. Here’s what actually happens:

ToM- Sally-Anne v3

Crucially, relational second-order beliefs (“Sally thinks that”) point towards first-order beliefs (“Marble’s in basket”) which live in your world model. This mental library comes equipped with a librarian, who flags the incompatibility between “Marble’s in box” and “Marble’s in basket”, and removes the latter.

Architecturally, the reason three-year-olds suffer from false belief blindness is that all beliefs funnel through one world model. There are simply no separate memory spaces in which to evaluate the world models of other people. For a three-year-old to represent another person’s beliefs, those beliefs must be compatible with the facts the child already knows.
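To make the architectural claim concrete, here is a toy sketch in Python. It assumes, hypothetically, that the world model is a single dictionary from topics to facts, that the librarian simply overwrites any older fact on the same topic, and that “Sally thinks that” is nothing more than a pointer to a topic. The class and method names are illustrative, not part of any established cognitive model.

```python
class WorldModel:
    """Toy three-year-old architecture; all names are hypothetical."""

    def __init__(self):
        self.facts = {}     # the one shared library: topic -> fact
        self.pointers = {}  # agent -> topic ("Sally thinks that <fact about topic>")

    def learn(self, topic, fact):
        # The librarian: an incompatible newer fact evicts the older one.
        self.facts[topic] = fact

    def attribute(self, agent, topic):
        # A second-order belief stores no content of its own, only a pointer.
        self.pointers[agent] = topic

    def what_does_agent_think(self, agent):
        return self.facts[self.pointers[agent]]


child = WorldModel()
child.learn("marble", "basket")
child.attribute("Sally", "marble")  # Sally watched the marble go into the basket
child.learn("marble", "box")        # Anne moves it while Sally is away
print(child.what_does_agent_think("Sally"))  # box
```

Because Sally’s belief is only a pointer into the child’s own library, it silently inherits the child’s update, which is the false belief blindness described above.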

… And Recognizing Falsehood Is No Cure

In Gullible By Default we discussed how negating beliefs is optional, effortful, and prone to failure. So you might think that the three-year-old child hasn’t yet developed the ability to negate Sally’s belief. But you’d be wrong. If you look carefully at the mechanics of negation, you will realize that negation cannot help.

Negating claims is implemented by adding an “It is false that” tag in front of the belief in question. We can negate Sally’s belief in two different ways:

ToM- Negation Cannot Model False Beliefs

Both negations fail. It is simply not true that Sally recognizes the marble isn’t in the basket. Likewise, we cannot say that Sally does not think that the marble is in the basket. Even exotic combinations (e.g., double negatives) are of no use.

But shouldn’t recognizing falsehood in one’s self enable us to recognize it in others?

No. It may help to notice that lie detection immediately corrects false beliefs (by introducing an “It is false that” mental prefix). This repair is essential to protect the mental ecosystem from contamination. Put simply, negation evolved as a protection against deception; it is simply not equipped to recognize honest mistakes.

The Birth Of Creativity

Quite independently of the evolution of lie detectors, the hominid line has also acquired a different kind of ability: the ability for pretense. Did you play the “floor is lava” game as a child? What is going on in the mind of a child when they pretend that couches and chairs are the only refuge from a sea of molten rock?

You might be tempted to say that there exists a “floor is lava” belief in your mental library during such games. Or that the falsehood detector is exploring the possibility of appending “It is false that…” to the traditional belief that “the floor is carpet”. But something more sophisticated is going on. As Leslie puts it:

If I jump up suddenly because I mistakenly think I see a spider on the table, I act as if a spider were there. But I certainly do not pretend a spider is there.

Instead of replacing a belief, the child is alternating between competing beliefs. More specifically, the child is building a tiny scaffold, which hovers over their actual belief that “the floor is carpet” and simulates a world in which that belief is replaced. We may call this counterfactual simulation.

ToM- Sally-Anne v4 Counterfactual Maps

To operate effectively, your counterfactual simulation must:

  • Retain the original belief and its relationships to the rest of your memory. Damage to either form of knowledge would be irreversible.
  • Construct maps between the original belief and the counterfactual. It is not enough to imagine “Floor is Lava”; you must know which belief it overwrites.
  • Distance the prediction machine from the sensorimotor river. The floor’s perceptual signature doesn’t evoke visceral fear, for example.
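The first two requirements resemble an overlay data structure. The sketch below uses Python’s `collections.ChainMap` as a stand-in for the counterfactual scaffold; this is an analogy chosen for illustration, not a claim about neural implementation, and the belief dictionaries are hypothetical. The third requirement (distance from perception) has no analogue here.

```python
from collections import ChainMap

# Hypothetical world model: topic -> belief.
world_model = {"floor": "carpet", "couch": "furniture"}

# The counterfactual scaffold: a small layer of replacement beliefs.
pretend = {"floor": "lava"}

# Lookups check the scaffold first, then fall through to the world model.
simulation = ChainMap(pretend, world_model)

# The explicit map: which original belief each counterfactual overwrites.
overwrites = {topic: (world_model[topic], new)
              for topic, new in pretend.items() if topic in world_model}

print(simulation["floor"])  # lava
print(simulation["couch"])  # furniture
print(overwrites)           # {'floor': ('carpet', 'lava')}

# Writes land in the first mapping only, so the game cannot damage the original.
simulation["couch"] = "island"
print(world_model)          # {'floor': 'carpet', 'couch': 'furniture'}
```

`ChainMap` sends writes and deletions only to its first mapping, which is a convenient way to express the rule that pretense must never damage the original belief.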

Why has a counterfactual simulator evolved in the hominid line?

Pretense originates in play, but is far more significant than that. With the ability to simulate different worlds, our minds are able to “try on” new beliefs as if they were hats. If we locate a belief that explains the world better than our world model, we upgrade our world model. Counterfactuals allow our prediction machines to upgrade themselves. They are the algorithmic bedrock of creativity.

How Counterfactuals Restored Our Sight …

A timeline of mind-relevant developmental milestones:

  • At 12 months, the intentional stance emerges.
  • At 20 months, pretending behavior emerges.
  • At 48 months, false belief blindness is overcome.

Despite the large time gap between pretense and the blindness reversal, I claim that pretense is the cure. What gives?

The answer lies in the fallback mechanism for representing false beliefs. Recall that the thirteen-month-old’s mental librarian rejects “Marble’s In Basket” as incompatible, and as a fallback, re-routes “Sally thinks that” towards the true belief “Marble’s In Box”. When the counterfactual simulator comes online at twenty months, it isn’t yet involved in this failure mode.

It is only slowly that the child’s brain discovers that these two technologies can be productively combined. Passing the Sally-Anne test requires a novel modification to the error processing algorithm:

ToM- Sally-Anne v5 Counterfactual Redemption

Not only are false beliefs encoded counterfactually, but the novel data are stored in the relationship model for later reuse. This is how we become aware of the fallibility of our peers.
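A toy version of this modified error processing might look like the following. It is a sketch under loud assumptions: the `Mind` class, the bookkeeping of who witnessed each fact, and the per-agent overlay dictionaries are all hypothetical illustrations of the diagram, not an established cognitive model. Each agent’s beliefs are read through a `ChainMap` whose first layer holds that agent’s stored deviations.

```python
from collections import ChainMap


class Mind:
    """Toy post-Sally-Anne architecture; all names are hypothetical."""

    def __init__(self):
        self.world = {}          # the child's own world model: topic -> fact
        self.relationships = {}  # agent -> counterfactual overlay {topic: fact}
        self.seen_by = {}        # topic -> agents who saw the current fact

    def observe(self, topic, fact, witnesses=()):
        witnesses = set(witnesses)
        old = self.world.get(topic)
        if old is not None and old != fact:
            # Modified error processing: the displaced fact is not discarded.
            # It is stored counterfactually for everyone who missed the change.
            for agent in self.seen_by.get(topic, set()) - witnesses:
                self.relationships.setdefault(agent, {})[topic] = old
        for agent in witnesses:
            # Witnesses are brought back in line with the child's world model.
            self.relationships.get(agent, {}).pop(topic, None)
        self.world[topic] = fact
        self.seen_by[topic] = witnesses

    def beliefs_of(self, agent):
        # Another mind = its stored deviations layered over one's own world model.
        return ChainMap(self.relationships.get(agent, {}), self.world)


mind = Mind()
mind.observe("marble", "basket", witnesses={"Sally", "Anne"})
mind.observe("marble", "box", witnesses={"Anne"})  # Sally is out of the room
print(mind.beliefs_of("Sally")["marble"])  # basket
print(mind.beliefs_of("Anne")["marble"])   # box
print(mind.world["marble"])                # box
```

The child’s own world model still says “box”; Sally’s false belief survives only as a stored deviation in the relationship model.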

… At The Cost Of Self-Anchoring

In Epistemic Topography, I said this:

The curse of knowledge expects short inferential distances. Why does this bias (not another) live in our brains?

As we have seen, estimating [epistemic] location is expensive. So the brain takes a shortcut: it uses a location it already knows about (its own) and employs differences between the Self and the Other to estimate distance. Call this self-anchoring. But the brain isn’t aware of all differences, only those it observes. Hence the process of “pushing out” one’s estimation of Other Locations typically doesn’t go far enough… the birthplace of the curse.

We now have a mechanical explanation for this mental shortcut. The more our beliefs differ from another person’s, the more data we must encode counterfactually. But counterfactuals are not prediction machines in their own right; they only facilitate tinkering with our own machinery. Other people are thus presumed similar until proven different.
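The self-anchoring shortcut, and why it produces the curse of knowledge, can be illustrated in a few lines of Python. The belief sets, the topics, and the definition of inferential distance as a count of disagreeing topics are all assumptions made for the sake of the example.

```python
# Hypothetical belief sets over the same topics (illustrative values only).
me = {"calculus": "known", "python": "known", "jargon": "known", "git": "known"}
other = {"calculus": "unknown", "python": "unknown", "jargon": "unknown", "git": "known"}

# The only difference I have actually witnessed (say, they asked about calculus).
observed_differences = {"calculus"}


def self_anchored_estimate(self_model, observed, evidence):
    """Start from a copy of my own mind; overwrite only observed differences."""
    estimate = dict(self_model)
    for topic in observed:
        estimate[topic] = evidence[topic]
    return estimate


def inferential_distance(a, b):
    """Number of topics on which two belief sets disagree."""
    return sum(a[topic] != b[topic] for topic in a)


estimate = self_anchored_estimate(me, observed_differences, other)
print(inferential_distance(me, other))     # 3 (true distance)
print(inferential_distance(me, estimate))  # 1 (estimated distance)
```

Every unobserved difference defaults to my own belief, so the estimate can only err in one direction: toward too short an inferential distance.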

Takeaways

Executive Summary:

  • Three-year-old children cannot conceive of other people being wrong. This persists long after they become mind-aware, and even after they become able to recognize deceit. What gives?
  • First, children are blind to false beliefs because all beliefs (mine and yours) are based in the same location: the mental library, or world model.
  • Second, falsehood detection ultimately evolved as a deception-detector; why should we expect it to also function as a wrongness-detector?
  • The ability to pretend (e.g., “This Floor Is Lava”) allows our minds to test out new beliefs, rejecting the failures and integrating the successes.
  • The ability to simulate counterfactuals opens up a new pathway to encode false beliefs.
  • However, this pathway doesn’t let us imagine other people independently: other minds are always self-anchored; that is, imagined as deviations from your own mind.

References

  • [Leslie 1987] Pretense and Representation: The Origins of “Theory of Mind”

The Logic of Mindreading

Part Of: Sociality sequence
Followup To: Agent Detection: Life Recognizing Itself
Content Summary: 1500 words, 15 min read

David Hume once observed:

There is an universal tendency among mankind to conceive all beings like themselves, and to transfer to every object those qualities of which they are intimately conscious. We find faces in the moon, armies in the clouds; and, by a natural propensity, if not corrected by experience and reflection, ascribe malice or goodwill to every thing that hurts or pleases us … trees, mountains and streams are personified, and the inanimate parts of nature acquire sentiment and passion.

How did we acquire the ability to discover minds in the world? Let’s find out.

Equifinality: Awakening To Goals

Obviously enough, living things obey the laws of physics. But living things (which we called agents) also have properties unique to them: characteristic appearances (e.g., faces) and behaviors (e.g., self-propelled movement). In Life Recognizing Itself we saw how this allows the brain to build better prediction machines around these agents. We saw that differentiating agents does not require representing other minds. This fact allows us to appreciate the gradual maturation of the Agency Detection system as we travel across the Tree of Life.

Once an organism is detected, the Agent Classifier strains additional information out of its perceptual signature. But there are many other improvements we can make to our prediction machines. Consider, for example, the perspective of an infant who is picked up every night and placed in her crib.

On some nights she may be crawling around the kitchen, in others the family room, in others sitting in her parent’s lap on the couch. Despite these very diverse beginnings, the outcome of the putting-to-bed process is always the same. From the perspective of the neurons in her eyes, very different beginning states always result in the same end state. This property is known as equifinality, and it is not unique to nursery rooms: it is ubiquitous among living things (another example: the behavior of water buffalo around watering holes).

How might prediction machines anticipate equifinality? An efficient explanation of equifinality comes from ascribing goals to an agent.
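A minimal sketch of why goal ascription is efficient, assuming a hypothetical house laid out as grid coordinates and a parent who always walks the infant to the crib. Very different starting rooms produce different paths but the same end state, and a predictor that stores only the goal gets every outcome right without simulating a single step.

```python
CRIB = (0, 0)  # hypothetical coordinates of the crib


def put_to_bed(start, goal=CRIB):
    """Walk one grid step per tick toward the goal (x first, then y)."""
    x, y = start
    path = [(x, y)]
    while (x, y) != goal:
        if x != goal[0]:
            x += 1 if goal[0] > x else -1
        else:
            y += 1 if goal[1] > y else -1
        path.append((x, y))
    return path


def predict_with_goal(start, goal=CRIB):
    """Goal ascription: ignore the start state and predict the goal."""
    return goal


starts = {"kitchen": (5, 2), "family room": (-3, 4), "couch": (2, -6)}
for room, start in starts.items():
    path = put_to_bed(start)
    print(f"{room}: {len(path) - 1} steps, ends at {path[-1]}, "
          f"goal-based prediction {predict_with_goal(start)}")
```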

Dominance Hierarchies: Awakening To Belief

Minds capable of computing situation-agnostic equifinality already have an advantage: perhaps the lion has just eaten and is not hungry enough to give chase, but the gazelle is just as well off modeling the lion as always hungry. But there is room to inject a little contingency. Return to our nursery example: the infant knows that the putting-to-bed desire is only activated at night. How best to anticipate contingent desire? An efficient explanation for such contingency is to ascribe beliefs to an agent.

In fact, there are myriad benefits to possessing belief inference systems besides equifinal contingency. Another important reward gradient stems from the most important social structure of the animal kingdom: the dominance hierarchy. Wolf packs, for example, feature an alpha male, who is given special feeding & reproductive privileges. In species like the baboon, this hierarchy becomes much more detailed: every male knows who ranks above him and who ranks below.

Dominance hierarchies can arguably exist without their members having theories of mind. But agility in climbing the hierarchy is under strong selective pressure, and effective navigation requires tremendous psychological prowess. For example, in his book A Primate’s Memoir, Robert Sapolsky recounts tales of his baboons forming alliances in their attempts to dethrone the sitting alpha. The ability to model the beliefs and desires of one’s alliance partner is surely a boon; hence the discovery of mind can be viewed as a selective consequence of the dominance hierarchy.

Intentional Stance

The tendency to ascribe beliefs and desires to other agents is known as the intentional stance. Let us name the module responsible for this ability the Agent Mentalizer. The intentional stance appears in children between 6 and 12 months (Gergely et al, 1994).

Intentional Stance- Information Processing

As with any algorithm in the Agency Detection system, we might expect the intentional stance to be quite vulnerable to false positives. And that is exactly what we find. For an extreme demonstration, consider the classic animation from Heider & Simmel (1944), in which a few simple geometric shapes move around the outline of a house.

While the Agent Mentalizer was designed to understand the minds of other animals, it had no trouble ascribing beliefs and goals to two dimensional shapes. This is roughly analogous to your email provider accepting a tennis ball as a login password.

Impression Management Via Secondary Models

In your brain, you have a set of beliefs about the world. These beliefs may be stored in many different memory systems, but together they improve the power of prediction that you wield over your environment. I call your knowledge, your web of belief, a World Model. Your relationship modeling system also provides a collection of beliefs about every significant individual in your life. Some of these beliefs are non-mental, obtained from systems like the Agent Classifier. Call the collection of beliefs about one individual your primary model of that individual; your World Model thus contains many primary models.

However, the Agent Mentalizer evolved to infer mental states (beliefs, goals) of other individuals as well. But simulating mental states is not like simulating objects – mental states are about objects. That is, we require a new type of model – a secondary model – to simulate the primary model of another person.

Confused? An example should help.

Imagine a three-year-old child, Bonnie, and her best friend Clyde. Since she was very young, Bonnie has been accumulating shared experiences with Clyde, and is now able to recognize him by appearance. These memories and knowledge of Clyde are stored in a primary model.

At the twelve-month mark, Bonnie acquired the ability to simulate the beliefs/goals of other people. This awakening greatly improved her understanding of Clyde. She began noticing when Clyde was not in the mood to play, and what he thought of various toys. She even became able to crudely simulate Clyde’s response to situation X, even if Clyde hadn’t encountered X in real life.

During this time, of course, Clyde went through the very same maturation process. We can model their co-understanding as follows:

Intentional Stance- Modeling Other Minds

The above graphic underscores the “egotistical” subset of secondary models: each child’s simulation of the impression he or she is making on the other. The intentional stance is the birthplace of impression management.
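One way to make the distinction between primary and secondary models precise is to write them as nested data types. The sketch below is illustrative only: the class names, fields, the `is_impression_model` helper, and Bonnie’s particular beliefs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PrimaryModel:
    """Beliefs about one individual (appearance, habits, moods...)."""
    about: str
    beliefs: dict = field(default_factory=dict)


@dataclass
class SecondaryModel:
    """A simulation of the primary model that someone else holds."""
    holder: str              # whose mind is being simulated
    simulated: PrimaryModel  # what we think their model contains


def is_impression_model(self_name, model):
    """The 'egotistical' subset: secondary models of how I appear to others."""
    return model.simulated.about == self_name


# Inside Bonnie's head (hypothetical contents):
clyde = PrimaryModel("Clyde", {"mood": "not in the mood to play", "likes": "trucks"})
clyde_sees_bonnie = SecondaryModel("Clyde", PrimaryModel("Bonnie", {"fun to play with": True}))

print(is_impression_model("Bonnie", clyde_sees_bonnie))  # True
```

A secondary model contains a primary model as a field, which mirrors the point that mental states are about objects rather than objects themselves.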

Against Ternary Models

If secondary models simulate primary models, can ternary models simulate secondary models? Well, let’s not close ourselves to the possibility. Here’s what such a thing looks like:

Intentional Stance- Difficulties With Ternary Modeling v1

So… ternary models are hard to understand! Let’s simplify:

  • A’s model of B = How B Appears To A
  • B’s model of A’s model of B = How B thinks B appears to A
Intentional Stance- Difficulties With Ternary Modeling v2

Make sure the construction logic of the above image is clear: you should have no trouble constructing a quaternary graphic, quinary graphic, etc. Doesn’t the raw ability to reason about such higher-order logics mean that your brain is capable of infinitely-nested meta-models?
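The construction logic can be written as a single recursive rule, which is one way to see how arbitrarily deep descriptions can be generated without dedicated machinery for each level. The function names and phrasing are hypothetical; the base case mirrors the “How B Appears To A” simplification.

```python
def describe(chain):
    """Describe a nested model from a chain of agents.

    ["A", "B"]      -> primary model:   A's model of B
    ["B", "A", "B"] -> secondary model: B's model of A's model of B
    Each extra agent reuses the same one-step rule."""
    if len(chain) < 2:
        raise ValueError("a model needs a holder and a subject")
    if len(chain) == 2:
        holder, subject = chain
        return f"how {subject} appears to {holder}"
    holder, rest = chain[0], chain[1:]
    return f"{holder}'s guess at ({describe(rest)})"


def order(chain):
    """1 = primary, 2 = secondary, 3 = ternary, ..."""
    return len(chain) - 1


for chain in (["A", "B"], ["B", "A", "B"], ["A", "B", "A", "B"]):
    print(order(chain), describe(chain))
```

Each additional level reuses the same step, so the depth of nesting is limited only by how long we are willing to keep recursing, not by the number of distinct model types.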

A useful analogy here is the natural numbers (0, 1, 2, …). How can your brain hope to count arbitrarily large numbers of things? After all, folk mathematics is fueled by biochemistry; the number of representations it can store is finite. So… how many numbers can it count?

The correct answer is four. That is, your subitizing module makes counting up to about four items almost instantaneous; for larger numbers, response times rise dramatically, with an extra 250-350 ms added for each additional item beyond about four. We see a similar time difference in recursive reasoning: most people easily simulate other people (primary modeling) and conduct impression management (secondary modeling). But imagining the impression management of other people takes conscious effort, and so is unlikely to be part of your default Theory Of Mind machinery.
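As a rough quantitative picture of the subitizing claim, here is a piecewise-linear toy model of enumeration time. The 500 ms baseline is an assumption, the flat curve up to four items is a simplification of “almost instantaneous”, and 300 ms per item is simply the midpoint of the 250-350 ms range quoted above.

```python
def predicted_rt_ms(n, base_ms=500.0, subitizing_limit=4, slope_ms=300.0):
    """Toy enumeration time for n items (all parameters are assumptions).

    Up to the subitizing limit the curve is treated as flat; beyond it,
    each extra item adds slope_ms milliseconds."""
    extra_items = max(0, n - subitizing_limit)
    return base_ms + slope_ms * extra_items


for n in range(1, 9):
    print(n, predicted_rt_ms(n))
```

The kink at four is the point of the analogy: cheap, automatic processing up to a small limit, and effortful, serial processing beyond it.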

With these insights in mind, we see that ternary models ought not be admitted into our mental architecture simply because they are conceivable. Such a solution would not be parsimonious, since these complex behaviors can be built from simpler atomic components. In fact, in the simplification above, you can even see hints of your two-level brain “translating” ternary concepts into a more direct language that it better understands.

Takeaways

Here are the ideas I want you to walk away from this post:

  • Agents are able to steer different situations towards one outcome. Seeing a world of desire, a world of goals, is how children explain equifinality.
  • Social animals typically compete for resources in the shadow of a dominance hierarchy. Modeling the beliefs of their peers about them improves social acumen, and with it, fitness.
  • Ascribing beliefs and desires to other agents is known as the intentional stance. It appears very early in humans, between six and twelve months.
  • We can formalize the above notions in representation theory. Thoughts about other people reside in primary models; thoughts about other people’s thoughts reside in secondary models.
  • Ternary models are unlikely to exist for the same reason that computers don’t feel inclined to count to infinity. Our brains can get there by recursively invoking simpler components.

Relevant Resources

  • Gergely et al. (1994). Taking the intentional stance at 12 months of age.
  • Heider, F., & Simmel, M. (1944). An experimental study of apparent behavior.
  • Trick, L.M., & Pylyshyn, Z.W. (1994). Why are small and large numbers enumerated differently?

Metaphor Is Narrative

The Nature Of Metaphor

Just for fun, let me open today’s discussion with a few aphorisms:

  1. It feels more natural to say “her smile is warm” than “her body warmth is a smile”. Metaphor is asymmetrical. 
  2. Abstraction is wedded to metaphor.
  3. Inference flows from the concrete to the abstract. Metaphor relocates inference.
  4. The flow of inference is constrained. When we say “that lawyer is a shark” our brains decide which of our shark inferences are relevant.
  5. Idiom is a form of metaphor. Like idiom, metaphor can go stale.
  6. Metaphor relocates affect, even after the stream of inference dries up.
  7. Metaphor imbues communication with affective flair or style.
  8. Metaphors are hierarchical, with complex themes (e.g., “a Purposeful Life is a Journey”) made of smaller metaphors.
  9. In my language, I say that metaphor is narrative. That is, weaving metaphorical hierarchies is narrative paint.
  10. Metaphor is not yet differentiated sufficiently to compose well with the rest of cognitive science.

Primary Sensorimotor Metaphor

Okay, time to delve a little deeper. Consider the following metaphors.

  1. Affection Is Warmth (“her smile is warm”)
  2. Important is Big (“tomorrow is a big day”)
  3. Happy Is Up (“I feel uplifted”)
  4. Intimacy Is Closeness (“we’re beginning to drift apart”)
  5. Bad Is Stinky (“this artist stinks”)
  6. Difficulties Are Burdens (“finals are weighing me down”)
  7. More Is Up (“prices are high”)
  8. Categories Are Containers (“do tomatoes go in the fruit category?”)
  9. Similarity Is Closeness (“these colors are close”)
  10. Linear Scales Are Paths (“your IQ goes well beyond mine”)
  11. Organization Is Physical Structure (“how do the pieces of this theory fit together”)
  12. Help Is Support (“support your local charity”)
  13. Time Is Motion (“time flies”)
  14. States Are Locations (“close to having an anxiety attack”)
  15. Change Is Motion (“car has gone from bad to worse”)
  16. Actions Are Self-Propelled Motions (“my project is moving along”)
  17. Purposes Are Destinations (“I’m not where I wanted to be”)
  18. Purposes Are Desired Objects (“grab the opportunity”)
  19. Causes Are Physical Forces (“pushed the bill through Congress”)
  20. Relationships Are Enclosures (“this feels confining”)
  21. Control Is Up (“I’m on top of it”)
  22. Knowing Is Seeing (“see what you mean”)
  23. Understanding Is Grasping (“gotten my mind around imaginary numbers”)
  24. Seeing Is Touching (“pick my face out of the crowd”)

What similarities between these metaphors do you see? [Footnote 1] Well, these metaphors are all unidirectional, and explain abstract concepts by appealing to more down-to-earth domains. What do I mean by down-to-earth? Well, all of the above examples appeal to perceptual or motor phenomena!

In terms of the human brain, perceptual and motor (“sensorimotor”) systems tend to reside in the cortical homunculus. In terms of the human memory hierarchy, these types of concepts tend to arise in procedural memory.

Primary Metaphor In Other Memory Systems

Now, human memory contains more than just procedural memory. We can use our understanding of other memory systems to predict other kinds of primary metaphor.

  • “Lawyers are sharks” might be better explained by appealing to a culturally-ubiquitous item of semantic memory.
  • The Biblical metaphor “Sinners are tax collectors” would plausibly draw from a culturally-ubiquitous item of episodic memory.
  • Since autobiographical memories are not culturally ubiquitous, we might predict a more personal taste to this type of metaphor.

Metaphor Composition Is Narrative Paint

Human beings conceptualize abstract objects by bringing many primary metaphors into a complex whole. Let me pull an example from Lakoff & Johnson: the concept of Time [Footnote 2].

The Time Orientation Metaphor looks like this:

  • The Location Of The Observer → The Present
  • The Space In Front Of The Observer → The Future
  • The Space Behind The Observer → The Past

Examples: That’s all behind us now. We’re looking forward to your presentation. He has a great future in front of him.

The Moving Time Metaphor interprets times to be objects and the passage of time to be the motion of objects past the observer.  This metaphor really finds its legs when composed with the Time Orientation metaphor. The Time Orientation + Moving Time complex metaphor, then, looks like this:

  • The Location Of The Observer → The Present
  • The Space In Front Of The Observer → The Future
  • The Space Behind The Observer → The Past
  • Objects → Times
  • Motion Of Objects Past The Observer → The “Passage” Of Time

Examples: The time will come when there are no more typewriters. The time has long since gone when you could mail a letter for three cents. The time for action has arrived. Thanksgiving is coming up on us. Time is flying by. Let’s meet the future head-on.

But abstractions like time are typically underwritten by more than one can of narrative paint. In this case, the Moving Observer Metaphor alternatively imagines locations on the observer’s path as times, and the motion of the observer as the passage of time. Here is the Time Orientation + Moving Observer complex metaphor, in full detail:

  • The Location Of The Observer → The Present
  • The Space In Front Of The Observer → The Future
  • The Space Behind The Observer → The Past
  • Locations On The Observer’s Path → Times
  • Motion Of The Observer → The “Passage” Of Time
  • Distance Moved By Observer → The Amount Of Time “Passed”

Examples: There’s going to be trouble down the road. What will be the length of his visit? Let’s spread the conference over two weeks. We passed the deadline. We’re halfway through September. His visit to Russia extended over many years.
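Treating each metaphor as a mapping from source domain to target domain, composition becomes a merge of mappings. The Python sketch below encodes the three metaphors above as dictionaries; the `compose` and `translate` helpers, and the rule of rejecting conflicting mappings, are assumptions introduced for illustration.

```python
TIME_ORIENTATION = {
    "location of the observer": "the present",
    "space in front of the observer": "the future",
    "space behind the observer": "the past",
}
MOVING_TIME = {
    "objects": "times",
    "motion of objects past the observer": 'the "passage" of time',
}
MOVING_OBSERVER = {
    "locations on the observer's path": "times",
    "motion of the observer": 'the "passage" of time',
    "distance moved by observer": 'the amount of time "passed"',
}


def compose(*metaphors):
    """Glue primary metaphors into one complex source -> target mapping.

    Raises if two metaphors send the same source to different targets."""
    combined = {}
    for metaphor in metaphors:
        for source, target in metaphor.items():
            if combined.get(source, target) != target:
                raise ValueError(f"conflicting mappings for {source!r}")
            combined[source] = target
    return combined


def translate(metaphor, source):
    """Inference flows one way: concrete source -> abstract target."""
    return metaphor[source]


moving_time = compose(TIME_ORIENTATION, MOVING_TIME)
moving_observer = compose(TIME_ORIENTATION, MOVING_OBSERVER)
print(len(moving_time), len(moving_observer))              # 5 6
print(translate(moving_observer, "space behind the observer"))  # the past
```

Only forward lookup (source to target) is defined, which echoes the asymmetry of metaphor noted in the first aphorism.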

Takeaways

Today, I gave you examples of “primary” metaphor, which in this case were grounded in the human perceptual/motor systems. Abstract concepts are made by gluing primary metaphors together like Legos. I also left you with several aphorisms, including:

  • Metaphor relocates inference.
  • Metaphor imbues communication with affective flair or style.
  • Weaving metaphorical hierarchies is narrative paint.

Footnotes

  1. This question (“do you see”) nicely illustrates primary sensorimotor metaphor #22.
  2. Source: http://www.amazon.com/Philosophy-Flesh-Embodied-Challenge-Western/dp/0465056741. For reasons outside the scope of this post, I cannot endorse this text, but I did find its presentation of complex metaphor useful.

Nash Equilibria

Followup To: An Introduction To Prisoner’s Dilemma
Part Of: Algorithmic Game Theory sequence
Content Summary: 500 words, 5 min read

Consider the following game:

Both spouses prefer the company of one another. In fact, if they spend the evening together, they incur zero cost. However, if faced with a choice between waiting in an empty house and earning some overtime, they would prefer the latter twice as much.

We encode this game’s strategy-space as follows:

Nash Equilibria- Spouse Game

Recall our previous discussion of Pareto optimality and strategic dominance. There are myriad ways to think about games; why isolate those two properties in particular? One reason to invent names is to construct a universal toolkit: non-trivial properties that exist in all games and are amenable to our analyses.

Pareto optimality makes an appearance in the above game, at (H, H). But strategic dominance does not. Take a moment to convince yourself this is true.

Since strategic dominance is too strong to be a universal property, we might relax it. What happens when we encode regret? That is, what does the Spousal Game look like after we consider cases when a player wishes she had made a different choice?

Nash Equilibria- Regret

Arrows represent regret.

  • In the bottom-left cell, Spouse A wishes she had worked (purple arrow right) but Spouse B wishes he had gone home (orange arrow up).
  • In the top-right cell, we see an entirely symmetric expression of mutual regret.
  • In the remaining cells, neither spouse can do better by individually changing their strategy.

Let us view Prisoner’s Dilemma from the same lens.

Nash Equilibria- PD Regret

These arrows have a peculiar pattern. Why? Because of strategic dominance!

We can explain strategic dominance in terms of regret. That is, if a player’s regrets are all in the same direction, then that player is subject to strategic dominance.

Does regret belong in our universal toolbox? No. Regret by itself is rather uninteresting: every game with non-trivial utilities contains myriad regrets.

We need a stronger property. How about outcomes in which no player has any regret? Call such an outcome a Nash equilibrium.
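The definition translates directly into a brute-force check over a two-player payoff table: an outcome qualifies if neither player could raise her own payoff by switching alone. The Prisoner’s Dilemma payoffs below are common textbook numbers (negative years in prison), which may differ from those used in An Introduction To Prisoner’s Dilemma. The Spousal Game table is an assumption, one assignment consistent with the prose (home together: 0; waiting alone: -2; working alone: -1), since the original table is shown only as an image.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """payoffs[(row, col)] = (row player's utility, column player's utility).

    An outcome is a (pure) Nash equilibrium when neither player regrets it:
    no unilateral switch would raise that player's own utility."""
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    equilibria = []
    for r, c in product(rows, cols):
        u_row, u_col = payoffs[(r, c)]
        row_regret = any(payoffs[(r2, c)][0] > u_row for r2 in rows)
        col_regret = any(payoffs[(r, c2)][1] > u_col for c2 in cols)
        if not (row_regret or col_regret):
            equilibria.append((r, c))
    return equilibria


# Prisoner's Dilemma (C = cooperate, D = defect), textbook payoffs.
pd = {("C", "C"): (-1, -1), ("C", "D"): (-3, 0),
      ("D", "C"): (0, -3), ("D", "D"): (-2, -2)}

# Spousal Game (H = home, W = work): ASSUMED payoffs consistent with the prose.
spouses = {("H", "H"): (0, 0), ("H", "W"): (-2, -1),
           ("W", "H"): (-1, -2), ("W", "W"): (-1, -1)}

# Rock-Paper-Scissors: +1 win, -1 loss, 0 tie.
beats = {"R": "S", "P": "R", "S": "P"}


def rps_utility(a, b):
    return 0 if a == b else (1 if beats[a] == b else -1)


rps = {(a, b): (rps_utility(a, b), rps_utility(b, a)) for a in "RPS" for b in "RPS"}

print(pure_nash_equilibria(pd))       # [('D', 'D')]
print(pure_nash_equilibria(spouses))  # [('H', 'H'), ('W', 'W')]
print(pure_nash_equilibria(rps))      # []
```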

Does every game have Nash equilibria?

  • Prisoner’s Dilemma has one: (D, D)
  • Spousal Game has two: (H, H) and (W, W)

Perhaps no game fails to have this peculiar creature.

Conjecture 1: Every game with a finite number of players and strategies has at least one Nash equilibrium.

Time to search for counter-examples! Consider the game Rock-Paper-Scissors. Here’s how it looks to a game theorist:

Nash Equilibrium- Rock-Paper-Scissors

Crucially, every node has at least one arrow leaving it. Rock/Paper/Scissors has no equilibrium! We have thus disproven Conjecture 1. What now?

In addition to the absence of Nash equilibria, this game is interesting in another sense. It turns out that having a deterministic strategy in Rock/Paper/Scissors is a bad idea. In fact, machines can reliably beat people at Rock/Paper/Scissors by exploiting patterns in human gameplay.

What happens when we expand our notion of game to incorporate non-deterministic (mixed) strategies? The best mixed strategy a player can adopt is [⅓ ⅓ ⅓]; that is, each choice is selected at random with probability ⅓.

While hard to visualize, it is easy to intuitively grasp the existence of a new equilibrium. If each player adopts [⅓ ⅓ ⅓], neither will experience regret! Thus we can safely repair our Conjecture:

Conjecture 2. Every game with a finite number of players and strategies has at least one Nash equilibrium if mixed strategies are allowed.

It turns out that Conjecture 2 is entirely correct; John Nash proved it in 1950. In fact, this existence result is one of the most significant in all of game theory.
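The Rock-Paper-Scissors case of Conjecture 2 can at least be checked numerically. The sketch below computes each pure move’s expected payoff against a mixed strategy using exact fractions; the +1/-1/0 win/loss/tie scoring is an assumption, and `biased` is a hypothetical opponent who favors rock. Against the uniform mix every move earns exactly zero, so no deviation helps, while the lopsided mix is exploitable.

```python
from fractions import Fraction

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def utility(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie (assumed scoring)."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def expected_utility(move, opponent_mix):
    """Expected payoff of a pure move against a mixed strategy."""
    return sum(p * utility(move, m) for m, p in opponent_mix.items())


def no_profitable_deviation(mix):
    """True if, when both players use `mix`, no pure move beats the mix itself."""
    value = sum(p * expected_utility(m, mix) for m, p in mix.items())
    return all(expected_utility(m, mix) <= value for m in MOVES)


uniform = {m: Fraction(1, 3) for m in MOVES}
biased = {"rock": Fraction(1, 2), "paper": Fraction(1, 4), "scissors": Fraction(1, 4)}

print([str(expected_utility(m, uniform)) for m in MOVES])  # ['0', '0', '0']
print(str(expected_utility("paper", biased)))              # 1/4
print(no_profitable_deviation(uniform), no_profitable_deviation(biased))  # True False
```

Checking only pure deviations suffices, because the payoff of any mixed deviation is a weighted average of pure-move payoffs.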

Overfitting: Failure Mode of Meat Science

Followup To: [Data Partitioning: How To Repair Explanation]

If I have seen a little further, it is by standing on the shoulders of giants
– Isaac Newton

Table Of Contents

  • Context
  • The Optimization Level Of Science
  • The Fruits Of Science
  • Translation Proxy Lag
  • Cheap Explanations
  • Failure Mode of Meat Science
  • Takeaways

Context

Last time, we learned two things:

  1. Explainers who solely optimize against prediction error are in a state of sin.
  2. Data partitions immunize abductive processes against overfitting.

Today, we will apply these results to human scientific processes, or to what I will affectionately call meat science.

The Optimization Level Of Science

Nietzsche captures the reputation of science well:

Science is flourishing today and her good conscience is written all over her face, while the level to which all modern philosophy has gradually sunk… philosophy today, invites mistrust and displeasure, if not mockery and pity. It is reduced to “theory of knowledge”… how could such a philosophy dominate? … The scope and the tower-building of the sciences has grown to be enormous, and with this also the probability that the philosopher grows weary while still learning….

Attempts to ground such sentiments in something rigorous exceed the scope of this post. Today, we simply accept that science is particularly epistemically productive. But let us move beyond the cheerleading, and ask ourselves why this is so.

If I were to hand over a map of our species’ cognitive architecture to an alien species, I would expect them to predict demagogues much more easily than the discovery of the Higgs boson. The simple truth is that our minds are flawed: we are born with clumsy inference machinery. How then is epistemic productivity possible?

The success of science has been said to derive from the scientific method:

Overfitting- Scientific Method

But does the scientific method lend itself to the debiasing of the human animal? I argue it does not. Tribalism in scientific communities, for example, doesn’t seem particularly muted compared to other realms of human experience. Further, in his classic text The Structure of Scientific Revolutions, Thomas Kuhn argued that scientific revolutions emerge from strong, extra-rational motives. In his view, the nature of paradigm shifts is a bit like mystical religious experience: deeply personal, and difficult to verbalize.

I like to imagine science as a socio-historical process. Individuals and even sub-communities within its disciplines may fail to track what is Really There, but communities on the whole tend to move in this direction. As Max Planck once observed:

A scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die and a new generation grows up that is familiar with it.

The question of how the scientific method facilitates socio-historical truth-tracking is, I believe, unsolved. But all I must do today is flag the optimization level of science. If science debiases our species only organically, at the socio-historical level, there is much room for improvement. If cognitive science can forge new debiasing weapons, we will become increasingly able to transcend ourselves, able to move faster than science.

The Fruits Of Science

Consider again the following:

Overfitting- Newton Seen Further Quote

This meme is sticky, and signals humility… but it also frames a crisis.

The shoulders keep growing taller.

Because of the accretive nature of science, our collective knowledge far outpaces our cognitive abilities. Even if the controversial Flynn effect is true and our collective IQ really is improving over time, the size of our databases would still outstrip the reach of our cognitive cone.

Various technologies have been invented that assuage this crisis. Curriculum compression is one of the older ones. That which was originally available as an anthology of journal articles is compressed into a review article, review articles compressed to graduate courses, graduate courses polished into undergraduate presentations. Consider how Newton would marvel at a present-day course in differential calculus: 300 years of mathematical research, successfully installed into an undergraduate’s semantic memory in a matter of weeks.

Curriculum compression is but one answer to our exploding knowledge base. Other implicit reactions include:

  1. the narrowing of research trajectories
  2. the fractionating of disciplines into the Ivory Archipelago
  3. research crowdsourcing (the de-popularization of the Lone Intellectual Warrior model)

In the years ahead, society seems geared to add two more solutions to our “bag of tricks”:

  1. the cognitive reform of education technology
  2. the mechanization of science

And yet, the shoulders keep growing taller…

Translation Proxy Lag

In Knowledge, an Empirical Sketch, I introduced measurement under the title of translation proxies. Why the strange name? To underscore the nature of measurement: forcibly relocating sensory phenomena from their “natural habitat” towards a signature that our bodies are equipped to sense. In this light, a translation proxy can be viewed as a kind of faucet, bringing novel forms of physical reality into the human umwelt.

But consider physics, where the “low-hanging fruit” translation proxies have already been built. You don’t see many physicists clamoring for new hand-held telescopes. Instead, you see them excited over ultra-high precision telescope mirrors, over particle colliders producing energies in the trillions of electronvolts. Such elaborate proxy technologies simply do not flow as quickly from the furnace of invention. Call this phenomenon data intake stagnation.

Not only are our proxy innovations becoming less frequent, but they are also severely outpaced by our theoreticians. In physics, M theory (the generalization of string theory) posits entities at the 10^-34 m scale, but our current microscopes can only interrogate resolutions around the 10^-15 m scale. In neuroscience, connectome research programs seek to graph the nervous system at the neuronal level (10^-6 m), but most imaging technologies only support the millimeter (10^-3 m) range. Call this phenomenon translation proxy lag.
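The size of each lag is easier to see when counted in orders of magnitude. This minimal Python sketch does only that arithmetic, using the length scales quoted in the paragraph above; the function name `orders_of_magnitude_gap` is a hypothetical helper for illustration.

```python
import math


def orders_of_magnitude_gap(target_scale_m: float, resolvable_scale_m: float) -> int:
    """Powers of ten separating a theory's target scale from what we can resolve."""
    # log10 of the ratio counts powers of ten; round() absorbs floating-point noise.
    return round(math.log10(resolvable_scale_m / target_scale_m))


# Length scales (in metres) as quoted in the text above.
m_theory_gap = orders_of_magnitude_gap(1e-34, 1e-15)
connectome_gap = orders_of_magnitude_gap(1e-6, 1e-3)

print(f"M theory: {m_theory_gap} orders of magnitude")      # M theory: 19 orders of magnitude
print(f"Connectome: {connectome_gap} orders of magnitude")  # Connectome: 3 orders of magnitude
```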

It will be decades, perhaps centuries, before our measurement technologies catch up.

Cheap Explanations

Let us bookmark translation proxy lag, and consider a different sort of problem.

Giant shoulders are not merely growing taller. They also render impotent evidence once vital to theoreticians. Let me appeal to a much-cited page from history, to illustrate.

During World War I, Sir Arthur Eddington was Secretary of the Royal Astronomical Society, which meant he was the first to receive a series of letters and papers from Willem de Sitter regarding Einstein’s theory of general relativity. […] He quickly became the chief supporter and expositor of relativity in Britain. […]

After the war, Eddington travelled to the island of Príncipe near Africa to watch the solar eclipse of 29 May 1919. During the eclipse, he took pictures of the stars in the region around the Sun. According to the theory of general relativity, stars with light rays that passed near the Sun would appear to have been slightly shifted because their light had been curved by its gravitational field. This effect is noticeable only during eclipses, since otherwise the Sun’s brightness obscures the affected stars. Eddington showed that Newtonian gravitation could be interpreted to predict half the shift predicted by Einstein.

Eddington’s observations published the next year confirmed Einstein’s theory, and were hailed at the time as a conclusive proof of general relativity over the Newtonian model. The news was reported in newspapers all over the world as a major story:

Overfitting- NYT 1919

Two competing theories, two perfectly adequate explanations for everyday phenomena, one test to differentiate models. Here we see a curiosity: scientists place a high value on new data. Gravitational lensing constituted powerful confirmation because, as far as the model-creators knew, it could have been the other way.

Overfitting- Einstein vs. Newton
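The “half the shift” in the Eddington passage can be checked with the standard formula for starlight grazing the solar limb: general relativity predicts a deflection of 4GM/(c²R), while the Newtonian treatment of light as massive corpuscles gives 2GM/(c²R). The sketch below plugs in rounded SI values for the constants (the rounding is my choice, not a figure from the source).

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # solar mass, kg
R_SUN = 6.957e8    # solar radius, m (light ray grazing the limb)
C = 2.998e8        # speed of light, m/s

RAD_TO_ARCSEC = 180 / math.pi * 3600


def deflection_arcsec(prefactor: float) -> float:
    """Deflection angle prefactor * G*M / (c^2 * R), converted from radians to arcseconds."""
    return prefactor * G * M_SUN / (C**2 * R_SUN) * RAD_TO_ARCSEC


einstein = deflection_arcsec(4.0)  # general relativity
newton = deflection_arcsec(2.0)    # Newtonian light-corpuscle estimate

print(f"GR: {einstein:.2f} arcsec, Newton: {newton:.2f} arcsec")  # GR: 1.75 arcsec, Newton: 0.88 arcsec
```

The two predictions differ by less than an arcsecond, which is why a single eclipse photograph could carry so much theory-discriminating weight.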

A nice, clean narrative. But consider what happens next. General relativity has begun to show its age: it is notoriously incompatible with quantum mechanics. Many successors to general relativity have been proposed; let us call them Quantum Gravity Theory A, B, and C. Frustratingly, no discrepancies have been found between these theory-universes and our observed universe.

Overfitting- Ad-Hoc Theories

How are multiple correct options possible? From a computational perspective, the phenomenon of multiple correct answers can be modeled with Solomonoff Induction. A less precise, philosophical precursor of the same moral can be found in the underdetermination of theory by evidence.
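Solomonoff induction itself is incomputable, but its core move can be sketched in miniature: keep every hypothesis consistent with the data, and weight each by 2^(−description length). In this toy, the four “theories” and their bit lengths are invented for illustration (they are not real quantum gravity models). Three of them fit every observation so far, yet they disagree about an untested case.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy theories: name -> (description length in bits, prediction rule).
THEORIES: Dict[str, Tuple[int, Callable[[int], int]]] = {
    "A": (10, lambda x: 2 * x),
    "B": (14, lambda x: 2 * x if x < 10 else 2 * x + 1),
    "C": (16, lambda x: 2 * x if x < 100 else 0),
    "D": (9, lambda x: x * x),
}


def posterior(observations: List[Tuple[int, int]]) -> Dict[str, float]:
    """Keep theories that reproduce every observation; weight survivors by 2^-length."""
    weights = {
        name: 2.0 ** -length
        for name, (length, rule) in THEORIES.items()
        if all(rule(x) == y for x, y in observations)
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


observed = [(x, 2 * x) for x in range(5)]  # the "everyday phenomena" seen so far
post = posterior(observed)
print(post)  # D is falsified (D(1) = 1, not 2); A, B and C all survive

# The survivors agree on everything observed but split on x = 10.
predictions = {name: THEORIES[name][1](10) for name in post}
print(predictions)  # {'A': 20, 'B': 21, 'C': 20}
```

The shortest surviving theory dominates the prior weighting, but nothing in the observed data rules out the others.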

But which theory wins? General relativity won via a solar eclipse generating evidence for gravitational lensing. But gravitational lensing is now “old hat”; so long as all theories accommodate its existence, it no longer wields theory-discriminating power. And – given the translation proxy lag crisis outlined above – it may take some time before our generation acquires its test, its analogue of a solar eclipse.
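The loss of theory-discriminating power has a simple Bayesian reading: an observation shifts relative confidence only insofar as the rival theories predicted it differently. In this sketch the priors and likelihoods are made-up numbers for illustration. Evidence that every theory predicts equally well (a likelihood ratio of 1) leaves the posterior where the prior was.

```python
from typing import Dict


def update(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule: posterior is proportional to prior times likelihood of the evidence."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


prior = {"QGT/A": 0.5, "QGT/B": 0.3, "QGT/C": 0.2}

# "Old hat" evidence: every theory accommodates lensing, so each predicts it with certainty.
lensing = {"QGT/A": 1.0, "QGT/B": 1.0, "QGT/C": 1.0}
# A hypothetical novel observation that the theories predict differently.
novel = {"QGT/A": 0.9, "QGT/B": 0.1, "QGT/C": 0.1}

after_lensing = update(prior, lensing)
after_novel = update(prior, novel)
print(after_lensing)  # same credences as the prior
print(after_novel)    # credence shifts sharply towards QGT/A
```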

Can anything be done in the meantime? Is science capable of discriminating between QGT/A, QGT/B, and QGT/C in the absence of a clean novel prediction? When we choose to invest more of our lives in one particular theory above the others, are we doomed to make this choice by mere dint of our aesthetic cognitive modules, by a sense of social belonging, by noise in our yedasentiential signals?

Overfitting: Failure Mode of Meat Science

Bayesian inference teaches us that confidence is usefully modeled as a probabilistic thing. But stochasticity is not equiprobability: discriminability is a virtue. If science cannot provide it, let us cast about for ways to reform science.
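One way to put a number on “stochasticity is not equiprobability” is Shannon entropy: credence spread evenly over rival theories has maximal entropy, and every bit of discrimination lowers it. Using entropy as the yardstick is an assumption of this sketch, not a commitment of the argument above; the credence values are illustrative.

```python
import math
from typing import Sequence


def entropy_bits(credences: Sequence[float]) -> float:
    """Shannon entropy (bits) of a probability distribution over rival theories."""
    return -sum(p * math.log2(p) for p in credences if p > 0)


equiprobable = [1 / 3, 1 / 3, 1 / 3]  # no discrimination between QGT/A, B and C
discriminating = [0.9, 0.06, 0.04]    # illustrative credences after a novel test
print(f"{entropy_bits(equiprobable):.3f} bits")  # 1.585 bits, i.e. log2(3)
print(f"{entropy_bits(discriminating):.3f} bits")
```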

Let us begin our search by considering the rhetoric of explanation. What does it mean for a hypothesis to be criticized as ad-hoc?

> Scientists are often skeptical of theories that rely on frequent, unsupported adjustments to sustain them. This is because, if a theorist so chooses, there is no limit to the number of ad hoc hypotheses that they could add. Thus the theory becomes more and more complex, but is never falsified. This is often at a cost to the theory’s predictive power, however. Ad hoc hypotheses are often characteristic of pseudoscientific subjects.

Do you recall how we motivated data partitioning? Do you yet recognize the stench of overfitting?

In fact, in my view, current scientific practice is a bit too uncomfortable with falsification. Karl Popper lionized falsifiability, and the result of his movement has been an increase in the operationalization and measurement-affinity of the scientific grammar. But the weaving together of scientific abstractions and the particle soup, and the dawning taboo around the Not Even Wrong, came with baggage.

Logically, no number of positive outcomes at the level of experimental testing can confirm a scientific theory, but a single counterexample is logically decisive: it shows the theory, from which the implication is derived, to be false.
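The logical asymmetry in this dictum can be written as two inference patterns, with T a theory and P a prediction derived from it: a failed prediction licenses rejecting the theory (modus tollens), whereas a successful prediction does not license accepting it (affirming the consequent).

```latex
\[
  (T \rightarrow P),\ \neg P \;\vdash\; \neg T
  \qquad\text{whereas}\qquad
  (T \rightarrow P),\ P \;\nvdash\; T
\]
```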

But compare this dictum with our result from machine learning, which suggests that perhaps small “falsifications” may be preferable to “getting everything right”:

Explainers who solely optimize against prediction error are in a state of sin.
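The machine learning result can be reproduced in miniature with data partitioning. Below, a “memorizer” that stores every training point achieves zero training error, while a least-squares line accepts small misfits on the training data; on held-out points the line does far better. The data, the fixed ±1 noise pattern, and the even/odd split are illustrative assumptions chosen so the numbers can be checked by hand.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical data: a linear law y = 2x plus a fixed +/-1 "measurement noise" pattern.
data: List[Point] = [(x, 2 * x + (1 if x % 4 in (0, 1) else -1)) for x in range(8)]
train = [p for p in data if p[0] % 2 == 0]     # partition: even x used for fitting
held_out = [p for p in data if p[0] % 2 == 1]  # odd x hidden until evaluation


def fit_line(points: List[Point]) -> Callable[[float], float]:
    """Ordinary least-squares line through the training points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    slope = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x, _ in points)
    return lambda x: my + slope * (x - mx)


def fit_memorizer(points: List[Point]) -> Callable[[float], float]:
    """Lookup table: exact on seen inputs, falls back to the training mean elsewhere."""
    table: Dict[float, float] = dict(points)
    mean_y = sum(table.values()) / len(table)
    return lambda x: table.get(x, mean_y)


def mse(model: Callable[[float], float], points: List[Point]) -> float:
    """Mean squared prediction error of a model on a set of points."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


for name, fit in [("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train)
    print(f"{name:9s} train MSE {mse(model, train):5.2f}  held-out MSE {mse(model, held_out):5.2f}")
# line      train MSE  0.80  held-out MSE  0.84
# memorizer train MSE  0.00  held-out MSE 21.00
```

The line is “falsified” by every training point, yet it is the better explainer of the data it never saw.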

In sum, we have reason to believe that overfitting is the pervasive illness of our meat science.

Takeaways

  • Science tracks truth not at the level of the individual, but on a socio-historical scale. There is therefore room to move faster than science.
  • Science is accretive: the shoulders keep growing taller.
  • Due to constraints in measurement technology, data of a truly novel character has become increasingly difficult to acquire.
  • On meat science, as data transitions from “novel” to “known”, it loses its ability to wield theory-discriminating power.
  • Given these dual crises, and the philosophical commitments scientific communities have made against falsified hypotheses, there is reason to believe that science suffers from overfitting.

Next time, we will explore applying the machine learning solution to overfitting – data partitioning – to meat science, and motivate the virtue of hiding data from ourselves. See you then!

 

Movement Forecast: Effective Availabilism

Table Of Contents

  • The Availability Cascade
  • Attentional Budget Ethics
  • Effective Availabilism
  • Why Quantification Matters
  • Cascade Reform Technologies
  • Takeaways

The Availability Cascade

The following questions pop up in my Facebook feed all the time.

Why are mental illness, addiction, and suicide only talked about when somebody famous succumbs to their demons?

Why do we only talk about gun control when there is a school shooting?

What is the shape of your answer? Mine begins with a hard look at the nature of attention.

Attention is a lens by which our selves perceive the world. The experience of attention is conscious. However, the control of attention – where it lands, how long it persists – is preconscious. People rarely think to themselves: “now seems an optimal time to think about gun control”. No, the topic of gun control simply appears.

When we pay attention to attention, its flaws become visible. Let me sketch two.

  1. The preconscious control of attention is famously vulnerable to a wide suite of dysrationalia. Like transposons parasitizing your DNA, beliefs parasitize your semantic memory by latching onto your preconscious attention-control software. This is why Evans-Pritchard was so astonished in his anthropological survey of Zande mysticism. This is why your typical cult follower is pathologically unable to pay attention to a certain set of considerations. The first flaw of the attentional lens is that it is a biasing attractor.
  2. Your unconscious mind is subject to the following computational principle: what you see is all there is. This brings us to the availability heuristic, the cognitive shortcut your brain uses to travel from “this was brought to mind easily” to “this must be important”. Through the attentional lens, the medium distorts its contents. This is nicely summed up in the proverb, “nothing in life is as important as you think it is, while you are thinking about it.” The second flaw of the attentional lens is that it is bound in a positive feedback loop with memory (“that which I can recall easily must be important; what is important, I discuss more; what I discuss more, I recall even more easily”).
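The second flaw’s feedback loop can be simulated in a few lines. This is a toy model under stated assumptions, not an empirical claim: two topics start with nearly equal availability, attention is allocated in proportion to availability squared (a hypothetical superlinear version of “easy to recall, so it must be important”), and whatever receives attention becomes more available.

```python
from typing import List


def availability_loop(availability: List[float], steps: int) -> List[float]:
    """Return topic 0's attention share at each step of a rich-get-richer loop.

    Assumption (hypothetical): attention share is proportional to availability**2,
    and each step's attention is added back into that topic's availability.
    """
    a = list(availability)
    shares = []
    for _ in range(steps):
        weights = [x ** 2 for x in a]
        total = sum(weights)
        attention = [w / total for w in weights]
        shares.append(attention[0])
        a = [x + s for x, s in zip(a, attention)]  # discussion feeds recall
    return shares


shares = availability_loop([1.1, 1.0], steps=30)
print(f"topic 0 share: step 1 = {shares[0]:.3f}, step 30 = {shares[-1]:.3f}")
```

Under these assumptions the leading topic’s share grows at every step. Note that the runaway depends on the exponent: with attention merely proportional to availability, this deterministic version would hold the shares fixed.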

My treatment of this positive feedback loop was at the level of the individual. But that same mechanism must also promote failures at the level of the social network. The second flaw writ large – the rippling eddies of attentional currents (as captured by services like Google News) – is known as an availability cascade. And thus we have provided a cognitive reason why our social atmosphere disproportionately discusses gun control when school shootings appear in the news.

In electrical engineering, positive feedback typically produces runaway effects: a circuit “hits the rails” (its output saturates at the voltage of its power supply). What prevents human cognition from doing likewise, from becoming so fixated on one particular memory-attention loop that it cannot escape? Why don’t we spend our days and our nights dreaming of soft drinks, fast food, pharmaceuticals? I would appeal to human boredom as a natural barrier to such a runaway effect.

Attentional Budget Ethics

We have managed to rise above the minutiae, and construct a model of political discourse. Turn now to ethics. How should attention be distributed? When is the right time to discuss gun control, to study health care reform, to get clear on border control priorities?

The response profile of such a question is too diverse to treat here, but I would venture most approaches share two principles of attentional budgets:

  1. The Systemic Failure Principle. If a system’s performance fails to meet some criterion of success, that would be an argument for increasing its attentional budget. For example, perhaps the skyrocketing costs of health care would seem to call for more attention than other, relatively healthier, sectors of public life.
  2. The Low Hanging Fruit Principle. If attention is likely to produce meaningful results, that would be an argument for increasing its attentional budget. Conversely, perhaps not much benefit would come from a national conversation about improving our cryptographic approaches to e-commerce.

Despite how shockingly agreeable these principles are, I have a feeling that different political parties may yet disagree. In a two party system, for example, you can imagine competing attentional budgets as follows:

Attentional Budgets

Interpret “attentional resources” in a straightforward (measurement-affine) way: let it represent the number of hours devoted to public discussion.

This model of attentional budgets requires a bit more TLC. Research-guiding questions might include:

  • How ought we model overlapping topics?
  • Should budget space be afforded for future topics, topics not yet conceived?
  • Could there be circumstances to justify zero attention allocation?
  • Is it advisable to leave “attentional budget creation” topics out of the budget?
  • How might this model be extended to accommodate time-dependent, diachronic budgeting?

Effective Availabilism

Let us now pull together a vision of how to transcend the availability cascade.

In our present condition, even very intelligent commentators must resort to the following excuse of a thought: “I have a vague sense that our society is spending too much time on X. Perhaps we shouldn’t talk about it anymore”.

In our envisioned condition, our best political minds would be able to construct the following chain of reasoning: “This year, our society has spent three times more time discussing gun control than discussing energy independence. My attentional budget prescribes this ratio to be closer to 1:1. Let us think of ways to constrain these incessant gun-control availability cascades.”
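The envisioned chain of reasoning is, at bottom, a comparison between an observed attentional profile and a prescribed budget. Here is a minimal sketch; the hour counts are hypothetical, chosen only to reproduce the 3:1 ratio in the quote, and `budget_deviation` is an illustrative name rather than any existing tool.

```python
from typing import Dict


def budget_deviation(observed_hours: Dict[str, float],
                     budget_weights: Dict[str, float]) -> Dict[str, float]:
    """Observed share minus prescribed share per topic (positive = over-attended)."""
    total_hours = sum(observed_hours.values())
    total_weight = sum(budget_weights.values())
    return {
        topic: observed_hours[topic] / total_hours - budget_weights[topic] / total_weight
        for topic in budget_weights
    }


observed = {"gun control": 300.0, "energy independence": 100.0}  # hypothetical 3:1
budget = {"gun control": 1.0, "energy independence": 1.0}       # prescribed 1:1
print(budget_deviation(observed, budget))
# {'gun control': 0.25, 'energy independence': -0.25}
```

Expressing budgets as relative weights means the same budget can be checked against a week of discourse or a year of it.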

In other words, I am prophesying the emergence of an effective availabilism movement, in ways analogous to effective altruism. Effective availabilist groups would, I presume, primarily draw from neuropolitical movements more generally.

Notice how effective availabilism relies on, and comes after, the advent of publicly available psychometric data. And this is typical: normative movements often follow innovations in descriptive technology.

Why Quantification Matters

Policy discussions influence votes which affect lives. Despite the obvious need for constructive discourse, a frustrating number of political exchanges are content-starved. I perceive two potential solutions for this failure of our democracy:

  1. Politics is a mind-killer. By dint of our evolutionary origins, our brains do not natively excel at political reasoning. Group boundaries matter more than analyses; arguments are soldiers. But these are epistemic failure modes. Policy debates should not appear one-sided. Movements to establish the cognitive redemption of politics are already underway. See, for example, Jonathan Haidt’s http://www.civilpolitics.org/ (“educating the public on evidence-based methods for improving inter-group civility”)
  2. Greasing policy discussions with data would facilitate progress. One of my favorite illustrations of this effect is biofeedback: if you give a human being a graphical representation of her pulse, the additional data augments the brain’s ability to reason – biofeedback patients are even able to catch their breath faster. In the same way, improving our data streams gives hope of transcending formerly intractable social debates.

The effective availabilism movement could, in my view, accelerate this second pathway.

Cascade Reform Technologies

It seems clear that availability cascades are susceptible to abuse. Many advertisers and political campaigns don’t execute an aggregated optimization across our national attentional profile. Instead, they simply run a maximization algorithm on their topic of interest (“think about my opponent’s scandal!”).

With modern-day technology (polls, trending Twitter tags, motive abduction, self-monitoring), noticing attentional budget failures can be tricky. With publicly available psychometric data in place, even subtle attentional budget failures will be easily detectable. We have increased our supply of detected failures, but how might effective availabilists increase demand (open vectors of reform towards availability cascade failure modes)?

The first, obvious, pathway is to use the same tool – attentional cascades – to counterbalance. If gun control is getting too much attention, effective availabilists will strive to push social media towards a discussion of, e.g., campaign finance reform. They could, further, use psychometric data to evaluate whether they have overshot (SuperPACs are now too interesting), and to adjust as necessary.
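The “evaluate whether they have overshot, and adjust” loop resembles a proportional feedback controller. In this hypothetical sketch, each campaign cycle moves a topic’s attention share by `gain` times the gap to its budgeted share; the linear response model and the gain values are assumptions, not measurements.

```python
from typing import List


def counterbalance(share: float, target: float, gain: float, cycles: int) -> List[float]:
    """Track a topic's attention share as campaigns push it toward the budgeted target.

    Hypothetical linear response: each cycle closes `gain` times the remaining gap,
    so the gap is multiplied by (1 - gain) per cycle.
    """
    trajectory = [share]
    for _ in range(cycles):
        share += gain * (target - share)
        trajectory.append(share)
    return trajectory


gentle = counterbalance(0.75, target=0.5, gain=0.5, cycles=6)
aggressive = counterbalance(0.75, target=0.5, gain=1.5, cycles=6)
print([round(s, 3) for s in gentle])      # approaches 0.5 from above
print([round(s, 3) for s in aggressive])  # overshoots below 0.5, then oscillates toward it
```

A modest gain approaches the budget from one side; an aggressive gain overshoots (the SuperPAC scenario) before settling, and in this model any gain above 2 would diverge.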

Other pathways towards reform might involve the empirically precise amplification of boredom circuits. Recruiting the influential to promote the message that “this topic has been talked to death” could work, as could the targeted use of satire.

Takeaways

  • Pay more attention to the quiet whispers of your mind. “Haven’t I heard about this enough” represents an undiscovered political movement.
  • Social discourse is laced with the rippling tides of availability cascades, and is at present left to their mercy.
  • As hard psychometric data makes its way towards public accessibility, a market of normative attentional budgets will arise.
  • The business of pushing current attentional profiles towards normative budgets will become the impetus of effective availabilism movements.
  • A cottage industry of cognitive technologies to achieve these ends will thereafter crystallize and mature.

Attentional Budgets Usage (1)