A talk I gave this week, which localizes and extends the two-stream hypothesis.
Until next time.
Part Of: Sociality sequence
Content Summary: 1400 words, 14 min read
False Belief Blindness
Consider the Sally-Anne test:
A child is in a room, watching Sally and Anne who are also in the room. The room has two objects in it: a basket and a box.
Sally has a marble. Sally puts the marble in the basket and leaves the room. While she is gone, Anne moves the marble to the box.
Sally comes back, wanting to play with her marble. Where will she look for it?
To answer the question correctly, it is not enough to model Sally and Anne as having desires and beliefs (to deploy the intentional stance). The child must also be able to differentiate her own knowledge from the knowledge of others (the child is correct, but Sally is wrong). This is an instance of cognitive decoupling: building firewalls around the beliefs/desires of individual agents.
It turns out that three-year-old children get the answer wrong, but five-year-old children (autistic children excepted) get it right. What happens at age four?
Why Belief Inference Is Blind …
At the end of the test, the child believes that the marble is in the box. Sally believes that the marble is in the basket. Their respective minds might look something like this:
In Awakening To A Social World, we learned that, at twelve months, children begin to think about other people as having beliefs and goals:
If the above picture is how your brain works, it would be puzzling to explain how a child would ever be tempted to conclude that Sally thinks the marble is in the box.
But this is not how your brain encodes second-order beliefs. Here’s what actually happens:

Crucially, relational second-order beliefs (“Sally thinks that”) point towards first-order beliefs (“Marble’s in basket”) which live in your world model. This mental library comes equipped with a librarian, who flags the incompatibility between “Marble’s in box” and “Marble’s in basket”, and removes the latter.
Architecturally, the reason why three year olds suffer from false belief blindness is that all beliefs funnel through one world model. There are simply no separate memory spaces to evaluate the world model of other people. In order to understand the beliefs of other people, they must be compatible with facts known by oneself.
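To make the architectural claim concrete, here is a toy Python sketch (all names are hypothetical, not from the original) of the single-world-model design, showing how its librarian produces the false-belief error:

```python
class OneWorldModel:
    """A sketch of a three year old's architecture: one shared store of
    first-order beliefs, policed by a 'librarian' that evicts any belief
    incompatible with a newer fact."""

    def __init__(self):
        self.beliefs = set()       # first-order beliefs
        self.second_order = {}     # agent -> a belief inside self.beliefs

    def learn(self, belief):
        self.beliefs.add(belief)

    def ascribe(self, agent, belief):
        # "Sally thinks that..." is a pointer into the shared world model.
        self.second_order[agent] = belief

    def revise(self, old, new):
        """The librarian: the new fact evicts the incompatible old one,
        and any dangling second-order pointer is re-routed to the new fact."""
        self.beliefs.discard(old)
        self.beliefs.add(new)
        for agent, held in self.second_order.items():
            if held == old:
                self.second_order[agent] = new


child = OneWorldModel()
child.learn("marble in basket")
child.ascribe("Sally", "marble in basket")          # Sally saw the placement
child.revise("marble in basket", "marble in box")   # Anne moves the marble

# With only one world model, the child now predicts Sally will look
# in the box -- the classic false-belief error.
assert child.second_order["Sally"] == "marble in box"
```

Notice that the error is not a bug in any single step; it falls directly out of having nowhere else to store Sally's now-incompatible belief.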
… And Recognizing Falsehood Is No Cure
In Gullible By Default we discussed how negating beliefs is optional, effortful, and prone to failure. So you might think that the three year old child hasn’t yet developed the ability to negate Sally’s belief. But you’d be wrong. If you look carefully at the mechanics of negation, you will realize that negation cannot help.
Negating claims is implemented by adding an “It is false that” tag in front of the belief in question. We can negate Sally’s belief in two different ways:

Both negations fail. It is simply not true that Sally recognizes the marble isn’t in the basket. Likewise, we cannot say that Sally does not think that the marble is in the basket. Even exotic combinations (e.g., double negatives) are of no use.
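The scope ambiguity above can be made explicit with a tiny sketch (purely illustrative) that encodes negation as an “It is false that” prefix tag:

```python
# Purely illustrative: negation implemented as a prefix tag on a belief.
def neg(claim):
    return ("It is false that", claim)

# Ground truth: Sally genuinely believes the marble is in the basket.
sally_actual = ("Sally thinks", "marble in basket")

# Negation scope 1: Sally holds a negated belief.
inner = ("Sally thinks", neg("marble in basket"))
# Negation scope 2: Sally lacks the belief entirely.
outer = neg(("Sally thinks", "marble in basket"))

# Neither negated structure matches Sally's actual mental state.
assert inner != sally_actual
assert outer != sally_actual
```

Whichever scope the negation operator takes, the result misdescribes Sally; the tag can only deny a belief, never hold it at arm's length.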
But shouldn’t recognizing falsehood in one’s self enable us to recognize it in others?
No. It may help to notice that lie detection immediately corrects false beliefs (by introducing an “it is false that” mental prefix). This repair is essential to protect the mental ecosystem from contamination. Put simply, negation evolved as a protection against deception; it is simply not equipped to recognize honest mistakes.
The Birth Of Creativity
Quite independently of the evolution of lie detectors, the hominid line has also acquired a different kind of ability: the capacity for pretense. Did you play the “floor is lava” game as a child? What is going on in the mind of a child when they pretend that couches and the like are the only refuge from a sea of molten rock?
You might be tempted to say that there exists a “floor is lava” belief in your mental library during such games. Or that the falsehood-detector is exploring the possibility of appending “It is false that…” to the standing belief that “the floor is carpet”. But something more sophisticated is going on. As Leslie puts it:
If I jump up suddenly because I mistakenly think I see a spider on the table, I act as if a spider were there. But I certainly do not pretend a spider is there.
Instead of replacing belief, the child is alternating between competing beliefs. More specifically, the child is building a tiny little scaffold, which hovers over their actual belief that “the floor is carpet” and simulates a world in which that belief is replaced. We may call this counterfactual simulation.
To operate effectively, your counterfactual simulation must:
Why has a counterfactual simulator evolved in the hominid line?
Pretense originates in play, but is far more significant than that. With the ability to simulate different worlds, our minds are able to “try on” new beliefs as if they were hats. If we locate a belief that explains the world better than our world model, we upgrade our world model. Counterfactuals allow our prediction machines to upgrade themselves. They are the algorithmic bedrock of creativity.
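The scaffold metaphor suggests an implementation: a simulation layer that temporarily shadows one belief and is guaranteed to tear itself down afterwards. A minimal Python sketch (all names hypothetical):

```python
from contextlib import contextmanager

class Mind:
    """A sketch of a mind whose counterfactual simulator temporarily
    replaces a belief without ever corrupting the underlying world model."""

    def __init__(self, beliefs):
        self.beliefs = dict(beliefs)

    @contextmanager
    def pretend(self, topic, counterfactual):
        real = self.beliefs[topic]
        self.beliefs[topic] = counterfactual   # the scaffold goes up
        try:
            yield self
        finally:
            self.beliefs[topic] = real         # the scaffold is torn down


mind = Mind({"floor": "carpet"})
with mind.pretend("floor", "lava") as m:
    assert m.beliefs["floor"] == "lava"    # inside the game, the floor burns
assert mind.beliefs["floor"] == "carpet"   # the actual belief was never lost
```

The crucial design property is the `finally` clause: pretense is alternation between competing beliefs, never replacement, so the real belief must survive no matter what happens inside the simulation.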
How Counterfactuals Restored Our Sight …
A timeline of mind-relevant developmental milestones:
Despite the large time gap between pretense and the blindness reversal, I claim that pretense is the cure. What gives?
The answer lies in the fallback mechanism for representing false beliefs. Recall that the 13 month old’s mental librarian rejects “Marble’s In Basket” as incompatible, and as a fallback, re-routes “Sally thinks that” towards the true belief “Marble’s In Box”. When the counterfactual simulator comes online at twenty months, it isn’t yet involved in this failure mode.
It is only slowly that the child’s brain discovers that these two technologies can be productively combined. Passing the Sally-Anne test requires a novel modification to the error processing algorithm:
Not only are false beliefs encoded counterfactually, but the novel data are stored in the relationship model for later reuse. This is how we become aware of the fallibility of our peers.
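Putting the two mechanisms together, a toy sketch (all names hypothetical) of the repaired algorithm might look like this: conflicting ascribed beliefs are no longer evicted, but encoded counterfactually inside the relationship model.

```python
class MatureMind:
    """A sketch of the post-repair architecture: an ascribed belief that
    conflicts with one's own world model is no longer evicted; it is
    parked, counterfactually, in the relationship model for that agent."""

    def __init__(self):
        self.world = set()          # first-order beliefs
        self.relationship = {}      # agent -> counterfactually-held belief

    def learn(self, belief):
        self.world.add(belief)

    def ascribe(self, agent, belief):
        # Even when `belief` contradicts self.world, it is stored as-is,
        # firewalled behind the agent's name (cognitive decoupling).
        self.relationship[agent] = belief

    def predict_search(self, agent):
        # Predict behavior from the agent's belief, not from the facts.
        return self.relationship[agent]


child = MatureMind()
child.learn("marble in box")                 # what the child saw Anne do
child.ascribe("Sally", "marble in basket")   # what Sally last saw
assert child.predict_search("Sally") == "marble in basket"
```

Prediction now routes through the stored counterfactual rather than through the world model, which is exactly what passing the Sally-Anne test requires.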
… At The Cost Of Self-Anchoring
In Epistemic Topography, I said this:
The curse of knowledge expects short inferential distances. Why does this bias (not another) live in our brains?
As we have seen, estimating [epistemic] location is expensive. So the brain takes a shortcut: it uses a location it already knows about (its own) and employs differences between the Self and the Other to estimate distance. Call this self-anchoring. But the brain isn’t aware of all differences, only those it observes. Hence the process of “pushing out” one’s estimation of Other Locations typically doesn’t go far enough… the birthplace of the curse.
We now have a mechanical explanation for this mental shortcut. The more differences between our beliefs and another person’s, the more data we must encode counterfactually. But counterfactuals are not prediction machines in their own right; they only facilitate tinkering with our own machinery. Other people are thus presumed similar until proven different.
Takeaways
Executive Summary:
References
Part Of: Sociality sequence
Followup To: Agent Detection: Life Recognizing Itself
Content Summary: 1500 words, 15 min read
David Hume once observed:
There is an universal tendency among mankind to conceive all beings like themselves, and to transfer to every object those qualities of which they are intimately conscious. We find faces in the moon, armies in the clouds; and, by a natural propensity, if not corrected by experience and reflection, ascribe malice or goodwill to every thing that hurts or pleases us … trees, mountains and streams are personified, and the inanimate parts of nature acquire sentiment and passion.
How did we acquire the ability to discover minds in the world? Let’s find out.
Equifinality: Awakening To Goals
Obviously enough, living things obey the laws of physics. But living things (which we called agents) also have properties unique to them: characteristic appearances (e.g., faces) and behaviors (e.g., self-propelled movement). In Life Recognizing Itself we saw how this allows the brain to build better prediction machines around these agents. We saw that differentiating agents does not require representing other minds. This fact allows us to appreciate the gradual maturation of the Agency Detection system as we travel across the Tree of Life.
Once an organism is detected, the Agent Classifier strains additional information out of its perceptual signature. But there are many other improvements we can make to our prediction machines. Consider, for example, the perspective of an infant who every night is picked up and placed in her crib.
On some nights she may be crawling around the kitchen, in others the family room, in others sitting in her parent’s lap on the couch. Despite these very diverse beginnings, the outcome of the putting-to-bed process is always the same. From the perspective of the neurons in her eyes, very different beginning states always result in the same end state. This property is known as equifinality, and it is not unique to nursery rooms: it is ubiquitous among living things (another example: the behavior of water buffalo around watering holes).
How might prediction machines anticipate equifinality? An efficient explanation of equifinality comes from ascribing goals to an agent.
Dominance Hierarchies: Awakening To Belief
Minds capable of computing situation-agnostic equifinality already have an advantage: perhaps the lion just ate and is not hungry enough to give chase, but the gazelle does just as well to model the lion as universally hungry. But there is room to inject a little contingency. Return to our nursery example: the infant knows that the putting-to-bed desire is only activated at night. How best to anticipate contingent desire? An efficient explanation for equifinality contingency is ascribing beliefs to an agent.
In fact, there are myriad benefits to possessing belief inference systems besides equifinal contingency. Another important reward gradient stems from the most important social structure of the animal kingdom: the dominance hierarchy. Wolf packs, for example, feature an alpha male, who is given special feeding & reproductive privileges. In species like the baboon, this hierarchy becomes much more detailed: every individual male knows who is above and who is below his place.
Dominance hierarchies can arguably exist without their members having theories of mind. But agility in climbing the hierarchy is under strong selective pressure, and effective navigation requires tremendous psychological prowess. For example, in his book A Primate’s Memoir, Robert Sapolsky recounts tales of his baboons forming alliances in their attempts to dethrone the sitting alpha. The ability to model the beliefs and desires of one’s alliance partner is surely a boon; hence discovering mind can be viewed as a selective consequence of the dominance hierarchy.
Intentional Stance
This tendency to ascribe beliefs and desires to other agents is together known as the intentional stance. Let us name the module responsible for this ability the Agent Mentalizer. The intentional stance appears in children between 6 and 12 months (Gergely et al, 1994).
As with any algorithm in the Agency Detection system, we might expect the intentional stance to be quite vulnerable to false positives. And that is exactly what we find. For an extreme demonstration, consider the following video (taken from Heider & Simmel, 1944):
While the Agent Mentalizer was designed to understand the minds of other animals, it had no trouble ascribing beliefs and goals to two-dimensional shapes. This is roughly analogous to your email provider accepting a tennis ball as a login password.
Impression Management Via Secondary Models
In your brain, you have a set of beliefs about the world. These beliefs may be stored in many different memory systems, but all of them together improve the power of prediction that you wield over your environment. Your knowledge, your web of belief, I call a World Model. For every significant individual in your life, you have a collection of beliefs about them, provided by your relationship modeling system. Some of these beliefs are non-mental, obtained from systems like the Agent Classifier. Call these beliefs collectively your primary model. For every significant individual in your life, you have a primary model for them; your World Model contains many primary models.
However, the Agent Mentalizer evolved to infer mental states (beliefs, goals) of other individuals as well. But simulating mental states is not like simulating objects – mental states are about objects. That is, we require a new type of model – a secondary model – to simulate the primary model of another person.
Confused? An example should help.
Imagine a three year old child, Bonnie, and her best friend Clyde. Since she was very young, Bonnie has been accumulating shared experiences with Clyde, and is now able to recognize him by appearance. These memories and knowledge of Clyde are stored in a primary model.
At the twelve month mark, Bonnie acquired the ability to simulate beliefs/goals of other people. This awakening has greatly improved her understanding of Clyde. She began noticing when Clyde was not in the mood to play, and his opinions of various toys. She even became able to crudely simulate Clyde’s response to situation X, even if Clyde hadn’t encountered X in real life.
During this time, of course, Clyde went through the very same maturation process. We can model their co-understanding as follows:
The above graphic underscores the “egotistical” subset of secondary models: simulation of the impression they are making on one another. The intentional stance is the birthplace of impression management.
Against Ternary Models
If secondary models simulate primary models, can ternary models simulate secondary models? Well, let’s not close ourselves to the possibility. Here’s what such a thing looks like:
So… ternary models are hard to understand! Let’s simplify:
Make sure the construction logic of the above image is clear: you should have no trouble constructing a quaternary graphic, quinary graphic, etc. Doesn’t the raw ability to reason about such higher-order logics mean that your brain is capable of infinitely-nested meta-models?
A useful analogy here is the natural numbers (0, 1, 2, …). How can your brain hope to count arbitrarily large numbers of things? After all, folk mathematics is fueled by biochemistry; the number of representations it can store is finite. So… how many numbers can it count?
The correct answer is four. That is, your subitizing module renders counting up to four almost instantaneous; for larger numbers, response times rise dramatically, with an extra 250-350 ms added for each additional item beyond about four. We see a similar time difference in recursive reasoning: most people are easily able to simulate other people (primary modeling) and conduct impression management (secondary modeling). But imagining the impression-management of other people takes conscious effort, and thus cannot be part of your default Theory Of Mind machinery.
With these insights in mind, we see that ternary models ought not be admitted into our mental architecture simply because they are conceivable. Such a solution would not be parsimonious, since these complex behaviors can be built from simpler atomic components. In fact, in the simplification above, you can even see hints of your two-level brain “translating” ternary concepts into a more direct language that it better understands.
Takeaways
Here are the ideas I want you to walk away from this post:
Relevant Resources
The Nature Of Metaphor
Just for fun, let me open today’s discussion with a few aphorisms:
Primary Sensorimotor Metaphor
Okay, time to delve a little deeper. Consider the following metaphors.
What similarities do you see between these metaphors? [Footnote 1] Well, these metaphors are all unidirectional, and explain abstract concepts by appealing to more down-to-earth domains. What do I mean by down-to-earth? Well, all of the above examples appeal to perceptual or motor phenomena!
In terms of the human brain, perceptual and motor (“sensorimotor”) systems tend to reside in the cortical homunculus. In terms of the human memory hierarchy, these types of concepts tend to arise in procedural memory.
Primary Metaphor In Other Memory Systems
Now, human memory contains more than just procedural memory. We can use our understanding of other memory systems to predict other kinds of primary metaphor.
Metaphor Composition Is Narrative Paint
Human beings conceptualize abstract objects by bringing many primary metaphors into a complex whole. Let me pull an example from Lakoff & Johnson: the concept of Time [Footnote 2].
The Time Orientation Metaphor looks like this:
Examples: That’s all behind us now. We’re looking forward to your presentation. He has a great future in front of him.
The Moving Time Metaphor interprets times to be objects and the passage of time to be the motion of objects past the observer. This metaphor really finds its legs when composed with the Time Orientation metaphor. The Time Orientation + Moving Time complex metaphor, then, looks like this:
Examples: The time will come when there are no more typewriters. The time has long since gone when you could mail a letter for three cents. The time for action has arrived. Thanksgiving is coming up on us. Time is flying by. Let’s meet the future head-on.
But abstractions like time are typically underwritten by more than one can of narrative paint. In this case, the Moving Observer Metaphor alternatively imagines location on the observer’s path as times, and the motion of the observer as the passage of time. Here is the Time Orientation + Moving Observer complex metaphor, in full detail:
Examples: There’s going to be trouble down the road. What will be the length of his visit? Let’s spread the conference over two weeks. We passed the deadline. We’re halfway through September. His visit to Russia extended over many years.
Takeaways
Today, I gave you examples of “primary” metaphor, which in this case were grounded in the human perceptual/motor systems. Abstract concepts are made by gluing primary metaphors together like Legos. I also left you with several aphorisms, including:
Footnotes
Followup To: An Introduction To Prisoner’s Dilemma
Part Of: Algorithmic Game Theory sequence
Content Summary: 500 words, 5 min read
Consider the following game:
Both spouses prefer the company of one another. In fact, if they spend the evening together, they incur zero cost. However, faced with a choice between waiting in an empty house and earning some overtime, they would prefer the latter twice as much.
We encode this game’s strategy-space as follows:
Recall our previous discussion of Pareto optimality and strategic dominance. There are myriad ways to think about games; why isolate those two properties in particular? One reason to invent names is to construct a universal toolkit: non-trivial properties that exist in all games and are amenable to our analyses.
Pareto optimality makes an appearance in the above game, at (H, H). But strategic dominance does not. Take a moment to convince yourself this is true.
Since strategic dominance is too strong to be a universal property, we might relax it. What happens when we encode regret? That is, what does the Spousal Game look like after we consider cases when a player wishes she had made a different choice?
Arrows represent regret.
Let us view Prisoner’s Dilemma from the same lens.
These arrows have a peculiar pattern. Why? Because of strategic dominance!
We can explain strategic dominance in terms of regret. That is, if a player’s regrets are all in the same direction, then that player is subject to strategic dominance.
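This relationship is easy to check mechanically. Below is a small sketch using standard Prisoner's Dilemma payoffs (the particular numbers are illustrative): for each opponent choice, we compute the strategy a player wishes she had picked, i.e. her regret arrows:

```python
# Regret-based dominance detection.  Strategies: 0 = Cooperate, 1 = Defect.
# payoff[(row_strategy, col_strategy)] -> (row_payoff, col_payoff)
payoff = {
    (0, 0): (-1, -1), (0, 1): (-3, 0),
    (1, 0): (0, -3),  (1, 1): (-2, -2),
}

def regrets(player):
    """For each possible opponent choice, which strategy does this
    player wish she had picked?  (the 'arrows' in the diagrams)"""
    best = []
    for opp in (0, 1):
        if player == 0:   # row player
            best.append(max((0, 1), key=lambda s: payoff[(s, opp)][0]))
        else:             # column player
            best.append(max((0, 1), key=lambda s: payoff[(opp, s)][1]))
    return best

# Every regret-arrow, for both players, points toward Defect (1):
# that uniform direction is strategic dominance expressed via regret.
assert regrets(0) == [1, 1]
assert regrets(1) == [1, 1]
```

When the arrows for a player all point the same way, no opponent behavior can make the dominated strategy worth keeping, which is exactly the dominance condition.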
Does regret belong in our universal toolbox? No. Regret by itself is rather uninteresting: every game with non-trivial utilities comprises myriad regrets.
We need a stronger property. How about games containing outcomes with no regret? Call such outcomes Nash equilibrium.
Does every game have Nash equilibria?
Perhaps no game fails to have this peculiar creature…
Conjecture 1: Every game with a finite number of players and strategies has at least one Nash equilibrium.
Time to search for counter-examples! Consider the game Rock-Paper-Scissors. Here’s how it looks to a game theorist:
Crucially, every node has at least one arrow leaving it. Rock/Paper/Scissors has no equilibrium! We have thus disproven Conjecture 1. What now?
In addition to the absence of Nash equilibria, this game is interesting in another sense. It turns out that having a deterministic strategy in Rock/Paper/Scissors is a bad idea. In fact, machines can reliably beat people at Rock/Paper/Scissors by exploiting patterns in human gameplay.
What happens when we expand our notion of game to incorporate non-deterministic strategies? The best mixed strategy a player can adopt is [⅓ ⅓ ⅓]; that is, each choice is selected at random with probability ⅓.
While hard to visualize, it is easy to intuitively grasp the existence of a new equilibrium. If each player adopts [⅓ ⅓ ⅓], neither will experience regret! Thus we can safely repair our Conjecture:
Conjecture 2. Every game with a finite number of players and strategies has at least one Nash equilibrium if mixed strategies are allowed.
It turns out that Conjecture 2 is entirely correct: this is Nash’s celebrated existence theorem, one of the most significant results in all of game theory.
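The zero-regret claim for [⅓ ⅓ ⅓] is easy to verify numerically. In this small sketch, against a uniformly-mixing opponent, every pure strategy in Rock/Paper/Scissors earns the same expected payoff, so no deviation is profitable:

```python
from fractions import Fraction

# Payoff to the row player: 1 = win, -1 = loss, 0 = tie.
# 0 = Rock, 1 = Paper, 2 = Scissors; (s - t) % 3 == 1 means s beats t.
def u(s, t):
    return [0, 1, -1][(s - t) % 3]

third = Fraction(1, 3)
mixed = [third, third, third]   # the uniform mixed strategy

# Expected payoff of each pure strategy against the uniform mix:
expected = [sum(p * u(s, t) for t, p in enumerate(mixed)) for s in (0, 1, 2)]

# Every deviation earns exactly 0: no regret, hence an equilibrium.
assert expected == [0, 0, 0]
```

Exact `Fraction` arithmetic is used so the equality check is not muddied by floating-point rounding.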
Followup To: [Data Partitioning: How To Repair Explanation]
If I have seen a little further, it is by standing on the shoulders of giants
– Isaac Newton
Table Of Contents
Context
Last time, we learned two things:
Today, we will apply these results to human scientific processes, or to what I will affectionately call meat science.
The Optimization Level Of Science
Nietzsche captures the reputation of science well:
Science is flourishing today and her good conscience is written all over her face, while the level to which all modern philosophy has gradually sunk… philosophy today, invites mistrust and displeasure, if not mockery and pity. It is reduced to “theory of knowledge”… how could such a philosophy dominate? … The scope and the tower-building of the sciences has grown to be enormous, and with this also the probability that the philosopher grows weary while still learning….
Attempts to ground such sentiments in something rigorous exceed the scope of this post. Today, we simply accept that science is particularly epistemically productive. But let us move beyond the cheerleading, and ask ourselves why this is so.
If I were to hand over a map of our species’ cognitive architecture to an alien species, I would expect them to predict demagogues much more readily than the discovery of the Higgs boson. The simple truth is that our minds are flawed: we are born with clumsy inference machinery. How then is epistemic productivity possible?
The success of science has been said to derive from the scientific method:
But does the scientific method lend itself to the debiasing of the human animal? I argue it does not. Tribalism in scientific communities, for example, doesn’t seem particularly muted compared to other realms of human experience. Further, in his classic text The Structure of Scientific Revolutions, Thomas Kuhn showed that scientific revolutions emerge from strong, extra-rational motives. In his view, the nature of paradigm shifts is a bit like mystical religious experience: deeply personal, and difficult to verbalize.
I like to imagine science as a socio-historical process. Individuals and even sub-communities within its disciplines may fail to track what is Really There, but communities on the whole tend to move in this direction. As Max Planck once observed:
A scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die and a new generation grows up that is familiar with it.
The question of how the scientific method facilitates socio-historical truth-tracking is, I believe, unsolved. But all I must do today is flag the optimization level of science. If science debiases our species only organically, at the socio-historical level, there is much room for improvement. If cognitive science can forge new debiasing weapons, we will become increasingly able to transcend ourselves, able to move faster than science.
The Fruits Of Science
Consider again the following:
This meme is sticky, and signals humility… but it also frames a crisis.
The shoulders keep growing taller.
Because of the accretive nature of science, our collective knowledge far outpaces our cognitive abilities. Even if the controversial Flynn effect is true and our collective IQ really is improving over time, the size of our databases would still outstrip the reach of our cognitive cone.
Various technologies have been invented that assuage this crisis. Curriculum compression is one of the older ones. That which was originally available as an anthology of journal articles is compressed into a review article, review articles compressed to graduate courses, graduate courses polished into undergraduate presentations. Consider how Newton would marvel at a present-day course in differential calculus: 300 years of mathematical research, successfully installed into an undergraduate’s semantic memory in a matter of weeks.
Curriculum compression is but one answer to our exploding knowledge base. Other implicit reactions include:
In the years ahead, society seems geared to add two more solutions to our “bag of tricks”:
And yet, the shoulders keep growing taller…
Translation Proxy Lag
In Knowledge, an Empirical Sketch, I introduced measurement under the title of translation proxies. Why the strange name? To underscore the nature of measurement: forcibly relocating sensory phenomena from their “natural habitat” towards a signature that our bodies are equipped to sense. In this light, a translation proxy can be viewed as a kind of faucet, bringing novel forms of physical reality into the human umwelt.
But consider physics, where the “low-hanging fruit” translation proxies have already been built. You don’t see many physicists clamoring for new hand-held telescopes. Instead, you see them excited over ultra-high precision telescope mirrors, over particle colliders producing energies in the trillions of electronvolts. Such elaborate proxy technologies simply do not flow as quickly from the furnace of invention. Call this phenomenon data intake stagnation.
Not only are our proxy innovations becoming more infrequent, but they are also severely outpaced by our theoreticians. In physics, M theory (the generalization of string theory) posits entities at the 10^-34 m scale, but our current microscopes can only interrogate resolutions around the 10^-15 m scale. In neuroscience, connectome research programs seek to graph the nervous system at the neuronal level (10^-6 m), but most imaging technologies only support the millimeter (10^-3 m) range. Call this phenomenon translation proxy lag.
It will be decades, perhaps centuries, before our measurement technologies catch up.
Cheap Explanations
Let us bookmark translation proxy lag, and consider a different sort of problem.
Giant shoulders are not merely growing taller. They also render impotent evidence once vital to theoreticians. Let me appeal to a much-cited page from history, to illustrate.
During World War I, Sir Arthur Eddington was Secretary of the Royal Astronomical Society, which meant he was the first to receive a series of letters and papers from Willem de Sitter regarding Einstein’s theory of general relativity. […] He quickly became the chief supporter and expositor of relativity in Britain. […]
After the war, Eddington travelled to the island of Príncipe near Africa to watch the solar eclipse of 29 May 1919. During the eclipse, he took pictures of the stars in the region around the Sun. According to the theory of general relativity, stars with light rays that passed near the Sun would appear to have been slightly shifted because their light had been curved by its gravitational field. This effect is noticeable only during eclipses, since otherwise the Sun’s brightness obscures the affected stars. Eddington showed that Newtonian gravitation could be interpreted to predict half the shift predicted by Einstein.
Eddington’s observations published the next year confirmed Einstein’s theory, and were hailed at the time as a conclusive proof of general relativity over the Newtonian model. The news was reported in newspapers all over the world as a major story:
Two competing theories, two perfectly adequate explanations for everyday phenomena, one test to differentiate models. Here we see a curiosity: scientists place a high value on new data. Gravitational lensing constituted powerful confirmation because, as far as the model-creators knew, it could have been the other way.
A nice, clean narrative. But consider what happens next. General relativity has begun to show its age: it is chronically incompatible with quantum mechanics. Many successors to general relativity have been created; let us call them Quantum Gravity Theory A, B, and C. Frustratingly, no discrepancies have been found between these theory-universes and our observed-universe.
How are multiple correct options possible? From a computational perspective, the phenomenon of multiple correct answers can be modeled with Solomonoff Induction. A less precise, philosophical precursor of the same moral can be found in the underdetermination of theory.
But which theory wins? General relativity won via a solar eclipse generating evidence for gravitational lensing. But gravitational lensing is now “old hat”; so long as all theories accommodate its existence, it no longer wields theory-discriminating power. And – given the translation proxy lag crisis outlined above – it may take some time before our generation acquires its test, its analogue of a solar eclipse.
Can anything be done in the meantime? Is science capable of discriminating between QGT/A, QGT/B, and QGT/C in the absence of a clean novel prediction? When we choose to invest more of our lives in one particular theory above the others, are we doomed to make this choice by mere dint of our aesthetic cognitive modules, by a sense of social belonging, by noise in our yedasentiential signals?
Overfitting: Failure Mode of Meat Science
Bayesian inference teaches us that confidence is usefully modeled as a probabilistic thing. But stochasticity is not equiprobability: discriminability is a virtue. If science cannot provide it, let us cast about for ways to reform science.
Let us begin our search by considering the rhetoric of explanation. What does it mean for a hypothesis to be criticized as ad-hoc?
> Scientists are often skeptical of theories that rely on frequent, unsupported adjustments to sustain them. This is because, if a theorist so chooses, there is no limit to the number of ad hoc hypotheses that they could add. Thus the theory becomes more and more complex, but is never falsified. This is often at a cost to the theory’s predictive power, however. Ad hoc hypotheses are often characteristic of pseudoscientific subjects.
Do you recall how we motivated data partitioning? Do you yet recognize the stench of overfitting?
In fact, in my view, current scientific practice is a bit too uncomfortable with falsification. Karl Popper lionized falsifiability, and the result of his movement has been an increase in the operationalization and measurement-affinity of the scientific grammar. But the weaving together of scientific abstractions and the particle soup, and the dawning taboo around Not Even Wrong, came with baggage.
> Logically, no number of positive outcomes at the level of experimental testing can confirm a scientific theory, but a single counterexample is logically decisive: it shows the theory, from which the implication is derived, to be false.
But compare this dictum with our result from machine learning, which suggests that perhaps small “falsifications” may be preferable to “getting everything right”:
Explainers who solely optimize against prediction error are in a state of sin.
In sum, we have reason to believe that overfitting is the pervasive illness of our meat science.
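The analogy can be made concrete with a toy sketch (my own construction, not from the post): an “ad hoc” high-degree polynomial, with roughly one free parameter per data point, explains the existing observations better than a parsimonious low-degree theory, yet predicts held-out observations worse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "world": a simple underlying law, observed with noise.
def world(x):
    return np.sin(x)

x_train = np.linspace(0.0, 3.0, 12)
y_train = world(x_train) + rng.normal(0.0, 0.25, x_train.size)
x_test = np.linspace(0.1, 2.9, 50)   # held-out observations
y_test = world(x_test) + rng.normal(0.0, 0.25, x_test.size)

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

simple = np.polyfit(x_train, y_train, deg=3)    # parsimonious theory
ad_hoc = np.polyfit(x_train, y_train, deg=11)   # one "epicycle" per data point

# The ad hoc theory "explains" the existing data better...
train_gap = mse(simple, x_train, y_train) - mse(ad_hoc, x_train, y_train)
# ...but predicts new observations worse.
test_gap = mse(ad_hoc, x_test, y_test) - mse(simple, x_test, y_test)
```

The ad hoc theory is never falsified by the training data, and that is precisely its sin: its extra degrees of freedom have memorized the noise.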
Takeaways
Next time, we will explore applying the machine learning solution to overfitting – data partitioning – to meat science, and motivate the virtue of hiding data from ourselves. See you then!
Table Of Contents
The Availability Cascade
The following questions pop up in my Facebook feed all the time.
Why are mental illness, addiction, and suicide only talked about when somebody famous succumbs to their demons?
Why do we only talk about gun control when there is a school shooting?
What is the shape of your answer? Mine begins with a hard look at the nature of attention.
Attention is a lens by which our selves perceive the world. The experience of attention is conscious. However, the control of attention – where it lands, how long it persists – is preconscious. People rarely think to themselves: “now seems an optimal time to think about gun control”. No, the topic of gun control simply appears.
When we pay attention to attention, its flaws become visible. Let me sketch two.
My treatment of this positive feedback loop was at the level of the individual. But that same mechanism must also promote failures at the level of the social network. The second flaw writ large – the rippling eddies of attentional currents (as captured by services like Google News) – is known as an availability cascade. And thus we have provided a cognitive reason why our social atmosphere disproportionately discusses gun control when school shootings appear in the news.
In electrical engineering, positive feedback typically produces runaway effects: a circuit “hits the rails” (draws maximum current from its power source). What prevents human cognition from doing likewise, from becoming so fixated on one particular memory-attention loop that it cannot escape? Why don’t we spend our days and our nights dreaming of soft drinks, fast food, pharmaceuticals? I would appeal to human boredom as a natural barrier to such a runaway effect.
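The boredom barrier can be sketched with toy dynamics (my own illustrative assumption, not a published cognitive model): a single salience variable grows by positive feedback, while a habituation term accumulates with exposure and damps the gain before salience can “hit the rails”.

```python
# Toy memory-attention loop with a "boredom" brake.
# All parameters and dynamics here are illustrative assumptions.

def simulate(steps=200, gain=0.3, habituation=0.05, decay=0.02):
    salience, boredom = 0.1, 0.0
    history = []
    for _ in range(steps):
        # Positive feedback: attention reinforces memory reinforces attention,
        # damped by accumulated boredom.
        salience += gain * salience * (1.0 - boredom) - decay * salience
        salience = max(0.0, min(salience, 10.0))   # the circuit's "rails"
        # Exposure breeds boredom; boredom saturates at 1.
        boredom = min(1.0, boredom + habituation * salience)
        history.append(salience)
    return history

history = simulate()
peak = max(history)
# Salience rises at first; then boredom caps the runaway and it subsides.
```

Without the habituation term, the loop grows until it pins against the ceiling; with it, salience peaks and decays, which is the qualitative behavior the boredom hypothesis predicts.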
Attentional Budget Ethics
We have managed to rise above the minutiae, and construct a model of political discourse. Turn now to ethics. How should attention be distributed? When is the right time to discuss gun control, to study health care reform, to get clear on border control priorities?
The response profile of such a question is too diverse to treat here, but I would venture most approaches share two principles of attentional budgets:
Despite how shockingly agreeable these principles are, I have a feeling that different political parties may yet disagree. In a two party system, for example, you can imagine competing attentional budgets as follows:
Interpret “attentional resources” in a straightforward (measurement-affine) way: let it represent the number of hours devoted to public discussion.
This model of attentional budgets requires a bit more TLC. Research-guiding questions might include:
Effective Availabilism
Let us now pull together a vision of how to transcend the attentional cascade.
In our present condition, even very intelligent commentators must resort to the following excuse of a thought: “I have a vague sense that our society is spending too much time on X. Perhaps we shouldn’t talk about it anymore”.
In our envisioned condition, our best political minds would be able to construct the following chain of reasoning: “This year, our society has spent three times more time discussing gun control than discussing energy independence. My attentional budget prescribes this ratio to be closer to 1:1. Let us think of ways to constrain these incessant gun-control availability cascades.”
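That chain of reasoning is easy to mechanize once psychometric data exists. A minimal sketch, with hypothetical numbers, topic names, and thresholds of my own choosing:

```python
# Hypothetical psychometric data: hours of public discussion per topic this year.
observed_hours = {"gun control": 300.0, "energy independence": 100.0}

# A hypothetical attentional budget prescribing equal attention to the pair.
prescribed_ratio = 1.0
tolerance = 1.5   # how far from budget before we declare a failure

observed_ratio = observed_hours["gun control"] / observed_hours["energy independence"]
overweighted = observed_ratio > prescribed_ratio * tolerance

if overweighted:
    # An effective availabilist would now work to constrain the cascade.
    verdict = "gun control is over-budget; counterbalance"
else:
    verdict = "within budget"
```

The point is not the arithmetic, which is trivial, but that the normative judgment becomes checkable once attentional expenditure is measured.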
In other words, I am prophesying the emergence of an effective availabilism movement, in a way analogous to effective altruism. Effective availabilist groups would, I presume, primarily draw from neuropolitical movements more generally.
Notice how effective availabilism relies on, and comes after, the arrival of publicly-available psychometric data. And this is typical: normative movements often follow innovations in descriptive technology.
Why Quantification Matters
Policy discussions influence votes, which affect lives. Despite the obvious need for constructive discourse, a frustrating number of political exchanges are content-starved. I perceive two potential solutions to this failure of our democracy:
The effective availabilism movement could, in my view, accelerate this second pathway.
Cascade Reform Technologies
It seems clear that availability cascades are susceptible to abuse. Many advertisers and political campaigns don’t execute an aggregated optimization across our national attentional profile. Instead, they simply run a maximization algorithm on their topic of interest (“think about my opponent’s scandal!”).
With only present-day tools (polls, trending Twitter tags, motive abduction, self-monitoring), noticing attentional budget failures is tricky. With the above technology in place, even subtle attentional budget failures will be easily detectable. This increases our supply of detected failures; but how might effective availabilists increase demand (open vectors of reform against availability cascade failure modes)?
The first, obvious, pathway is to use the same tool – attentional cascades – to counterbalance. If gun control is getting too much attention, effective availabilists will strive to push social media towards a discussion of e.g., campaign finance reform. They could, further, use psychometric data to evaluate whether they have overshot (SuperPACs are now too interesting), and to adjust as necessary.
Other pathways towards reform might involve empirically-precise amplification of boredom circuits. Recruiting the influential to promote the message that “this topic has been talked to death” could work, as could the targeted use of satire.
Takeaways