Rule Feedback Loops

Part Of: Breakdown of Will sequence
Followup To: Willpower As Preference Bundling
Content Summary: 900 words, 9 min reading time

Context

When “in the moment”, humans are susceptible to bad choices. Last time, we introduced willpower as a powerful solution to such akrasia. More specifically:

  • Willpower is nothing more, and nothing less, than preference bundling.
  • Inasmuch as your brain can sustain preference bundling, it has the potential to redeem its fits of akrasia.

But this only explained how preference bundling works at the level of utility curves. Today, we will learn how preference bundling is mentally implemented, and this mental model will in turn provide us with predictive power.

Building Mental Models

Time to construct a model! 🙂 You ready?!

In our last post, we discussed three distinct phases that occur during preference bundling. We can then imagine three separate modules (think: software programs) that implement these phases.

Personal Rule- Crude Decision Process (2)

This diagram provides a high-level, functional account of how our minds make decisions. The three modules can be summarized as follows:

  • The Utility Transducer module is responsible for identifying affordances within sensory phenomena, and compressing a multi-dimensional description into a one-dimensional value.
  • The Preference Bundler module can aggregate utility representations that are sufficiently similar. Such a technique is useful for combating akrasia.
  • The Choice Implementer module selects Choice1 if Preference1 > Preference2. It is also responsible for computing when and how to execute a preference-selection. (A toy sketch of all three modules follows this list.)
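To make the division of labor concrete, here is a minimal Python sketch of the three-module pipeline. The feature names, weights, and numbers are my own illustrative inventions, not anything taken from Ainslie or the diagram:

```python
from dataclasses import dataclass

@dataclass
class Choice:
    name: str
    features: dict  # multi-dimensional description of an affordance

def utility_transducer(choice: Choice) -> float:
    """Compress a multi-dimensional feature description into a one-dimensional utility."""
    weights = {"rest": 0.6, "fun": 0.4}   # illustrative weights only
    return sum(weights.get(k, 0.0) * v for k, v in choice.features.items())

def preference_bundler(utilities: list[float]) -> float:
    """Aggregate the utilities of sufficiently similar choices into one preference."""
    return sum(utilities)

def choice_implementer(preference_1: float, preference_2: float) -> str:
    """Select Choice1 if Preference1 > Preference2, else Choice2."""
    return "Choice1" if preference_1 > preference_2 else "Choice2"

# Wire the three modules together for a single decision.
sleep = Choice("full night's sleep", {"rest": 10})
browse = Choice("late-night browsing", {"fun": 12})
pref_sleep = preference_bundler([utility_transducer(sleep)])
pref_browse = preference_bundler([utility_transducer(browse)])
print(choice_implementer(pref_sleep, pref_browse))
```

The point is only the shape of the pipeline: compress features into utility, aggregate sufficiently-similar utilities, then act on whichever preference is larger.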

The above diagram is, of course, merely a germinating seed of a more precise mental architecture (it turns out that mind-space is rather complex 🙂 ). Let us now refine our account of the Preference Bundler.

Personal Rules

Consider what it means for a brain to implement preference bundling. Your brain must receive anticipated-utility information from an arbitrary number of choice valuations, and aggregate similar decisions into a single measure.

Obviously, the mathematics of such a computation lies beneath your awareness (your brain's hidden superpower is math). However, does the process entirely fail to register in the small room of consciousness?

This seems unlikely, given the common phenomenal experience of personal rules. Is it not likely that the conscious experience of “I will never stay up past midnight on a weeknight” in some way correlates with the actions of the Preference Bundler?

Let’s generalize this question a bit. In the context of personal rules, we are inquiring about the meaning of quale-module links. This type of question is relevant in many other contexts as well. It seems to me that such links can be roughly modeled in the vocabulary of dual-process theory, where System 1 (parallel modules) data bubbles up into System 2 (sequential introspection) experience.

Let us now assume that the quale of personal rules correlates with some variety of mental substance. What would that substance have to include?

In terms of complexity analysis, it seems to me that a Preference Bundler need not generate relevant rules on the fly. Instead, it could more efficiently rely on a form of rule database, which tracks a set of rules proven useful in the past. Our mental architecture, then, looks something like this (qualia are in pink):

Personal Rule- Rules Subserving Bundling

In his book, Ainslie presents intriguing connections between this idea of a rule database and similar notions in the history of ideas:

The bundling phenomenon implies that you will serve your long-range interest if you obey a personal rule to behave alike towards all members of a category. This is the equivalent of Kant’s categorical imperative, and echoes the psychologist Lawrence Kohlberg’s sixth and highest principle of moral reasoning, deciding according to principle. It also explained how people with fundamentally hyperbolic discount curves may sometimes learn to choose as if their curves were exponential.

Recursive Feedback Loops

Personal rules, of course, do not spontaneously appear within your mind. They are constructed by cognitive processes. Let us again expand our model to capture this nuance:

Personal Rule- Preference Regulation

Describing our new components:

  • The Rule Controller module is responsible both for generating new rules (e.g., “I will not stay up past midnight on a weeknight”), and re-factoring existing ones.
  • The “Honored?” checkpoint conveys information on how well a given personal rule was followed. The Rule Controller module may use this information to update the rule database (a toy sketch of this loop follows).
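Here is a minimal Python sketch of the rule database and its “Honored?” feedback. The confidence number and its update rule are my own illustrative assumptions, not Ainslie's:

```python
class PersonalRule:
    def __init__(self, description: str, confidence: float = 0.5):
        self.description = description
        # How strongly the rule is expected to hold; feeds into future bundling.
        self.confidence = confidence

class RuleController:
    def __init__(self):
        self.rule_database: dict[str, PersonalRule] = {}

    def add_rule(self, description: str) -> None:
        """Generate a new rule and store it in the database."""
        self.rule_database[description] = PersonalRule(description)

    def report_outcome(self, description: str, honored: bool) -> None:
        """The 'Honored?' checkpoint: compliance strengthens a rule, a lapse weakens it."""
        rule = self.rule_database[description]
        delta = 0.1 if honored else -0.2   # lapses assumed to hurt more than compliance helps
        rule.confidence = min(1.0, max(0.0, rule.confidence + delta))

controller = RuleController()
controller.add_rule("I will not stay up past midnight on a weeknight")
controller.report_outcome("I will not stay up past midnight on a weeknight", honored=True)
controller.report_outcome("I will not stay up past midnight on a weeknight", honored=False)
```

Because today's confidence in a rule shapes tomorrow's bundling, and tomorrow's compliance feeds back into that confidence, the mechanism is recursive, which is exactly the property explored next.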

A feedback loop exists in our mental model. Observe:

Personal Rule- Feedback

Feedback loops can explain a host of strange behaviors. Ainslie describes the torment of a dieter:

Even if [a food-conscious person] figures, from the perspective of distance, that dieting is better, her long-range perspective will be useless to her unless she can avoid making too many rationalizations. Her diet will succeed only insofar as she thinks that each act of compliance will be both necessary and effective – that is, that she can’t get away with cheating, and that her current compliance will give her enough reason not to cheat subsequently. The more she is doubtful of success, the more likely it will be that a single violation will make her lose this expectation and wreck her diet. Personal rules are a recursive mechanism; they continually take their own pulse, and if they feel it falter, that very fact will cause further faltering.

Takeaways

And that’s a wrap! 🙂 I am hoping you will walk away from this article with two concepts firmly installed:

  • Preference bundling is mentally implemented via a database of personal rules (“I will do X in situations that involve Y”).
  • Personal rules constitute a feedback loop, whereby rule-compliance strengthens (and rule-circumvention weakens) the circuit.

Next Up: [Iterated Schizophrenic’s Dilemma]

Willpower As Preference Bundling

Part Of: [Breakdown of Will] sequence
Followup To: [An Introduction To Hyperbolic Discounting]

Table Of Contents

  • Motivations
  • Choice Is Not A Snapshot
  • Utility Anticipation
  • Preference Bundling
  • Takeaways

Motivations

Consider again the takeaways from our preceding post:

  • Behavior contradicting your desires (akrasia) can be explained by appealing to the rate at which preferences diminish over time (utility discount curve).
  • A useful way of reasoning about hyperbolic discount curves is warfare between successive “yous”.

With this model of akrasia in hand, we will proceed to explore akrasia therapies: tactical countermeasures we employ to combat our “sin nature”.

How do people verbalize their struggle to preserve “perspective from a distance”? Well, excepting cases where their past selves deprive their future selves of freedom (think of Odysseus tying himself to the mast), more often than not people gesture towards the concept of willpower.

Today we will build an account of what this fuzzy word actually means.

Choice Is Not A Snapshot

Do you remember our cartoon of a hyperbolic discount curve?  Here it is again:

Willpower- Choice Hyperbolic Example

Recall that the orange curve is LL (the larger-later reward), and the green is SS (the smaller-sooner reward). Akrasia occurs when some temptation transiently overpowers the choice we would otherwise select (when SS > LL).

Now this picture is, of course, extremely narrow.  We haven’t discussed:

  1. How this picture may be extended to greater than two choices
  2. How choices are differentiated at all (what prevents utility curves from being decomposed to smaller and smaller “choice units”).
  3. How a utility curve might be realized in the human brain at all.

I hope to get to these questions eventually. Today, I’d like to introduce just one “complication” into our cartoon: in real life, an organism must confront many choices across its lifespan.

Willpower- Choice Multiple

Utility Anticipation

Imagine your author waging a war with himself over whether to stay up late researching, or get a full night’s sleep. Does the above graph comfortably fit this category of recurrent choice?

One could argue that it fails: there is nothing to stop me from weighing the choices of my one-week-from-now self. To do this, I must represent their utilities in the present. Here’s how we might support such utility anticipation (only three examples shown, to constrain complexity):

Willpower- Choice Similarities

Does this anticipation counteract akrasia, instances where LL (larger-later reward) loses to SS (smaller-sooner reward)? To answer this, simply compare the utility curves:

Willpower- Foreknown Choice Selves

Akratic lapses persist. But if utility anticipation is not the root of willpower, what is?

Preference Bundling

Our innovation has its roots in philosophy.

Writers since antiquity have recommended that impulses could be controlled by deciding according to principle, that is, deciding in categories containing a number of choices rather than just the choice at hand. Aristotle said that akrasia is the result of choosing according to “particulars” instead of “universals”. Kant said that the highest kind of decision-making involves making all choices as if they defined universal rules (the “categorical imperative”)… this also echoes the psychologist Lawrence Kohlberg’s sixth and highest principle of moral reasoning, deciding according to principle.

Imagine, for a moment, what happens if we sum these utility curves together. Three time-slices should illustrate how this summing works:

Willpower- Foreknown Choice Addition (1)

Consider the middle time slice (middle rectangle). If you look closely, you’ll see it intersects three orange LL functions and three green SS functions. If we sum the three LL values together, we arrive at a single (larger) LL value. The same goes for the three green SS lines. The two circles directly over the rectangle represent these two sums.

In the above illustration, we bundled preferences at three particular times. But what is to stop us from doing the same computation at all times? Nothing, and after we do this, we obtain new utility lines. These lines represent the bundled utility curves of all three choices.

Willpower- Foreknown Choice Bundling (1)

(Why don’t we continue this addition for the entire displayed x-axis? Well, to preserve the effect, after Choice1 expires, you have to take into account Choice4, etc etc…)
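If you prefer numbers to pictures, here is a small Python sketch of the same summation, using the standard hyperbolic form V = A / (1 + kD). The amounts, delays, and k value are arbitrary illustrative choices, not taken from the diagrams:

```python
def value(amount, delay, k=1.0):
    """Hyperbolic present value: V = amount / (1 + k * delay)."""
    return amount / (1.0 + k * delay)

# Three recurring choice-pairs, one week apart.
# In each pair, SS pays 1 unit at day t; LL pays 2 units two days later.
ss_times = [10.0, 17.0, 24.0]
ll_times = [12.0, 19.0, 26.0]
now = 9.5   # half a day before the first temptation

# Unbundled: weigh only the imminent pair.
ss_alone = value(1.0, ss_times[0] - now)   # ~0.67
ll_alone = value(2.0, ll_times[0] - now)   # ~0.57  -> SS wins: akratic lapse

# Bundled: sum the whole category of similar choices.
ss_bundle = sum(value(1.0, t - now) for t in ss_times)   # ~0.85
ll_bundle = sum(value(2.0, t - now) for t in ll_times)   # ~0.88 -> LL wins: willpower

print(ss_alone, ll_alone, ss_bundle, ll_bundle)
```

With these made-up numbers, the lone temptation wins half a day out (≈0.67 vs ≈0.57), yet once the category is bundled, the SS total (≈0.85) loses to the LL total (≈0.88), which is precisely the effect the diagrams depict.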

Casting these bundled utility curves back to the language of successive selves:

Willpower- Bundled Choice Selves

In this situation, then, we see the defeat of akrasia – the victory of willpower!

Takeaways

  • Willpower is nothing more, and nothing less, than preference bundling.
  • Inasmuch as your brain can sustain preference bundling, it has the potential to redeem its fits of akrasia.

Next Up: [Personal Rule Feedback Loops]

[Sequence] Breakdown Of Will

Willpower- Foreknown Choice Selves

In this sequence, we will be exploring a précis of George Ainslie’s book Breakdown of Will. Specifically, we will be exploring the implications of akrasia (the act of behaving against one’s own desires).

Preliminary Posts

Content Summary

  1. An Introduction To Hyperbolic Discounting. Based on Chapters 1-3. Introduces the concepts of akrasia and utility, proceeds to model akrasia as a symptom of discount curves shaped like hyperbolas.
  2. Willpower As Preference Bundling. Based on Chapter 5. Discusses how willpower (a therapy against akrasia) comes to make our successive selves consistent with one another. Willpower is presented as the brain subtly manipulating how it instantiates hyperbolic discount functions.
  3. Personal Rule Feedback Loops. Based on Chapter 6. Builds a mental model of preference bundling, and explores the recursive nature of personal rules.
  4. Iterated Schizophrenic’s Dilemma. Based on Chapter 6. Grounds Ainslie’s account of willpower (and preference bundling) in a modified form of Iterated Prisoner’s Dilemma.
  5. Against Willpower. Based on Chapter 9. If willpower is preference bundling, then its mechanisms become available for scrutiny. Ainslie here locates four surprising implications of his theory of willpower, which suggest that it is not the unilaterally-beneficial tool that we might suspect.

An Introduction To Hyperbolic Discounting

Part Of: [Breakdown of Will] sequence

Table Of Contents

  • What Is Akrasia?
  • Utility Curves, In 200 Words Or Less!
  • Choosing Marshmallows
  • Devil In The (Hyperbolic) Details
  • The Self As A Population
  • Takeaways

What Is Akrasia?

Do you agree or disagree with the following?

In a prosperous society, most misery is self-inflicted. We smoke, eat and drink to excess, and become addicted to drugs, gambling, credit card abuse, destructive emotional relationships, and simple procrastination, usually while attempting not to do so.

It would seem that behavior contradicting one’s own desires is, at least, a frustratingly common human experience. Aristotle called this kind of experience akrasia. Here’s the apostle Paul’s description:

I do not understand what I do. For what I want to do I do not do, but what I hate I do. (Romans 7:15)

The phenomenon of akrasia, and the entire subject of willpower generally, is controversial (a biasing attractor). Nevertheless, both its description and underlying mechanisms are empirically tractable. Let us now proceed to help Paul understand, from a cognitive perspective, the contradictions emerging from his brain.

We begin our journey with the economic concept of utility.

Utility Curves, In 200 Words Or Less!

Let utility here represent the strength with which a person desires a thing. This value may change over time. A utility curve, then, simply charts the relationship between utility and time. For example:

Hyperbolic- Utility Curve Outline

Let’s zoom in on this toy example, and name three temporal locations:

  • Let t_beginning represent the time I inform you about a future reward.
  • Let t_reward represent the time you receive the reward.
  • Let t_middle represent some intermediate time, between the above.

Consider the case when NOW = t_beginning. At that time, we see that the choice is valued at 5 “utils”.

Hyperbolic- Utility Curve T_beginning

Consider what happens as the knife edge of the present (the red line) advances. At NOW = t_middle, the utility of the choice (the strength of our preference for it) doubles:

Hyperbolic- Utility Curve T_middle (2)

Increasing utility curves also go by the name discounted utility, which stems from a different view of the x-axis (standing at the decision point looking towards the past, or setting x to be in units of time delay). Discounted utility reflects something of human psychology: given a fixed reward, other things equal, receiving it more quickly is more valuable.

This concludes our extremely complicated foray into economic theory. 😛 As you’ll see, utility curves present a nice canvas on which we can paint human decision-making.

Choosing Marshmallows

Everyday instances of akrasia tend to be rather involved. Consider the decision to maintain destructive emotional relationships: the underlying causal graph is rather difficult to parse.

Let’s simplify. Ever heard of the Stanford Marshmallow Experiment?

In these studies, a child was offered a choice between one small reward (sometimes a marshmallow) provided immediately or two small rewards if he or she waited until the tester returned (after an absence of approximately 15 minutes). In follow-up studies, the researchers found that children who were able to wait longer for the preferred rewards tended to have better life outcomes, as measured by SAT scores, educational attainment, body mass index (BMI) and other life measures.

Naming the alternatives:

  • SS reward: Call the immediate, one-marshmallow option the SS (smaller-sooner) reward.
  • LL reward: Call the delayed, two-marshmallow option the LL (larger-later) reward.

Marshmallows are simply a playful vehicle to transport concepts. Why are we tempted to reach for SS despite knowing our long-term interests lie with LL?

Here’s one representation of the above experiment (LL is the orange curve, SS is green):

Hyperbolic- Utility Curve Two Option Choice

Our definition of utility was very simple: a measure of preference strength. This article’s model of choice will be equally straightforward: humans always select the choice with higher utility.

Which option will people select? Always the orange curve. No matter how far the knife edge of the present advances, the utility of LL always exceeds that of SS:

Hyperbolic- Utility Curve Exponential Self (1)

Shockingly, economists like to model utility curves like these with mathematical formulas, rather than Google Drawings. These utility relationships can be produced with exponential functions; let us call them exponential discount curves.
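To make “exponential discount curve” concrete (my notation; the post itself sticks to pictures), write the value of a reward of size $A$ at delay $D$ as

$$V_{\text{exp}}(D) = A\,e^{-kD}, \qquad k > 0.$$

The ratio between a larger-later reward and a smaller-sooner one,

$$\frac{V_{LL}}{V_{SS}} = \frac{A_{LL}\,e^{-k(D+\Delta)}}{A_{SS}\,e^{-kD}} = \frac{A_{LL}}{A_{SS}}\,e^{-k\Delta},$$

does not depend on $D$ at all, which is why the orange curve never dips below the green one as the present advances.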

Devil In The (Hyperbolic) Details

But the above utility curves are not the only ones that could be implemented in the brain. Even if we held Utility(t_beginning) and Utility(t_reward) constant, the rate at which Utility(NOW) increases may vary. Consider what happens when most of the utility obtains close to reward-time (when the utility curves form a “hockey stick”):

Hyperbolic- Utility Curve Hyperbolic Choice (1)

Let us quickly ground this alternative in a mathematical formalism. A function that fits our “hockey stick” criteria is the hyperbolic function; so we will name the above a hyperbolic discount curve.
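Writing this out in the same notation as before (my own gloss, assuming the standard one-parameter hyperbola):

$$V_{\text{hyp}}(D) = \frac{A}{1 + kD}.$$

Pitting a smaller-sooner reward $A_{SS}$ at delay $D$ against a larger-later reward $A_{LL}$ at delay $D + \Delta$, the SS option wins exactly when

$$D < \frac{A_{SS}\,\Delta}{A_{LL} - A_{SS}} - \frac{1}{k},$$

i.e., only once the present has crept close enough to the temptation. Far away, LL wins; up close, SS wins.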

Notice that the above “overlap” is highly significant – it indicates different choices at different times:

Hyperbolic- Utility Curve Hyperbolic Selves (1)

This is the birthplace of akrasia – the cradle of “sin nature” – where SS (smaller-sooner) rewards temporarily outweigh LL (larger-later) rewards.

The Self As A Population

Consider the story of Odysseus and the sirens:

Odysseus was curious as to what the Sirens sang to him, and so, on the advice of Circe, he had all of his sailors plug their ears with beeswax and tie him to the mast. He ordered his men to leave him tied tightly to the mast, no matter how much he would beg. When he heard their beautiful song, he ordered the sailors to untie him but they bound him tighter.

With this powerful illustration of akrasia, we are tempted to view Odysseus as two separate people. Pre-siren Odysseus is intent on sailing past the sirens, but post-siren Odysseus is desperate to approach them. We even see pre-siren Odysseus restricting the freedoms of post-siren Odysseus…

How can identity be divided against itself? This becomes possible if we are, in part, the sum of our preferences. I am me because my utility for composing this article exceeds my utility attached to watching a football game.

Hyperbolic discounting provides a tool to quantify this concept of competing selves. Consider again the above image. The person you are between t_1 and t_2 makes choices differently than the You of all other times.

Another example, using this language of warfare between successive selves:

Looking at a day a month from now, I’d sooner feel awake and alive in the morning than stay up all night reading Wikipedia. But when that evening comes, it’s likely my preferences will reverse; the distance to the morning will be relatively greater, and so my happiness then will be discounted more strongly compared to my present enjoyment, and another groggy morning will await me. To my horror, my future self has different interests to my present self. Consider, too, the alcoholic who moves to a town in which alcohol is not sold, anticipating a change in desires and deliberately constraining their own future self.

Takeaways

  • Behavior contradicting your desires (akrasia) can be explained by appealing to the rate at which preferences diminish over time (utility discount curve).
  • A useful way of reasoning about hyperbolic discount curves is warfare between successive “yous”.

Next Up: [Willpower As Preference Bundling]

Towards Cognitive Epistemology

Intellect: By convention there is sweetness, by convention bitterness, by convention color; in reality only atoms and the void.

Senses: Foolish intellect! Do you seek to overthrow us, while it is from us that you take your evidence?

This two-line dialogue, coined around 400 BCE by Democritus, haunts our species to this day. How might we build an inference trail from our subjective experiences to objective physics? Can we build an account of how to move from pure phenomenology to scientific realism? And once we arrive at scientific realism, can that say anything substantial about phenomenology, without casting doubt on its own veracity?

Crude Epistemological Loop

How might we cast the above (crude) intuition of an epistemological loop into mathematics? Could we then learn to generalize the loop in non-trivial ways? How might we defend our construction of physics against vicious circularity, against constraint poverty?

Such questions are the nightmare of epistemology, the reason why no one has yet fully answered solipsism. And yet, in my view, analytic philosophy has stalled far behind the current state of the art. How often do epistemologists discuss the merits of psychophysics? Don’t you think they should? The cognitive redemption of epistemology is happening in our lifetimes.

Most of my attention in mental architecture involves sewing together different levels of analysis (vertical integrative theorizing). But it is important to note the domain of such intertheoretic reductions. While my vertical work in mental architecture will address the concerns of the agent, environmental analyses are just as important. Only with a more universal physics, a physics not bound to the limited concerns of Earth’s biota, can we hope to achieve a more productive degree of abstraction.

Agent-World Analysis Levels

[Sequence] Hiding Data From Ourselves

ML- human vs machine learning

This series discusses a startling, and some would say anti-democratic, idea: hiding data from ourselves may be an effective way to move faster than science.

This sequence is composed of four articles.

  1. About A Noise. Discusses the origins of noise within data.
  2. Data Partitioning: Bias vs Variance. Covers a core idea in the machine learning community: building fences around data sources protects machines from underestimating noise.
  3. Overfitting: Failure Mode of Meat Science. Surveys evidence of overfitting within human scientific communities (“meat science”). Connects to philosophy of science, and motivates the application of data partitioning into the human realm.
  4. [Planned] Reconstructing The Shoulders Of Giants. A look at what data partitioning would look like in practice, including how I am applying it on this blog.

Tunneling Into The Soup

Tunneling- Colorless Sky

Table Of Contents

  • Context
  • Bathing In Radiation
  • Our Photoreceptive Hardware
  • The Birthplace Of Human-Visibility
  • Takeaways

Context

In Knowledge: An Empirical Sketch, I left you with the following image of perceptual tunneling:

Perceptual Tunneling

Today, we will explore this idea in more depth.

Bathing In Radiation

Recall what you know about the nature of light:

momentum_wavelength_equivalence
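Written out (standard photon physics; I am reconstructing the relation the image depicts rather than reading it off):

$$p = \frac{h}{\lambda}, \qquad E = pc = \frac{hc}{\lambda}.$$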

Since h and c are just constants, the relation becomes very simple: energy is inversely proportional to wavelength. Rather than identifying a photon by its energy, then, let’s identify it by its wavelength. We will do this because wavelength is easier to measure (in my language, we have selected a measurement-affine independent variable).

So we can describe one photon by its wavelength. How about billions? In such a case, it would be useful to draw a map, on which we can locate photon distributions. Such a photon map is called an electromagnetic spectrum.

With this background knowledge in place, we can explore a photon map of solar radiation: what types of photons strike the Earth from the Sun?

Tunneling- Solar Radiation

This image is rather information-dense. Let’s break it down.

Compare the black curve to the yellow distribution. The former represents an idealized energy radiator (a black body), whereas the latter represents the actual emission profile of the Sun. As you can see, while the black body abstraction does not perfectly model the idiosyncrasies of our Sun, it does a pretty good job “summarizing” the energy output of our star.
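For reference, the idealized black-body curve is given by Planck’s law (standard physics, not something stated in the post); per unit wavelength,

$$B_\lambda(\lambda, T) = \frac{2hc^2}{\lambda^5} \cdot \frac{1}{e^{hc/(\lambda k_B T)} - 1},$$

evaluated at roughly the Sun’s surface temperature (the post later uses 5250 °C).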

Next, compare the yellow distribution to the red distribution. The former represents solar radiation before it hits our atmosphere, the latter represents solar radiation after it hits our atmosphere (when it strikes the Earth). As you can see, at some wavelengths light passes through the atmosphere easily (red ~= yellow; call this atmospheric lucidity) whereas at other wavelengths, the atmosphere absorbs most of the photons (red << yellow; call this atmospheric opacity).

These different responses to light of different energies do not occur at random, of course. Rather, the chemical composition of the atmosphere causes atmospheric opacity. Ever hear the meme “the ozone layer protects us from UV light”? Well, here is the data underlying the meme (see the “O3” marker at the 300 nm mark?). Other, more powerful but less well-known, effects can be seen in the above spectrum: the shielding effects of water vapor, carbon dioxide, and oxygen.

Our Photoreceptive Hardware

Your eyes house two types of photoreceptive cell: the rod and the cone.

Rods are tuned towards performing well in low-light conditions. After roughly 30 minutes in the dark, everything is ready for optical stimulation. In this “dark-adapted” state, the visual system is amazingly sensitive. A single photon of light can cause a rod to trigger. You will see a flash of light if as few as seven rods absorb photons at one time.

Cones, on the other hand, excel in daylight. They also underwrite the experience of color (a phenomenon we will discuss next time).  Now, unless you are a tetrachromat mutant, your eyes contain three kinds of cone:

  • 16% blue cones
  • 10% green cones
  • 74% red cones

Below are the absorption spectra. Please note that, while not shown, rods manifest a similar spectrum: they reside between the blue and green curves, with a peak receptivity at 498 nm.

Tunneling- Cone Spectrum

It is important to take the above in the context of the cone’s broader function. By virtue of phototransductive chemical processes, sense organs like the cone accept photons matching the above spectrum as input, and thereby facilitate the production of spike trains (neural code) as system output.

The Birthplace Of Human-Visibility

We now possess two spectra: one for solar radiation, and one for photoreceptor response profile. Time to combine spectra!  After resizing the images to achieve scale consistency, we arrive at the following:

Tunneling- Solar Radiation With Tunneling

Peak solar radiation corresponds to the cone spectrum! Why should this be? Well, recall the purpose of vision in the first place. Vision is an evolutionary adaptation that extracts information from the environment & makes it available to the nervous system of its host. If most photon-mediated information arrives in the 450-700 nm wavelength range, should we really be so surprised to learn that our eyes have adapted to this particular range?

Notice that we drew two dotted lines around the intersection boundaries. We have now earned the right to use a name. Let us name photons that reside within the above interval, visible light. Then,

  • Let “ultraviolet light” represent photons to the left of the interval (shorter wavelengths, higher energy)
  • Let “infrared light” represent photons to the right of the interval (longer wavelengths, lower energy)

We have thus stumbled on the definition of visibility. Visibility is not an intrinsic physical property, like charge. Rather, it is a human invention: the boundary at which our idiosyncratic photoreceptors carve into the larger particle soup.

Takeaways

We have now familiarized ourselves with the mechanism of tunneling. Perceptual tunneling occurs when a sense organ transduces some slice of the particle soup of reality. In vision, photoreceptor cells transduce photons within the 450-700 nm wavelength band into the neural code.

With this precise understanding of transduction, we begin to develop a sense of radical contingency. For example,

  • Ever wonder what would happen if the human eye also contained photoreceptors in the 1100 nm range? The human umwelt would expand: as you heat a fire up, for example, you would see tendrils of flame brighten, then vanish, then reappear. I suspect everyday language would feature “bright-visible” and “dim-visible”.
  • Consider what would have happened if, during our evolution, the Sun had been colder than 5250 degrees Celsius. The black-body idealized spectrum of the Sun would shift, and its peak would move towards the right. The actual radiation signature of the Sun (yellow distribution) would follow. Given how precisely the rods in our eyes “found” the band of peak emitted energy in this universe, it seems likely that in that world we would be wearing different photoreceptors, with an absorption signature better calibrated to the new information. Thus, we have a causal link between the temperature of the Sun and the composition of our eyeballs (a quick check follows this list).
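As a quick check on that “peak would move towards the right” claim, Wien’s displacement law (standard physics, not stated in the post) gives

$$\lambda_{\max} = \frac{b}{T}, \qquad b \approx 2.898 \times 10^{-3}\ \text{m·K},$$

so a cooler Sun (smaller $T$) means a larger $\lambda_{\max}$: the emission peak slides towards longer wavelengths, i.e., to the right of the spectra above.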

I began this post with a partial quote from Metzinger. Here is the complete quote:

The evening sky is colorless. The world is not inhabited by colored objects at all. It is just as your physics teacher in high school told you: Out there, in front of your eyes, there is just an ocean of electromagnetic radiation, a wild and raging mixture of different wavelengths. Most of them are invisible to you and can never become part of your conscious model of reality. What is really happening is that your brain is drilling a tunnel through this inconceivably rich physical environment and in the process painting the tunnel walls in various shades of color. Phenomenal color. Appearance.

Next time, we’ll explore the other half of Metzinger’s quote: “painting the tunnel walls in various shades of color, phenomenal color”…

Overfitting: Failure Mode of Meat Science

Followup To: [Data Partitioning: How To Repair Explanation]

If I have seen a little further, it is by standing on the shoulders of giants
– Isaac Newton

Table Of Contents

  • Context
  • The Optimization Level Of Science
  • The Fruits Of Science
  • Translation Proxy Lag
  • Cheap Explanations
  • Failure Mode of Meat Science
  • Takeaways

Context

Last time, we learned two things:

  1. Explainers who solely optimize against prediction error are in a state of sin.
  2. Data partitions immunize abductive processes against overfitting.

Today, we will apply these results to human scientific processes, or to what I will affectionately call meat science.
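Before doing so, here is a minimal sketch of what those two results look like mechanically: a toy dataset, a flexible “theory”, and a data partition that exposes the overfit. All numbers are illustrative; this is a sketch, not a model of any real scientific episode.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "world": a linear law, observed through noise.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + rng.normal(scale=0.2, size=x.size)

# Data partition: the explainer only ever sees the first half;
# the second half is hidden, reserved for judging the explanation.
x_seen, y_seen = x[:10], y[:10]
x_hidden, y_hidden = x[10:], y[10:]

def errors(degree):
    """Fit a polynomial 'theory' on the seen data; report seen vs hidden error."""
    theory = np.polyfit(x_seen, y_seen, degree)
    seen = np.mean((np.polyval(theory, x_seen) - y_seen) ** 2)
    hidden = np.mean((np.polyval(theory, x_hidden) - y_hidden) ** 2)
    return seen, hidden

print("line:    ", errors(1))
print("degree 5:", errors(5))
# The degree-5 theory never fits the seen data worse than the line (it has more
# knobs to turn), yet its extra wiggles chase noise, so on the hidden half it
# typically does far worse. Optimizing prediction error on seen data alone
# rewards exactly this kind of ad-hoc flexibility; the hidden partition exposes it.
```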

The Optimization Level Of Science

Nietzsche captures the reputation of science well:

Science is flourishing today and her good conscience is written all over her face, while the level to which all modern philosophy has gradually sunk… philosophy today, invites mistrust and displeasure, if not mockery and pity. It is reduced to “theory of knowledge”… how could such a philosophy dominate? … The scope and the tower-building of the sciences has grown to be enormous, and with this also the probability that the philosopher grows weary while still learning….

Attempts to ground such sentiments in something rigorous exceed the scope of this post. Today, we simply accept that science is particularly epistemically productive. But let us move beyond the cheerleading, and ask ourselves why this is so.

If I were to hand over a map of our species’ cognitive architecture to an alien species, I would expect them to predict demagogues much more easily than the discovery of the Higgs boson. The simple truth is that our minds are flawed: we are born with clumsy inference machinery. How then is epistemic productivity possible?

The success of science has been said to derive from the scientific method:

Overfitting- Scientific Method

But does the scientific method lend itself to the debiasing of the human animal? I argue it does not. Tribalism in scientific communities, for example, doesn’t seem particularly muted compared to other realms of human experience. Further, in his classic text The Structure of Scientific Revolutions, Thomas Kuhn showed that scientific revolutions emerge from strong, extra-rational motives. In his view, the nature of paradigm shifts is a bit like mystical religious experience: deeply personal, and difficult to verbalize.

I like to imagine science as a socio-historical process. Individuals and even sub-communities within its disciplines may fail to track what is Really There, but communities on the whole tend to move in this direction. As Max Planck once observed:

A scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die and a new generation grows up that is familiar with it.

The question of how the scientific method facilitates socio-historical truth-tracking is, I believe, unsolved. But all I must do today is flag the optimization level of science. If science organically debiases our species only at the socio-historical level, there is much room for improvement. If cognitive science can forge new debiasing weapons, we will become increasingly able to transcend ourselves, able to move faster than science.

The Fruits Of Science

Consider again the following:

Overfitting- Newton Seen Further Quote

This meme is sticky, and signals humility… but it also frames a crisis.

The shoulders keep growing taller.

Because of the accretive nature of science, our collective knowledge far outpaces our cognitive abilities. Even if the controversial Flynn effect is true and our collective IQ really is improving over time, the size of our databases would still outstrip the reach of our cognitive cone.

Various technologies have been invented that assuage this crisis. Curriculum compression is one of the older ones. That which was originally available as an anthology of journal articles is compressed into a review article, review articles compressed to graduate courses, graduate courses polished into undergraduate presentations. Consider how Newton would marvel at a present-day course in differential calculus: 300 years of mathematical research, successfully installed into an undergraduate’s semantic memory in a matter of weeks.

Curriculum compression is but one answer to our exploding knowledge base. Other implicit reactions include:

  1. the narrowing of research trajectories
  2. the fractionating of disciplines into the Ivory Archipelago
  3. research crowdsourcing (the de-popularization of the Lone Intellectual Warrior model)

In the years ahead, society seems geared to add two more solutions to our “bag of tricks”:

  1. the cognitive reform of education technology
  2. the mechanization of science

And yet, the shoulders keep growing taller…

Translation Proxy Lag

In Knowledge, an Empirical Sketch, I introduced measurement under the title of translation proxies. Why the strange name? To underscore the nature of measurement: forcibly relocating phenomena from their “natural habitat” towards a signature that our bodies are equipped to sense. In this light, a translation proxy can be viewed as a kind of faucet, bringing novel forms of physical reality into the human umwelt.

But consider physics, where the “low-hanging fruit” translation proxies have already been built. You don’t see many physicists clamoring for new hand-held telescopes. Instead, you see them excited over ultra-high precision telescope mirrors, over particle colliders producing energies in the trillions of electronvolts. Such elaborate proxy technologies simply do not flow as quickly from the furnace of invention. Call this phenomenon data intake stagnation.

Not only are our proxy innovations becoming more infrequent, but they are also severely outpaced by our theoreticians. In physics, M theory (the generalization of string theory) posits entities at the 10^-34 m scale, but our current microscopes can only interrogate resolutions around the 10^-15 m scale. In neuroscience, connectome research programs seek to graph the nervous system at the neuronal level (10^-6 m), but most imaging technologies only support the millimeter (10^-3) range. Call this phenomenon translation proxy lag.

It will be decades, perhaps centuries, before our measurement technologies catch up.

Cheap Explanations

Let us bookmark translation proxy lag, and consider a different sort of problem.

Giant shoulders are not merely growing taller. They also render impotent evidence once vital to theoreticians. Let me appeal to a much-cited page from history, to illustrate.

During World War I, Sir Arthur Eddington was Secretary of the Royal Astronomical Society, which meant he was the first to receive a series of letters and papers from Willem de Sitter regarding Einstein’s theory of general relativity. […] He quickly became the chief supporter and expositor of relativity in Britain. […]

After the war, Eddington travelled to the island of Príncipe near Africa to watch the solar eclipse of 29 May 1919. During the eclipse, he took pictures of the stars in the region around the Sun. According to the theory of general relativity, stars with light rays that passed near the Sun would appear to have been slightly shifted because their light had been curved by its gravitational field. This effect is noticeable only during eclipses, since otherwise the Sun’s brightness obscures the affected stars. Eddington showed that Newtonian gravitation could be interpreted to predict half the shift predicted by Einstein.

Eddington’s observations published the next year confirmed Einstein’s theory, and were hailed at the time as a conclusive proof of general relativity over the Newtonian model. The news was reported in newspapers all over the world as a major story:

Overfitting- NYT 1919

Two competing theories, two perfectly adequate explanations for everyday phenomena, one test to differentiate models. Here we see a curiosity: scientists place a high value on new data. Gravitational lensing constituted powerful confirmation because, as far as the model-creators knew, it could have been the other way.

Overfitting- Einstein vs. Newton

A nice, clean narrative. But consider what happens next. General relativity has begun to show its age: it is chronically incompatible with quantum mechanics. Many successors to general relativity have been created; let us call them Quantum Gravity Theory A, B, and C. Frustratingly, no discrepancies have been found between these theory-universes and our observed-universe.

Overfitting- Ad-Hoc Theories

How are multiple correct options possible? From a computational perspective, the phenomenon of multiple correct answers can be modeled with Solomonoff Induction. A less precise, philosophical precursor of the same moral can be found in the underdetermination of theory by evidence.
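Here is a toy illustration of how multiple, mutually incompatible theories can all be “correct” so far (my own invented example, not from the post):

```python
# Three toy "theories", all consistent with every observation made so far,
# yet disagreeing about data we have not yet been able to collect.
observed_x = [0, 1, 2, 3]
observed_y = [0, 1, 4, 9]

theories = {
    "A: y = x^2":                    lambda x: x ** 2,
    "B: y = x^2 + x(x-1)(x-2)(x-3)": lambda x: x ** 2 + x * (x - 1) * (x - 2) * (x - 3),
    "C: y = x^2 mod 101":            lambda x: (x ** 2) % 101,
}

for name, f in theories.items():
    fits = all(f(x) == y for x, y in zip(observed_x, observed_y))
    print(name, "| fits all data so far:", fits, "| predicts f(11) =", f(11))
```

Solomonoff Induction breaks such ties, roughly speaking, by weighting each surviving theory by 2^(-description length), the formal version of penalizing the ad-hoc quartic bolted onto theory B.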

But which theory wins? General relativity won via a solar eclipse generating evidence for gravitational lensing. But gravitational lensing is now “old hat”; so long as all theories accommodate its existence, it no longer wields theory-discriminating power. And – given the translation proxy lag crisis outlined above – it may take some time before our generation acquires its test, its analogue of a solar eclipse.

Can anything be done in the meantime? Is science capable of discriminating between QGT/A, QGT/B, and QGT/C in the absence of a clean novel prediction? When we choose to invest more of our lives in one particular theory above the others, are we doomed to make this choice by mere dint of our aesthetic cognitive modules, by a sense of social belonging, by noise in our yedasentiential signals?

Overfitting: Failure Mode of Meat Science

Bayesian inference teaches us that confidence is usefully modeled as a probabilistic thing. But stochasticity is not equiprobability: discriminability is a virtue. If science cannot provide it, let us cast about for ways to reform science.

Let us begin our search by considering the rhetoric of explanation. What does it mean for a hypothesis to be criticized as ad-hoc?

Scientists are often skeptical of theories that rely on frequent, unsupported adjustments to sustain them. This is because, if a theorist so chooses, there is no limit to the number of ad hoc hypotheses that they could add. Thus the theory becomes more and more complex, but is never falsified. This is often at a cost to the theory’s predictive power, however. Ad hoc hypotheses are often characteristic of pseudoscientific subjects.

Do you recall how we motivated data partitioning? Do you yet recognize the stench of overfitting?

In fact, in my view, current scientific practice is a bit too uncomfortable with falsification. Karl Popper lionized falsifiability, and the result of his movement has been an increase in the operationalization and measurement-affinity of the scientific grammar. But the weaving together of scientific abstractions and the particle soup, the dawning taboo around Not Even Wrong, came with baggage.

Logically, no number of positive outcomes at the level of experimental testing can confirm a scientific theory, but a single counterexample is logically decisive: it shows the theory, from which the implication is derived, to be false.

But compare this dictum with our result from machine learning, which suggests that perhaps small “falsifications” may be preferable to “getting everything right”:

Explainers who solely optimize against prediction error are in a state of sin.

In sum, we have reason to believe that overfitting is the pervasive illness of our meat science.

Takeaways

  • Science tracks truth not at the level of the individual, but on a socio-historical scale. There is therefore room to move faster than science.
  • Science is accretive: the shoulders keep growing taller.
  • Due to constraints in measurement technology, data of a truly novel character has become increasingly difficult to acquire.
  • In meat science, as data transitions from “novel” to “known”, it loses its theory-discriminating power.
  • Given these dual crises, and the philosophical commitments scientific communities have made against falsified hypotheses, science can be shown to suffer from overfitting.

Next time, we will explore applying the machine learning solution to overfitting – data partitioning – to meat science, and motivate the virtue of hiding data from ourselves. See you then!