Iterated Schizophrenic’s Dilemma

Part Of: Breakdown of Will sequence
Followup To: Rule Feedback Loops
Content Summary: 1300 words, 13 min read

Context

Here’s where we landed last time:

  • Preference bundling is mentally implemented via a database of personal rules (“I will do X in situations that involve Y”).
  • Personal rules constitute a feedback loop, whereby rule-compliance strengthens (and rule-circumvention weakens) the circuit.

Today’s post will draw from the following:

  1. An Introduction To Hyperbolic Discounting discusses warfare between successive selves (e.g., OdysseusNOW restricts the freedoms of OdysseusFUTURE in order to survive the sirens).
  2. An Introduction To Prisoner’s Dilemma discusses how maths can be used to describe the outcome of games between competitors.

Time to connect these threads! 🙂 Let’s ground our warfare narrative in the formal systems of game theory.

Schizophrenic’s Dilemma

First, we interpret { larger-later (LL) vs. smaller-sooner (SS) } rewards in terms of { cooperation vs. defection } decisions.

Second, we name our actors:

  • P represents your present-self
  • F represents your future-self
  • FF represents your far-future-self, etc.

Does this game suffer from the same issue as classical PD? Only one way to find out!

ISD- Simple Example (1)

Here’s how I would explain what’s going on:

  • Bold arrows represent decisions, irreversible choices in the real world. Decisions lock in final scores (yellow boxes).
  • Skinny arrows represent intentions, the construction of rules for future decisions.
  • The psychological reward you get from setting intentions is just as real as that produced by decisions.
  • Reward is accretive: your brain seeks to maximize the sum total reward it receives over time.

Take some time to really ingest this diagram.

Iterated Schizophrenic’s Dilemma (ISD)

Okay, so we now understand how to model intertemporal warfare as a kind of “Schizophrenic’s Dilemma”. But we “have to live with ourselves”, so the analogy ought to be extended across more than one decision. Here’s how:

ISD - From Flat Behavior To IPD

Time flows from left to right. In the first row, we see that an organism’s choices are essentially linear. For each choice, the reasoner selects either an LL (larger-later) or an SS (smaller-sooner) reward. How can this be made into a two-dimensional game? We can achieve this graphically by “stretching each choice point up and to the left”. This move ultimately implements the following:

  • Temporal Continuity: “the future-self of one time step must be equivalent to the present-self of the next time step”.

It is by virtue of this rule that we are able to conceptualize the present-self competing against the future-self (second row). The third row simply compresses the second into one state-space.
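Here is a minimal sketch of that “stretching” move, under my own reading of the diagram: each choice is paired with the choice that follows it, giving a (present-self, future-self) state. The function name and the sample choice list are hypothetical, invented purely for illustration.

```python
def to_state_sequence(choices):
    """Pair each choice with its successor: (present-self, future-self)."""
    return list(zip(choices, choices[1:]))

# Hypothetical linear behavior, using C for LL and D for SS.
choices = ["C", "C", "D", "C"]
states = to_state_sequence(choices)
print(states)  # [('C', 'C'), ('C', 'D'), ('D', 'C')]

# Temporal Continuity holds by construction: the future-self of one
# state is the present-self of the next.
continuity_holds = all(a[1] == b[0] for a, b in zip(states, states[1:]))
print(continuity_holds)  # True
```

Because the pairs come from a sliding window over one timeline, no sequence built this way can ever violate the rule.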

Let us call this version of the Iterated Prisoner’s Dilemma the Iterated Schizophrenic’s Dilemma. For the remainder of this article, I will use IPD and ISD to distinguish between the two.

The Lens Of State-Space

We have previously considered decision-space, and outcome-space. Now we shall add a third space, what I shall simply call state-space (very similar to Markov Decision Processes…)

ISD- Three IPD Spaces (1)

You’ll notice that the above state-space is a fully-connected graph: in IPD, any decision can be made at any time.

But this is not true in ISD, which must honor the rule of Temporal Continuity. For example, (C, C) -> (D, C) violates the rule that “the future-self of one time step must be equivalent to the present-self of the next time step”. The ISD State-Space results from trimming all transitions that violate Temporal Continuity:

ISD- ISD vs IPD state spaces

Let me briefly direct your attention to three interesting facts:

  1. Half of the edges are gone. This is because information now flows in only one direction.
  2. Every node is reachable; no node has been stranded by the edge pruning.
  3. (C,D) and (D,C) are necessarily transient states: neither has a self-loop, so neither can persist from one time step to the next. Only (C,C) and (D,D) can repeat indefinitely.
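The pruning can be checked mechanically. The sketch below assumes a state is written (present-self move, future-self move) and that the IPD graph counts every directed transition, self-loops included. The diagram itself may count edges differently, so treat the raw totals as my assumption.

```python
from itertools import product

MOVES = ("C", "D")
# A state is (present-self move, future-self move).
STATES = list(product(MOVES, MOVES))

# IPD: any state may follow any other (self-loops included).
ipd_edges = [(s, t) for s in STATES for t in STATES]

# ISD: keep only transitions where the next present-self move equals
# the current future-self move (Temporal Continuity).
isd_edges = [(s, t) for s, t in ipd_edges if t[0] == s[1]]

self_loops = [s for s, t in isd_edges if s == t]
reachable = {t for _, t in isd_edges}

print(len(ipd_edges), len(isd_edges))  # 16 8
print(self_loops)                      # [('C', 'C'), ('D', 'D')]
print(reachable == set(STATES))        # True
```

The forbidden example from the text, (C, C) -> (D, C), is among the eight transitions that get trimmed.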

Intertemporal Retaliation

Four qualities of successful IPD strategies are: nice, forgiving, non-envious, and retaliating. How does retaliation work in the “normal” IPD?

ISD - Traditional IPD retaliation

Here we have two cases of retaliation:

  1. At the first time-step, Player A defects “unfairly”. Incensed, Player B retaliates (defects) next turn.
  2. At the fourth time-step, Player B takes advantage of A’s goodwill. Outraged, Player A punishes B next turn.
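To make “retaliating” and “forgiving” concrete, here is a small sketch using tit-for-tat. That choice of strategy is my assumption (this post does not name one), and the script of opponent moves is hypothetical.

```python
def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"

def respond_to(script):
    """Play tit-for-tat against a fixed script of opponent moves."""
    replies = []
    for t in range(len(script)):
        # Only moves made before time-step t are visible.
        replies.append(tit_for_tat(script[:t]))
    return "".join(replies)

a_moves = "DCCCC"  # hypothetical: Player A defects only at the first step
b_moves = respond_to(a_moves)
print(a_moves)  # DCCCC
print(b_moves)  # CDCCC
```

Player B punishes exactly one turn after the defection, then returns to cooperation once A does.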

Note the easy two-way symmetry. But in the ISD case, the question of retaliation becomes very complex:

  1. Present-selves may “punish” future-selves by taking an earlier reward, now.
  2. But future-selves cannot punish present-selves, obviously, because time does not flow in that direction.

What, then, is to motivate our present-selves to “keep the faith”?  To answer this, we need only appeal to the feedback nature of personal rules (explored last time)!

I’ll let Ainslie explain:

As Bratman has correctly argued (Bratman 1999, pp. 35-57), a present “person-stage” can’t retaliate against the defection of a prior one, a difference that disqualifies the prisoner’s dilemma in its classical form as a rationale for consistency. However, insofar as a failure to cooperate will induce future failures, a current decision-maker contemplating defection faces a danger of the same kind as retaliation…

With the question of retaliation repaired, the analogy between IPD and ISD seems secure. Ainslie has earned the right to invoke IPD explananda; for example:

The rules of this market are the internal equivalent of “self-enforcing contracts” made by traders who will be dealing with each other repeatedly, contracts that let them do business on the strength of handshakes (Klein & Leffler 1981; Macaulay 1963).

Survival Of The Salient

After casting warfare of successive selves into game-theoretic terms, we are in a position to import other concepts. Consider the Schelling point, the notion that salient choices function as an attractor between competitors. Here’s an example of the Schelling point:

Consider a simple example: two people unable to communicate with each other are each shown a panel of four squares and asked to select one; if and only if they both select the same one, they will each receive a prize. Three of the squares are blue and one is red. Assuming they each know nothing about the other player, but that they each do want to win the prize, then they will, reasonably, both choose the red square. Of course, the red square is not in a sense a better square; they could win by both choosing any square. And it is only the “right” square to select if a player can be sure that the other player has selected it; but by hypothesis neither can. However, it is the most salient and notable square, so, lacking any other one, most people will choose it, and this will in fact (often) work.

By virtue of the feedback mechanism discussed above, rules adapt over time via a kind of variation on natural selection (“survival of the salient”):

Intertemporal cooperation is most threatened by rationalizations that permit exceptions for the choice at hand, and is most stabilized by finding bright lines to serve as criteria for what constitutes cooperation. A personal rule never to drink alcohol, for instance, is more stable than a rule to have only two drinks a day, because the line between some drinking and no drinking is unique (bright), while the two-drinks rule does not stand out from some other number, or define the size of the drinks, and is thus susceptible to reformulation. However, skill at intertemporal bargaining will let you attain more flexibility by using lines that are less bright. This skill is apt to be a key component of the control processes that get called ego functions.

In the vocabulary of our model, it is the peculiarities of the preference bundling module that make Schelling points more effective as personal rules.

Let us close by noting, in passing, that this line of argument can be generalized into a justification for slippery slope arguments.

Takeaways

  • Warfare of successive selves can be understood in terms of the Prisoner’s Dilemma, where cooperation with oneself is selecting LL, and defection is SS.
  • In fact, intertemporal bargaining can be successfully explained via a modified form of the Iterated Prisoner’s Dilemma, since retaliation works via a unique feedback mechanism.
  • Because our model of willpower spans both a game-theoretic and cognitive grammar, we can make sense of a “survival of the salient” effect, whereby memorable rules persist longer.

An Introduction To Hyperbolic Discounting

Part Of: [Breakdown of Will] sequence

Table Of Contents

  • What Is Akrasia?
  • Utility Curves, In 200 Words Or Less!
  • Choosing Marshmallows
  • Devil In The (Hyperbolic) Details
  • The Self As A Population
  • Takeaways

What Is Akrasia?

Do you agree or disagree with the following?

In a prosperous society, most misery is self-inflicted. We smoke, eat and drink to excess, and become addicted to drugs, gambling, credit card abuse, destructive emotional relationships, and simple procrastination, usually while attempting not to do so.

It would seem that behavior contradicting one’s own desires is, at least, a frustratingly common human experience. Aristotle called this kind of experience akrasia. Here’s the apostle Paul’s description:

I do not understand what I do. For what I want to do I do not do, but what I hate I do. (Romans 7:15)

The phenomenon of akrasia, and the entire subject of willpower generally, is controversial (a biasing attractor). Nevertheless, both its description and underlying mechanisms are empirically tractable. Let us now proceed to help Paul understand, from a cognitive perspective, the contradictions emerging from his brain.

We begin our journey with the economic concept of utility.

Utility Curves, In 200 Words Or Less!

Let utility here represent the strength with which a person desires a thing. This value may change over time. A utility curve, then, simply charts the relationship between utility and time. For example:

Hyperbolic- Utility Curve Outline

Let’s zoom in on this toy example, and name three temporal locations:

  • Let t_beginning represent the time I inform you about a future reward.
  • Let t_reward represent the time you receive the reward.
  • Let t_middle represent some intermediate time, between the above.

Consider the case when NOW = t_beginning. At that time, we see that the choice is valued at 5 “utils”.

Hyperbolic- Utility Curve T_beginning

Consider what happens as the knife edge of the present (the red line) advances. At NOW = t_middle, the utility of the choice (the strength of our preference for it) doubles:

Hyperbolic- Utility Curve T_middle (2)

Increasing utility curves also go by the name discounted utility, which stems from a different view of the x-axis (at the decision point looking towards the past, or setting x to be in units of time delay). Discounted utility reflects something of human psychology: given a fixed reward, other things equal, receiving it more quickly is more valuable.

This concludes our extremely complicated foray into economic theory. 😛 As you’ll see, utility curves present a nice canvas on which we can paint human decision-making.

Choosing Marshmallows

Everyday instances of akrasia tend to be rather involved. Consider the decision to maintain destructive emotional relationships: the underlying causal graph is rather difficult to parse.

Let’s simplify. Ever heard of the Stanford Marshmallow Experiment?

In these studies, a child was offered a choice between one small reward (sometimes a marshmallow) provided immediately or two small rewards if he or she waited until the tester returned (after an absence of approximately 15 minutes). In follow-up studies, the researchers found that children who were able to wait longer for the preferred rewards tended to have better life outcomes, as measured by SAT scores, educational attainment, body mass index (BMI) and other life measures.

Naming the alternatives:

  • SS reward: Call the immediate, one-marshmallow option the SS (smaller-sooner) reward.
  • LL reward: Call the delayed, two-marshmallow option the LL (larger-later) reward.

Marshmallows are simply a playful vehicle to transport concepts. Why are we tempted to reach for SS despite knowing our long-term interests lie with LL?

Here’s one representation of the above experiment (LL is the orange curve, SS is green):

Hyperbolic- Utility Curve Two Option Choice

Our definition of utility was very simple: a measure of preference strength. This article’s model of choice will be equally straightforward: humans always select the choice with higher utility.

Which option will people select? Always the orange curve. No matter how far the knife edge of the present advances, the utility of LL always exceeds that of SS:

Hyperbolic- Utility Curve Exponential Self (1)

Shockingly, economists like to model utility curves like these with mathematical formulas, rather than Google Drawings. These utility relationships can be produced with exponential functions; let us call them exponential discount curves.
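For the curious, here is a rough sketch of one such curve. The reward sizes, delivery times, and discount rate are all hypothetical numbers I picked for illustration; the point is only that, under exponential discounting, the ranking of the two options never changes as NOW advances.

```python
import math

def exponential_utility(amount, reward_time, now, rate):
    """Utility at `now` of a reward delivered at `reward_time`."""
    return amount * math.exp(-rate * (reward_time - now))

# Hypothetical setup: one marshmallow at t=10, two marshmallows at t=12.
SS = dict(amount=1.0, reward_time=10.0)
LL = dict(amount=2.0, reward_time=12.0)
RATE = 0.2  # assumed discount rate per unit of time

def prefers_ll(now):
    """True when LL has strictly higher utility than SS at time `now`."""
    return (exponential_utility(now=now, rate=RATE, **LL)
            > exponential_utility(now=now, rate=RATE, **SS))

choices = [prefers_ll(now) for now in range(0, 11)]
print(all(choices))  # True
```

The ratio between the two utilities depends only on the gap between delivery times, not on NOW, which is why the curves never cross.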

Devil In The (Hyperbolic) Details

But the above utility curves are not the only ones that could be implemented in the brain. Even if we held Utility(t_beginning) and Utility(t_reward) constant, the rate at which Utility(NOW) increases may vary. Consider what happens when most of the utility obtains close to reward-time (when the utility curves form a “hockey stick”):

Hyperbolic- Utility Curve Hyperbolic Choice (1)

Let us quickly ground this alternative in a mathematical formalism. A function that fits our “hockey stick” criteria is the hyperbolic function; so we will name the above a hyperbolic discount curve.

Notice that the above “overlap” is highly significant – it indicates different choices at different times:

Hyperbolic- Utility Curve Hyperbolic Selves (1)

This is the birthplace of akrasia – the cradle of “sin nature” – where SS (smaller-sooner) rewards temporarily outweigh LL (larger-later) rewards.
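The reversal is easy to reproduce numerically. The sketch below assumes a common hyperbolic form, utility = amount / (1 + k × delay), with hypothetical rewards, times, and a made-up steepness k. None of these numbers come from the figures above.

```python
def hyperbolic_utility(amount, reward_time, now, k):
    """Utility at `now` of a reward delivered at `reward_time`."""
    delay = reward_time - now
    return amount / (1.0 + k * delay)

def preferred(now, k=2.0):
    """Which option has the higher utility at time `now`?"""
    ss = hyperbolic_utility(1.0, 10.0, now, k)  # one marshmallow at t=10
    ll = hyperbolic_utility(2.0, 12.0, now, k)  # two marshmallows at t=12
    return "LL" if ll > ss else "SS"

timeline = [preferred(now) for now in range(0, 11)]
print(timeline)
# ['LL', 'LL', 'LL', 'LL', 'LL', 'LL', 'LL', 'LL', 'LL', 'SS', 'SS']
```

With these numbers the preference flips between t=8 and t=9: far from the rewards the chooser favors LL, but as SS draws near it temporarily wins.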

The Self As A Population

Consider the story of Odysseus and the sirens:

Odysseus was curious as to what the Sirens sang to him, and so, on the advice of Circe, he had all of his sailors plug their ears with beeswax and tie him to the mast. He ordered his men to leave him tied tightly to the mast, no matter how much he would beg. When he heard their beautiful song, he ordered the sailors to untie him but they bound him tighter.

With this powerful illustration of akrasia, we are tempted to view Odysseus as two separate people. Pre-siren Odysseus is intent on sailing past the sirens, but post-siren Odysseus is desperate to approach them. We even see pre-siren Odysseus restricting the freedoms of post-siren Odysseus…

How can identity be divided against itself? This becomes possible if we are, in part, the sum of our preferences. I am me because my utility for composing this article exceeds my utility attached to watching a football game.

Hyperbolic discounting provides a tool to quantify this concept of competing selves. Consider again the above image. The person you are between t1 and t2 makes choices differently than the You of all other times.

Another example, using this language of warfare between successive selves:

Looking at a day a month from now, I’d sooner feel awake and alive in the morning than stay up all night reading Wikipedia. But when that evening comes, it’s likely my preferences will reverse; the distance to the morning will be relatively greater, and so my happiness then will be discounted more strongly compared to my present enjoyment, and another groggy morning will await me. To my horror, my future self has different interests to my present self. Consider, too, the alcoholic who moves to a town in which alcohol is not sold, anticipating a change in desires and deliberately constraining their own future self.

Takeaways

  • Behavior contradicting your desires (akrasia) can be explained by appealing to the rate at which preferences diminish over time (utility discount curve).
  • A useful way of reasoning about hyperbolic discount curves is warfare between successive “yous”.

Next Up: [Willpower As Preference Bundling]