[Video] An introduction to reinforcement learning

Part Of: Reinforcement Learning sequence

Sorry it’s been so long since my last post!  I’ve been teaching a Deep Learning class, based on Andrew Ng’s Coursera specialization.  Don’t worry, my other lectures will ultimately be cleaned & shared here too 🙂

This talk covers the mathematical intuitions behind RL, drawing on Markov Chains and Markov Decision Processes. It also contains some novel material, including my thoughts on how RL compares with other machine learning techniques.



Two Cybernetic Loops

Part Of: Neuroanatomy sequence
Content Summary: 800 words, 8 min read

What Is Perception About?

Consider Aristotle’s five senses: vision, hearing, smell, touch, and taste. We know that senses are windows into physical reality. But what aspects of reality do these represent?

Vision and hearing have a special property: despite receptors being located within the body (proximal), they carry information about phenomena outside of the body (distal). They carry information about the world. In contrast, smell, touch, and taste only represent events close to the body; these encode the interaction between body and world.

This distinction is a neural primitive: the brain encodes World and Interaction in extrapersonal and peripersonal space, respectively.

However, there is a significant lacuna within this binary system: neither category concerns the body itself. Body sensation is a crucial “sixth sense”.


Making Sense of Anatomy

We spend a lot of time discussing the nervous system. But the body houses eight other anatomical systems: reproductive, integumentary (skin), muscular, skeletal, endocrine (hormones), digestive (incl. urinary and excretory subsystems), circulatory (incl. immune and lymphatic subsystems), and respiratory.

To regulate these systems, your brain recruits the following peripheral nervous systems:

  1. Somatic, which contains spinal nerves and cranial nerves
  2. Autonomic, incl. the sympathetic “fight/flight” and parasympathetic “rest/digest”
  3. Neuroendocrine, incl. the HPA, HPG, HPT, and Neurohypophyseal axes
  4. Enteric, also called the “second brain”, a large mass of digestion-oriented neurons
  5. Neuroenteric, which connects to the enteric nervous system via the microbiome-gut-brain axis
  6. Neuroimmune, recently discovered, primarily mediated by glial cells
  7. Glymphatic, recently discovered, which removes metabolites via CSF during sleep
  8. Neurogaseous, recently discovered, mediated by gasotransmission

The CNS must coordinate all of these to respond to sense data and regulate anatomical systems. A complex undertaking. How might we understand such a process?

With the above trichotomy { world, interaction, body }, anatomical and sensory systems can be organized into meaningful categories:


The Interlocking Loop Hypothesis posits the existence of two perception-action loops, inhabiting a gradient of abstraction:

  1. The somatic “cold” loop, world- and interaction-oriented, from exteroception to movement.
  2. The visceral “hot” loop, body-oriented, from interoception to body regulation.

Loops As Organizing Principle

Evidence for the Interlocking Loop Hypothesis comes from two anatomical principles of organisation:

First, the Bell-Magendie Law is based on the observation that, in all chordates, sensory information is processed at the back of the brain and behavioral processes at the front (“posterior perception, anterior action”):

[Figure: Posterior Perception, Anterior Action]

Second, the Medial Viscera Principle is the observation that visceral processes tend to reside in the center of the brain (medial regions):


Thus we can see our loops clustering at different levels of the abstraction hierarchy.

We can also see our loops’ primary site of convergence:

Anatomically, the two loops converge on the basal ganglia, in which both somatic and visceral processes are blended to yield coherent behavior.


The above quote & image are from Panksepp (1998), Affective Neuroscience.

The Basis of Motivation

Why should our two loops converge on the basal ganglia? The basal ganglia is the substrate of motivation, or “wanting”. It also participates in reinforcement learning, which is formalized mathematically as a Markov Decision Process (MDP).

Historically, the reward function in MDPs has proven difficult to interpret biologically; however, this task becomes straightforward under the Interlocking Loop Hypothesis. Of course the cold loop would tune its behavior to promote the hot loop’s efforts to keep the organism alive.
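
To make that concrete, here is a minimal sketch of one way to read the reward function on this view. Everything in it is an assumption for illustration: a single internal variable with a fixed setpoint, the `drive` and `reward` functions, and the made-up `ACTION_EFFECTS` table. It is a toy, not a model of the basal ganglia.

```python
# Toy illustration (all names and numbers are hypothetical).
# Hot loop: tracks how far the body is from a homeostatic setpoint.
# Cold loop: picks the action whose outcome most reduces that distance.

SETPOINT = 10.0  # assumed ideal level of some internal variable (arbitrary units)

# Assumed effect of each world-directed action on the internal variable.
ACTION_EFFECTS = {"eat": 3.0, "rest": 0.0, "run": -2.0}


def drive(internal_state: float) -> float:
    """Hot loop signal: distance of the body from its setpoint."""
    return abs(SETPOINT - internal_state)


def reward(state_before: float, state_after: float) -> float:
    """MDP reward for the cold loop: how much the drive was reduced."""
    return drive(state_before) - drive(state_after)


def step(internal_state: float, action: str) -> tuple[float, float]:
    """Apply an action; return the next internal state and the reward."""
    next_state = internal_state + ACTION_EFFECTS[action]
    return next_state, reward(internal_state, next_state)


def greedy_action(internal_state: float) -> str:
    """Cold loop policy: choose the action with the highest immediate reward."""
    return max(ACTION_EFFECTS, key=lambda a: step(internal_state, a)[1])
```

With these assumed numbers, a depleted organism (`greedy_action(4.0)`) chooses to eat, while one above its setpoint (`greedy_action(12.0)`) chooses to run. The reward is never specified in terms of the world; it falls out of the body's state.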


The Basis of Consciousness

In Can Consciousness Be Explained?, I wrote:

Let me put forward a metaphor. Consciousness feels like the movies. More specifically, it comprises:

  1. The Mental Movie. What is the content of the movie? It includes data captured by your eyes, ears, and other senses.
  2. The Mental Subject. Who watches the movie? Only one person, with your goals and your memories – you!

On this view, to explain consciousness one must explain the origins, mechanics, and output of both Movie and Subject. (Of course, one must be careful that the Subject is not a homunculus, on pain of recursion!)

The Interlocking Loop Hypothesis offers an obvious foothold in the science of consciousness:

  • The world-centric cold loop generates the Mental Movie (“a world appears”).
  • The body-centric hot loop creates the Subject (“narrative center of gravity”).

Thus, we are no longer surprised that anomalies in opioid signaling (an instrument of the visceral loop) are linked to depersonalization disorders, whereas dopamine (the promoter of somatic behavior) is associated with subjective time dilation effects.


First, we introduced the Interlocking Loop Hypothesis:

  • Some perceptions are about the world, others are about the body.
  • The CNS comprises a visceral, body-centric hot loop and a somatic, world-centric cold loop.
  • Bell-Magendie Law: perception for both loops is posterior, action is anterior.
  • Medial Viscera Principle: hot loop is located medially, while cold loop is more lateral.

Then, we examined its implications:

  • Motivation, as generated by the basal ganglia, is loop communication software; it allows the hot loop to influence cold loop behavior.
  • Consciousness has two components: the Mental Movie and Mental Subject. These are supported by cold and hot loops, respectively.

Until next time.

Relevant Materials

  • Northoff & Panksepp (2008). The trans-species concept of self and the subcortical–cortical midline system