Murderous Cliques

Part Of: [Advanced Inference In Graphical Models] sequence

Table Of Contents

  • Cliquish Nodes
  • The Mechanics Of Variable Elimination
  • A Question Of Order
  • Enter Graphical Models
  • Murderous Cliques
  • Takeaways

Cliquish Nodes

Fully-connected networks have some interesting properties. Let’s create a vocabulary to reason about them in more nuanced graphs.

  • Let clique be a subset of nodes in a graph that are fully connected. Call the set of all cliques C.
  • Let maximal clique be a clique that cannot be extended: including any one more adjacent vertex would break full connectivity.
  • Let maximum clique be a clique Cmm for which |Cmm| = max(|Cx|) over all Cx in C.

An example may help, you say?!

EE512 Cliques- Clique Illustration

Our three sets in action:

  • Cliques: {{1}, {2}, {3}, {4}, {5}, {1, 2}, {2, 3}, {1, 3}, {2, 4}, {3, 4}, {2, 5}, {1, 2, 3}, {2, 3, 4}}
  • Maximal Cliques: {{1, 2, 3}, {2, 3, 4}, {2, 5}}
  • Maximum Cliques: {{1, 2, 3}, {2, 3, 4}}
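As a sanity check, all three sets above can be computed by brute force. A small sketch (the edge set is read off the figure):

```python
from itertools import combinations

# The 5-node example graph above; the edge set is read off the figure.
nodes = {1, 2, 3, 4, 5}
edges = {(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (2, 5)}

def is_clique(subset):
    """A subset is a clique iff every pair of its nodes is connected."""
    return all((min(u, v), max(u, v)) in edges
               for u, v in combinations(subset, 2))

# All non-empty cliques: brute force over every subset of nodes.
cliques = [frozenset(s)
           for r in range(1, len(nodes) + 1)
           for s in combinations(sorted(nodes), r)
           if is_clique(s)]

# Maximal cliques: cliques with no strict clique superset.
maximal = [c for c in cliques if not any(c < d for d in cliques)]

# Maximum cliques: the largest cliques overall.
biggest = max(len(c) for c in cliques)
maximum = [c for c in cliques if len(c) == biggest]

print(sorted(map(sorted, maximal)))   # [[1, 2, 3], [2, 3, 4], [2, 5]]
print(sorted(map(sorted, maximum)))   # [[1, 2, 3], [2, 3, 4]]
```

Brute force over all 2^5 subsets is obviously only viable for toy graphs; finding a maximum clique in general is NP-hard.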

The Mechanics Of Variable Elimination

The basic point of variable elimination is to push summation signs as far to the right as possible, to reduce computational complexity (factors not involving xi no longer have to participate in the sum over xi).

Let us call this the factor grouping step:

EE512 Cliques- Factor Grouping

See how we pulled the xi summation to the right?

Once we have a single summation over only the relevant factors, we proceed. The resulting factor will no longer contain xi; therefore, let us call this operation the variable elimination step.

EE512 Cliques- Variable Elimination

To be clear, the above notation is a ψ1,2 – but the 1 is scratched out, to represent its elimination.

Putting these two steps together, we can now understand the process of eliminating a variable from our model. Consider the task of eliminating x1 from the following model:

EE512 Cliques- VarElim Example

The above image captures the process for eliminating one variable. To find p(x3, x4), we must eliminate two more. Imagine we choose x5 to eliminate next, and then x2. The resulting equation will bear our answer, the value of p(x3, x4).
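A minimal sketch of a single elimination step, on made-up binary factor tables (the factors ψ1,2 and ψ1,3 and their numeric values are hypothetical, not read from the figure):

```python
from itertools import product

# Hypothetical binary factors psi12(x1, x2) and psi13(x1, x3);
# the numeric values are made up for illustration.
psi12 = dict(zip(product([0, 1], repeat=2), [0.9, 0.1, 0.4, 0.6]))
psi13 = dict(zip(product([0, 1], repeat=2), [0.7, 0.3, 0.2, 0.8]))

def eliminate_x1(psi12, psi13):
    """Factor grouping + variable elimination: multiply the factors
    that mention x1, then sum x1 out.  The result tau(x2, x3) no
    longer mentions x1 -- but it now couples x2 and x3."""
    tau = {}
    for x2, x3 in product([0, 1], repeat=2):
        tau[(x2, x3)] = sum(psi12[(x1, x2)] * psi13[(x1, x3)]
                            for x1 in [0, 1])
    return tau

tau23 = eliminate_x1(psi12, psi13)
print(tau23[(0, 0)])   # 0.9*0.7 + 0.4*0.2, i.e. approximately 0.71
```

Note the side effect: even though x1 is gone, the new factor ties x2 and x3 together. That coupling is exactly what the fill-in-edge discussion below is about.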

A Question Of Order

In the above example, we chose the following variable elimination ordering: (1, 5, 2). But other orderings are possible. How does (1, 5, 2) compare to (2, 1, 5) or (5, 2, 1)?

How might we rank the desirability of different variable elimination orderings? An attractive metric is computational complexity: we should choose the ordering that minimizes the amount of required CPU muscle.

Enter Graphical Models

Graphical models provide us a way to visually inspect what used to be solely manipulations of formulae. Here is the exact same computation as above, but with pictures:

EE512 Cliques- Elimination Ordering - Edge First

The key insight: the reconstituted graphical model (GM) is equivalent to the original GM.

HOWEVER! Such equivalence is not guaranteed! Consider what happens when we eliminate X4 first:

EE512 Cliques- Elimination Ordering - Non-Edge First

Notice how the reconstituted GM diverges from the original GM: a new edge has been added between x3 and x5. Let us call this a fill-in edge.

Fill-in edges are bad news. The edges of the original graph constitute its family: when you add edges, the boundaries of the family expand, which hurts inference processes. Recasting our lens to asymptotic analysis, fill-in edges represent an added computational burden – higher-dimensional tables that we would rather not compute over.

Why does choosing to first eliminate X4 create a fill-in edge? To understand this, we’ll need to apply deep mathematics to analyze relationships between different families F(G, R).

Murderous Cliques

Let us use MRF-clique semantic interchangeability to look at fill-in edges through the lens of clique semantics. Consider again the fill-in edge phenomenon encountered above:

EE512 Cliques- Elimination Ordering - Fill-In Edge Consequences

In the original graph (left), the maximal cliques are {{1, 2}, {2, 3}, {3, 4}, {4, 5}}. The key insight is as follows:

  • Existence of f1(xA, xB) suggests that G[A ∪ B] should be a clique, and existence of f2(xB, xC) suggests G[B ∪ C] should be a clique.
  • After summation, existence of f(xA, xC) suggests that G[A ∪ C] should also be a clique (if it is not already).

In other words:

  • After v is eliminated, all of its former neighbors must form a clique. If they do not, undesirable fill-in edges will be added to our graph, and the family will expand.

We can explain the x4 fill-in edge because its neighbors (x3 and x5) were not already a clique.
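We can check this mechanically: eliminating a node forces its remaining neighbors into a clique, so a short simulation counts the fill-in edges any ordering produces. A sketch (the path graph 1–2–3–4–5 is taken from the figure; the function name is mine):

```python
from itertools import combinations

def fill_ins(adj, order):
    """Simulate variable elimination: removing v connects all of v's
    current neighbors; any edge added this way is a fill-in edge."""
    adj = {v: set(ns) for v, ns in adj.items()}   # defensive copy
    added = []
    for v in order:
        nbrs = adj.pop(v)
        for a, b in combinations(sorted(nbrs), 2):
            if b not in adj[a]:           # neighbors not already a clique
                adj[a].add(b)
                adj[b].add(a)
                added.append((a, b))
        for n in nbrs:
            adj[n].discard(v)
    return added

# The path graph 1-2-3-4-5 from the figure.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}

print(fill_ins(path, [4, 1, 2, 3, 5]))  # [(3, 5)]: x4 first fills in 3-5
print(fill_ins(path, [1, 5, 2, 4, 3]))  # []: endpoints first, no fill-ins
```

The second ordering always picks a node whose neighbors already form a clique, so the reconstituted graph never grows.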

Thus, variable elimination ordering becomes a question of analyzing the cliques of a graph. Let us call this perspective murderous cliques for short! 🙂

Takeaways

  • Cliques are graph subsets that are fully connected.
  • Variable elimination is a computational process that first bundles like factors together, and then eliminates a variable via “summing it out”.
  • The order in which one eliminates irrelevant variables is computationally significant.
  • Sometimes, variable elimination creates undesirable fill-in edges – edges that breach the family contract & expand the boundaries of the graphical model.
  • Deep parallels exist between clique semantics and traditional MRF factorization.
  • Fill-in edges happen when the former neighbors of an eliminated variable do not already form a clique.
  • In conclusion, when selecting the order in which to eliminate variables, if possible, choose the variable whose neighbors already form a clique.

Family vs. Universe

Part Of: [Advanced Inference In Graphical Models] sequence

Table Of Contents

  • Definitions Make The World Go ‘Round
  • Searching The Universe
  • The Topography Of Truth
  • Redemption Via Solution-Space Constraint

Definitions Make The World Go ‘Round

Agents within the particle soup of the universe may sample its contents.

  • Let the random variables X1, …, XN represent such samples of reality. (I discuss random variables in more depth here.)
  • Let the joint probability p(X1, …, XN) represent our belief about the whole system.
  • Let the universe U represent all possible p(X1, …, XN) distributions.

Searching The Universe

How big is U? Well, let’s imagine a simple case, where N = 3, and the domain of every random variable is {true, false}.

EE512- Joint Prob Example

Two things to notice in the above joint probability table:

  • We use X1 to refer to the variable, and x1 to refer to its value.
  • The summation of all p(x1, x2, x3) rows equals 1.0 (translation: “there is a 100% chance that all variables will have some value”).
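Both observations can be verified in a few lines. A sketch with a hypothetical uniform table (the probability values are placeholders, not the figure’s):

```python
from itertools import product

# A hypothetical joint table over N = 3 binary variables: one row per
# assignment of (x1, x2, x3).  The uniform values are placeholders.
rows = list(product([True, False], repeat=3))
p = {row: 1 / len(rows) for row in rows}

print(len(rows))         # 8 rows: one per permutation of values
print(sum(p.values()))   # 1.0: every outcome is accounted for
```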

The Topography Of Truth

Our probability function must generate a value for every permutation of random variable values. Using the |*| notation to represent a “size/number of” operator, we find that |rows| = |domain|^|variables|. This equation explains why we have eight rows: 8 = 2^3.

But what happens when N is larger than 3? For any N, the number of rows we must use to provide p(X1, …, XN) is 2^N (assuming binary domains). But exponential growth is not computationally tractable. Intractability means that no computational system – brain or CPU – can hope to traverse U efficiently.

The above line of reasoning is called asymptotic analysis. But there exists another way to reason about tractability.

Imagine there exists a distribution ptrue(x1, …, xN) that our search algorithm tries to locate within U. But ptrue(x1, …, xN) is just one distribution out of many. We can represent the situation with a solution landscape:

  • The x and y dimensions (width, breadth) encode the locations of different probability distributions.
  • The z dimension (height) encodes some metric of distribution accuracy.

In our solution landscape, ptrue(x1, …, xN) is simply the highest peak in the landscape, and our search algorithm’s task is simply to locate it. We can call this process gradient ascent.

Tractable landscapes tend to be well-behaved, with no spurious local optima; therefore, gradient ascent will lead to the ptrue(x1, …, xN) peak. Intractable landscapes tend to be, well… more chaotic.

Tractability- Topography Landscapes

In such unforgiving terrain, it becomes clear why searching U, even via sophisticated approximation algorithms, tends to yield inadequate solutions.

Redemption Via Solution-Space Constraint

Let a graph comprise a set of vertices V, and a set of edges E. A graphical model, then, is the conjunction of a graph with a set of rules R. More formally:

  • G = (V, E)
  • GM = (G, R) = ((V, E), R)

What type of rules comprise R? I discuss cognitive versions of such rules here.

R is a set of rules that map from graph G to statements about probability distributions. An individual rule r is a predicate (truth statement) that takes a graph and a probability distribution, and evaluates to either true or false.

In practice, the rules in R tend to encode conditional independence. The important point, though, is that R constrains U: the set of probability distributions that conform to the rules of some graphical model is smaller than the set of all possible distributions. Let F represent a family of distributions, a subset of the universe U.
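To make the predicate idea concrete, here is a sketch of one such rule – conditional independence X1 ⊥ X3 | X2, as a chain graph X1 – X2 – X3 would assert it. The graph, the test distributions, and the tolerance are all illustrative:

```python
from itertools import product

def rule_chain(p, tol=1e-9):
    """A sample rule r for the chain graph X1 - X2 - X3: the predicate
    'X1 is conditionally independent of X3, given X2', evaluated on a
    binary joint table p[(x1, x2, x3)]."""
    for x2 in (0, 1):
        pz = sum(p[(a, x2, c)] for a, c in product((0, 1), repeat=2))
        if pz == 0:
            continue                      # nothing to check at this x2
        for x1, x3 in product((0, 1), repeat=2):
            joint = p[(x1, x2, x3)] / pz
            marg1 = sum(p[(x1, x2, c)] for c in (0, 1)) / pz
            marg3 = sum(p[(a, x2, x3)] for a in (0, 1)) / pz
            if abs(joint - marg1 * marg3) > tol:
                return False              # rule violated: p lies outside F
    return True

# A uniform distribution satisfies the rule (it factorizes trivially)...
uniform = {t: 1 / 8 for t in product((0, 1), repeat=3)}
print(rule_chain(uniform))   # True

# ...but a distribution with direct X1-X3 coupling falls outside F.
xor = {t: (1 / 4 if t[0] == t[2] else 0.0)
       for t in product((0, 1), repeat=3)}
print(rule_chain(xor))       # False
```

The distributions for which every rule in R returns True are exactly the family F.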

More importantly, the rules of a graphical model are carefully designed to yield a tractable family within the solution space. Thus, the solution space looks like:

EE512- Family vs. Universe Sets

To paraphrase my professor:

Sometimes it’s better to find an exact solution to an approximate problem [exact inference on F] than to find an approximate solution to an exact problem [approximate inference on U].

 

[Sequence] Graphical Models

Hi! So, I’m trying out something new. It’s no longer summer, which means I have much less time to follow my own research paths (because class). But this quarter, I want to “live blog” my insights as I go along. The class is:

Lecture Posts

Exact Inference On Approximate Problems:

  • Chordal Graphs [Graphic]. Trees aren’t the only thing supporting perfect elimination orderings (PEOs). An infographic describing the chain of mathematical proofs which ultimately establish chordal graphs as supporting PEOs via the concept of minimal separators.
  • Family vs. Universe. Graphical models conjoin graphs with a set of rules, which sample & thereby constrain the entire universe U (solution-space). The constrained solution-space is called a family F.
  • Designer Families. A concrete walkthrough of family design, including which heuristics are important to bear in mind.
  • Murderous Cliques. How cliques give us insight into making good variable-elimination-ordering choices.
  • Murderous Messages.  Graphs with no cycles (trees) allow us to re-interpret variable elimination epistemically, as a form of message passing.
  • Tree, Generalized. Introduces k-trees, and discusses the relationship between trees, k-trees, and chordal graphs.

Approximate Inference on Exact Problems:

Final Project Posts

An Introduction To Hyperbolic Discounting

Part Of: [Breakdown of Will] sequence

Table Of Contents

  • What Is Akrasia?
  • Utility Curves, In 200 Words Or Less!
  • Choosing Marshmallows
  • Devil In The (Hyperbolic) Details
  • The Self As A Population
  • Takeaways

What Is Akrasia?

Do you agree or disagree with the following?

In a prosperous society, most misery is self-inflicted. We smoke, eat and drink to excess, and become addicted to drugs, gambling, credit card abuse, destructive emotional relationships, and simple procrastination, usually while attempting not to do so.

It would seem that behavior contradicting one’s own desires is, at least, a frustratingly common human experience. Aristotle called this kind of experience akrasia. Here’s the apostle Paul’s description:

I do not understand what I do. For what I want to do I do not do, but what I hate I do. (Romans 7:15)

The phenomenon of akrasia, and the entire subject of willpower generally, is controversial (a biasing attractor). Nevertheless, both its description and underlying mechanisms are empirically tractable. Let us now proceed to help Paul understand, from a cognitive perspective, the contradictions emerging from his brain.

We begin our journey with the economic concept of utility.

Utility Curves, In 200 Words Or Less!

Let utility here represent the strength with which a person desires a thing. This value may change over time. A utility curve, then, simply charts the relationship between utility and time. For example:

Hyperbolic- Utility Curve Outline

Let’s zoom in on this toy example, and name three temporal locations:

  • Let tbeginning represent the time I inform you about a future reward.
  • Let treward represent the time you receive the reward.
  • Let tmiddle represent some intermediate time, between the above.

Consider the case when NOW = tbeginning. At that time, we see that the choice is valued at 5 “utils”.

Hyperbolic- Utility Curve T_beginning

Consider what happens as the knife edge of the present (the red line) advances.  At NOW = tmiddle, the utility of the choice (the strength of our preference for it) doubles:

Hyperbolic- Utility Curve T_middle (2)

Increasing utility curves also go by the name discounted utility, which stems from a different view of the x-axis (standing at the decision point looking towards the past, or setting x to be in units of time delay). Discounted utility reflects something of human psychology: given a fixed reward, other things being equal, receiving it more quickly is more valuable.

This concludes our extremely complicated foray into economic theory. 😛 As you’ll see, utility curves present a nice canvas on which we can paint human decision-making.

Choosing Marshmallows

Everyday instances of akrasia tend to be rather involved. Consider the decision to maintain destructive emotional relationships: the underlying causal graph is rather difficult to parse.

Let’s simplify. Ever heard of the Stanford Marshmallow Experiment?

In these studies, a child was offered a choice between one small reward (sometimes a marshmallow) provided immediately or two small rewards if he or she waited until the tester returned (after an absence of approximately 15 minutes). In follow-up studies, the researchers found that children who were able to wait longer for the preferred rewards tended to have better life outcomes, as measured by SAT scores, educational attainment, body mass index (BMI) and other life measures.

Naming the alternatives:

  • SS reward: Call the immediate, one-marshmallow option the SS (smaller-sooner) reward.
  • LL reward: Call the delayed, two-marshmallow option the LL (larger-later) reward.

Marshmallows are simply a playful vehicle to transport concepts. Why are we tempted to reach for SS despite knowing our long-term interests lie with LL?

Here’s one representation of the above experiment (LL is the orange curve, SS is green):

Hyperbolic- Utility Curve Two Option Choice

Our definition of utility was very simple: a measure of preference strength. This article’s model of choice will be equally straightforward: humans always select the choice with higher utility.

Which option will people select? Always the orange curve. No matter how far the knife edge of the present advances, the utility of LL always exceeds that of SS:

Hyperbolic- Utility Curve Exponential Self (1)

Shockingly, economists like to model utility curves like these with mathematical formulas, rather than Google Drawings. These utility relationships can be produced with exponential functions; let us call them exponential discount curves.

Devil In The (Hyperbolic) Details

But the above utility curves are not the only ones that could be implemented in the brain. Even if we held Utility(tbeginning) and Utility(treward) constant, the rate at which Utility(NOW) increases may vary. Consider what happens when most of the utility obtains close to reward-time (when the utility curves form a “hockey stick”):

Hyperbolic- Utility Curve Hyperbolic Choice (1)

Let us quickly ground this alternative in a mathematical formalism. A function that fits our “hockey stick” criteria is the hyperbolic function; so we will name the above a hyperbolic discount curve.

Notice that the above “overlap” is highly significant – it indicates different choices at different times:

Hyperbolic- Utility Curve Hyperbolic Selves (1)

This is the birthplace of akrasia – the cradle of “sin nature” – where SS (smaller-sooner) rewards temporarily outweigh LL (larger-later) rewards.
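A sketch of this reversal, using the standard exponential and hyperbolic functional forms; all rates, delays, and reward sizes below are made-up illustration parameters:

```python
import math

# SS = 1 marshmallow at time t_ss; LL = 2 marshmallows at time t_ll.
# All discount rates and times are made-up illustration parameters.
def exponential(amount, delay, k=0.1):
    return amount * math.exp(-k * delay)

def hyperbolic(amount, delay, k=1.0):
    return amount / (1 + k * delay)

def preferred(discount, now, t_ss=10, t_ll=15):
    """Our simple model of choice: pick whichever option has the
    higher discounted utility at time `now`."""
    u_ss = discount(1, t_ss - now)
    u_ll = discount(2, t_ll - now)
    return "SS" if u_ss > u_ll else "LL"

# Exponential discounting never reverses: LL wins at every moment...
print([preferred(exponential, t) for t in range(10)])
# ...but hyperbolic discounting flips to SS once SS is close at hand.
print([preferred(hyperbolic, t) for t in range(10)])
```

With exponential discounting, the ratio of the two utilities is constant over time, so whichever option wins at tbeginning wins forever; the hyperbolic curve’s “hockey stick” is what lets SS temporarily overtake LL.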

The Self As A Population

Consider the story of Odysseus and the sirens:

Odysseus was curious as to what the Sirens sang to him, and so, on the advice of Circe, he had all of his sailors plug their ears with beeswax and tie him to the mast. He ordered his men to leave him tied tightly to the mast, no matter how much he would beg. When he heard their beautiful song, he ordered the sailors to untie him but they bound him tighter.

With this powerful illustration of akrasia, we are tempted to view Odysseus as two separate people. Pre-siren Odysseus is intent on sailing past the sirens, but post-siren Odysseus is desperate to approach them. We even see pre-siren Odysseus restricting the freedoms of post-siren Odysseus…

How can identity be divided against itself? This becomes possible if we are, in part, the sum of our preferences. I am me because my utility for composing this article exceeds my utility attached to watching a football game.

Hyperbolic discounting provides a tool to quantify this concept of competing selves. Consider again the above image. The person you are between t1 and t2 makes choices differently than the You of all other times.

Another example, using this language of warfare between successive selves:

Looking at a day a month from now, I’d sooner feel awake and alive in the morning than stay up all night reading Wikipedia. But when that evening comes, it’s likely my preferences will reverse; the distance to the morning will be relatively greater, and so my happiness then will be discounted more strongly compared to my present enjoyment, and another groggy morning will await me. To my horror, my future self has different interests to my present self. Consider, too, the alcoholic who moves to a town in which alcohol is not sold, anticipating a change in desires and deliberately constraining their own future self.

Takeaways

  • Behavior contradicting your desires (akrasia) can be explained by appealing to the rate at which preferences diminish over time (utility discount curve).
  • A useful way of reasoning about hyperbolic discount curves is warfare between successive “yous”.

Next Up: [Willpower As Preference Bundling]

[Sequence] Demystifying Physics

Perceptual Tunneling

Introductions

Natural History

Miscellaneous Topics

Relevance To Perception

Organisms like you are immersed in a sea of particles – particle soup, if you will. Your ability to know – your epistemology – must somehow reside in connections between your mental representations and the soup.