The Psychology Of Chess

Part Of: Chess sequence
Followup To: The Chess SuperTree
Content Summary: 1400 words, 14 min read

Dual Process Cognition: Working Memory

Chess engines operate by generating a decision tree, pruning irrelevant branches, assigning scores to the leaves, and consulting the minimax oracle for an optimal decision.

But when we switch from silicon to grey matter, we notice that the human mind is structured differently. We can be meaningfully divided into two selves: a fast, autonomic, associative mind and a slow, serial, conscious mind. What’s worse, the content of our “more rational” self is small – System II is grounded in working memory, which can hold 7 +/- 2 chunks of information.

Working memory span is highly correlated with IQ. Keith Stanovich, in his Rationality and the Reflective Mind, claims that intelligence can be understood as a cognitive decoupling; that is, highly intelligent people are able to evaluate more hypothetical scenarios faster, and keep those explorations firewalled from one’s modeling of the current reality.

While playing chess, the ability to construct a decision tree is a conscious, effortful task. In other words, decision trees are instantiated within your System II working memory during game play.

Dual Process Cognition: Somatic Markers

If decision trees are consciously maintained, what is the basis of score evaluation functions? Recall that machine learning accomplishes this by virtue of training feature vectors and weight vectors. Suppose these vectors can contain two millions numbers each. Every time the engine is asked to evaluate a position, it will compute a dot product: scan over the vectors, and for each value record its product.

With decision trees, I can remember keeping various alternatives in my head. Not so with producing decision scores. Nothing in my gameplay experience feels like I am computing two million numbers with every move. This suggests that score evaluation functions happen within the subconscious, massively parallel System I.

But what does evaluating a position feel like? Two phenomenological observations are in order:

  1. Playing chess is a highly intuitive, emotional affair. Positions in which we have a strong advantage feel good, moves that lead us into checkmate feel scary, and the objectively optimal move tends to produce the most positive feeling. In other words, emotion is the mediator of chess skill. This is an expression of the somatic marker hypothesis.
  2. Explaining chess moves tends to naturally flow in a non-mathematical language, where reasons are considered the final authority over decision making. I am better able to remember “White is better because he has an attack” instead of “White is 20% more likely to win the game”.

We’ll return to these suggestive observations in a little bit.

Is Scoring Cardinal Or Ordinal?

In economic theory, all decisions are mediated by a psychological entity called utility. (I introduce this concept in this post). Neuroeconomics is an attempted integration between economics, psychology, and neuroscience; it tries to get at how the brain produces decisions. Within this field, there is a live debate whether the brain represents utility cardinally (e.g., 3.2982) or ordinally (e.g., A > B and B > C).

Consider again how chess engines compute on the decision tree. The minimax algorithm pushes leaf nodes up the tree, merging results via the min() or max() operation. Whereas engines perform these operation against cardinal score nodes (e.g., in fractions of a pawn called centipawns), this need not be the case. Taking the max(A, B) does not require knowing the precise value of A and B. It is sufficient to know that A > B. In other words, minimax can also work against an ordinal scoring scheme.

The implications of an ordinal scoring scheme becomes more apparent if you view the minimax algorithm from a comparison perspective. In the following image, white spheres represent White decisions (maximization) and gray spheres represent Black decisions (minimization).

Chess Psychology- Ordinal Representation

It is possible to think of image on the left as a “side view” of the minimax algorithm. With this geometric interpretation, the right image becomes a “bottom view” of the exact same algorithm. Pause a moment here until you convince yourself of this.

Consider the nested spheres containing the purple nodes. These represent three leftmost threads in the decision tree: { { 1a vs 1b } vs 1c }. Do we need to know the value of { 1a vs 1c } or { 1b vs 1c }? If we knew both facts, we could surely have enough information to implement minimax. But this “easy solution” would require a total ordering, which is prohibitively expensive. We instead need to resolve the { 1a vs 1b } fact, after which we can compare the result to 1c.

So the brain could choose to evaluate positions ordinally or cardinally. Which technology does it employ in practice?

I don’t know, but I can recommend a test to divine the correct answer. The order in which cardinal scores are generated is unconstrained: a chess player with a cardinal function can evaluate any position at any time. In contrast, the order in which ordinal scores are generated is highly constrained (if we assume a non-total ordering). A brain that implement ordinal evaluations cannot physically evaluate the 1c thread until it has evaluated { 1a vs 1b }. Thus, if we were to study the speech and subjective experiences of a multitude of chess players, we could hope to learn whether their creative abilities are constrained in this way.

The State Comparator & Distinguishing Features

Consider again the feature-weight dot product model of score functions. If I were to evaluate two very similar functions, the dot product would necessarily be very similar: only a few dozen features may meaningfully diverge in their contribution to the final output. Call these uniquely divergent features the distinguishing features.

A computer engine will compute similar dot products in their entirety (with the exception of position that are exactly the same, see transposition tables); it strikes me as very likely that the brain (a metabolic conservative) typically only retains a representation of the distinguishing features. In other words, the “reason-based natural language” that humans produce when we evaluating gamelines draws from a cognitive mechanism that recognizes that features close in distance to the KingSafety category tend to be the relevant ones.

In the discussion of Decision Trees In Chess, I discuss four different moves. The insightful reader may have noticed that I moved my Bishop to the h6 square in three separate situations. These considerations happen at three locations in the decision tree:

Chess Psychology- State Comparator

While a chess engine may score each position repetitively, each score computation will be very similar. Only two features really stand out:

  1. The purple node is different from the orange node because the former has traded off two pawns.
  2. The green node is different from the orange node because the former has made two Knight moves to attack & defend the Kingside.

If you consult the minimax graphic above, you’ll see that my preference ordering is { Green > Orange > Purple }. I would explain this in language by appealing to the relevant differences (e.g., “the pawn trade frees his pawns to control the center”). To me, this is evidence for a state comparison tool that is able to zoom in on distinguishing features.

The Move Generator & Representative Lines

Another distinction between human and computer decisions in chess seems to lie in how the decision tree is approached. Engines like Stockfish typically have a getLegalMoves() function that fleshes out the entire branching structure, and then (extremely) sophisticated pruning techniques are used to make truly deep evaluations possible of the interesting lines. It is possible that nonconscious Type I processes in humans do something comparable, but this strikes me as unlikely. Rather than an eliminative approach to the SuperTree, it seems to me that humans construct representations gradually.

Specifically, I suspect that humans have a Type I statistical mechanism that generates a few plausible moves for a given position. Consider the following subset of a decision tree I examined previously:

Chess Psychology- Representative Lines

Each orange rectangle represents moments when I simply stopped considering alternatives; these representative lines are depth-oriented in nature. I suspect the ubiquity of these representative lines can be explained by appealing to serial application of the MoveGenerator, and the sheer difficulty humans have in maintaining a complex tree in working memory.

Takeaways

  • The brain represents decision trees largely in working memory (a conscious System II process)
  • Massively parallel System I processes are how the brain computes position scores, which are encoded in emotion.
  • While computers represent scores as numbers, it is possible the brain represents scores as preferences (an ordinal system).
  • The brain, being allergic to duplicate computations, contains a State Comparator which isolates distinguishing features within similar positions.
  • Rather than pruning a complete decision tree like computers, the brain an intuitive, System I, Move Generator to construct a plausible sub-tree.

The Chess SuperTree

Part Of: Chess sequence
Followup To: Decision Trees In Chess
Content Summary: 1500 words, 15 min read

Chess Is A Nested Algorithm

In An Introduction To Chess, I introduced you to the basics of chess analysis: how to tell when one position is better than another. Specifically, we looked at material, pawn structure, space, king safety, piece development, piece cohesion, and initiative. These ways of thinking about different chess positions were combined into a score. The first aspect of chess thought lies in the maturation of such an scoring function.

In Decision Trees In Chess, I introduced you to the basics of chess decision making: how to evaluate different alternatives when you are not sure what to do. Specifically, we looked at how imagining different scenarios can be represented in decision trees, how these trees ultimately use scoring functions in their leaf nodes. These decision trees produce decisions after application of the minimax algorithm.

Together, scoring functions and decision trees describe the entirety of what it is to do chess. Had this series been written by world champion Magnus Carlsen, I submit that even he could communicate effectively using this two-part framework. The only things that need be different would be his analysis (his scoring function would be more accurate) and his choice of alternatives to analyze (his considered choices would contain fewer bad alternatives). In other words, skill in chess is not doing more than score functions and decision trees. It is doing them well.

Two Kinds Of Score

Consider the following. White’s turn, can you find mate in two moves?

Chess Computation- Example Mate In Two (1)

The solution to this puzzle, displayed on the rightmost decision tree, is Qxb8+. If Black recaptures, White checkmates with Bb5# (note that the Bg5 prevents Black’s king from escaping (and if Black runs away Kd7, he is met with the same fate).

Now, imagine if White had not found this clever solution immediately. Suppose White instead tries to attack Black with Bb5+. White’s reflections could lead him to consider the other decision tree, which ultimately values Bb5 as -3.2 points (do you see why the minimax algorithm requires this to be the case?)

Consider the returned values (1-0 WIN) for Qxb8+, and (-3.2) for Bb5. This incongruity is itchy. Why would the scoring function return different types of answers depending on the position?

Meet The SuperTree

Imagine for a moment the decision tree for all chess games. Call this object the SuperTree. What does it look like?

Consider the following facts:

  1. On move zero, all chess games ever begin in the same position.
  2. After white’s first move, all chess games ever are in one of twenty possible positions (White has 20 legal moves)
  3. After Black’s first move, all chess games ever are in one of four hundred possible positions (20*20)
  4. All chess games ever conclude with a result (White wins, Black wins, or tie). Some games finish quickly, others take many moves.

These facts can be consolidated into the following visualization:

Chess Computation- Birds Eye View (1)

Consider the top of the SuperTree: as we move our gaze downwards, its width increases. This corresponds to our observation above that, as more moves are made, the number of possible positions rapidly increases.

Why then does the number of possible positions decrease as we move into the endgame? Well, you have surely noticed how endgames tend to feature fewer pieces. Consider all chess games that have transitioned into King+Queen vs. King. There are only a few thousand ways to place these three pieces on the board (64*63*62 = 249,984 is an upper bound). Fewer pieces means fewer states (which is why the Supertree is shown as a diamond, not of a triangle).

At the end of every game, at the end of a game tree, are result nodes. This are represented by the dark orange boundary.

We have so far analyzed two game positions: my game from the Decision Trees In Chess post, and the Mate-In-Two position above. Since the above cartoon claims to comprise all chess games that can ever be played, let’s see how these games fit into the SuperTree.

Chess Computation- Games Within BEV

On the left, we see that the Mate-In-Two game line from this post is shorter: this reflects that it (presumably) terminated in the early middlegame, whereas my opponent in the last post survived until the 53rd move.

On the right, we see the decision trees we constructed at moments in each of these games. Pay attention to the coloring of these trees: in the earlier analysis, our endpoints we numerical scores (green) which propagate up to the top of the triangle. In today’s analysis, our endpoints contain a mixture of numerical scores (green) and final results (orange). In this case, by minimax logic, the orange style of result successfully propagated to the top of the decision tree.

Can Chess Be Solved?

Final results, like numerical scores, propagate up a decision tree. If we start at the end of chess games, and work our way backwords, could we color the entire cartoon orange? Could we banish these numerical scores and solve chess? By “solve chess”, I mean be able to prove perfect play; that is, to be able to prove, from any position, the correct move (including move 1).

Lest I be accused of dreaming of something too implausible, precisely this has been done for games like Checkers and Connect-4 (more examples here).

It turns out that chess has been partially solved: computer scientists have managed to solve all games that land in King+Queen vs. King positions. Don’t believe me? Go here, and experience for yourself what it is like to play against God.

We can visualize on our cartoon how computer scientists have managed to “pull orange upwards” in an attempt to solve the SuperTree. These results are typically referred to as tablebases.

Chess Computation- BEV with Endgame Tablebase

Why haven’t tablebases reached the top of the SuperTree? Compare the size of tablebases as we increase the number of pieces. A three-piece tablebase would include, for example, our classic example of King+Queen vs. King. Such a tablebase only requires 62 kilobytes of memory. But consider what happens when we accomodate more pieces:

Number Of Pieces Tablebase Size (Bytes)
3 62,000 62 kB
4 30,000,000 30 MB
5 7,100,000,000 7.1 GB
6 1,200,000,000,000 1.2 TB
7 140,000,000,000,000 140 TB

To my knowledge, to date our modern supercomputers have failed to construct a complete 8-piece tablebase. Chess begins with 32 pieces. From the table above, when do you think the human race will possess enough computing resources to implement a 32-piece tablebase? 🙂

A Role For Numeric Scores

We now see why our scoring function cannot return game results for most moves in our path.

Consider the case where the results of the Supertree propagate upwards in a completely unpredictable manner. In this situation, we could do no better than rolling a dice to make our move, because no move can increase or chances of winning (or losing, or drawing).

But chess is not like this. Suppose you convince two evenly-matched players to play 100 games, and you remove the queen from the first player’s starting position before each match begins. The second player will probably win many more than 50 games. This tells us that chess results are not completely unpredictable. Locations in the SuperTree in which one player has a material advantage have an above-average chance of leading to a Win result, just as material disadvantages tend to balance the scale towards the Lose result. Given this window of improved predictability, we can prefer moves that improve our material situation.

Most moves in chess, regrettably, do not yield a material advantage. Can we do better than rolling the dice? Do there exist materially-equal positions where, when given to evenly-matched players, one player will win more frequently? Of course there are. We met some of these measures in An Introduction To Chess, for example:

  • Pawn structure. Some pawn configurations are more stable and defensible than others.
  • Space. Some pawn configurations allow a player relatively more room to maneuver.
  • King safety. How many defensive resources does a player have near his King?
  • Piece development. At what rate has one managed to bring pieces into the fight?
  • Piece cohesion. Are one’s pieces working together well?
  • Initiative. Who is making more threats, and dictating the course of the game?

Skill in chess is nothing more, and nothing less, than this: the ability to discover new features (patterns in the Supertree) and the ability to “mix & match” all features to produce a single move recommendation.

This is why our “green” numeric results are are so vital. Despite not knowing the true state of the underlying SuperTree, iteratively maturing your score function provides a player with an increasingly confident prediction of the SuperTree’s true value. In other words, these aggregations of different features comprise a heuristic flashlight that gives clues about the SuperTree.

Takeaways

  • Chess decisions are made via a nested algorithm.
    • At the higher level, a player must compute a decision tree, and make decisions by rolling up the leaf scores via the minimax procedure.
    • At the lower level, for each node in the decision tree, a player must invoke a scoring function to evaluate the desirability of that position.
  • Some decision trees can be evaluated as Win/Lose/Draw, rather than a particular number.
  • The SuperTree – the decision tree that contains all possible moves of any possible game – is a useful way to think about the game of chess.
  • We can think of specific games as traces in the supertree, and understand why attempts to solve the SuperTree (tablebases) have proved intractable.
  • Scoring functions only work if they provide access to regularities within the underlying SuperTree.