Links (June 2023)

Part Of: Links sequence

AI

  • In transformer architectures, attention heads tell the model “where to look”. Induction heads (a specific kind of attention head that completes repeated patterns of the form [A][B] … [A] → [B]) reliably form at a late stage of training, and several lines of evidence suggest they may be the substrate of in-context learning (meta-learning).  
  • More from the mechanistic interpretability (MI) agenda. One challenge for MI research has been superposition, where a single neuron responds to multiple features. Neurons can also represent data points in superposition (memorization) instead of features (generalization). In a toy model of double descent, where models transition from overfitting to generalization, the neurons correspondingly transition from representing data points to representing features. 
  • Othello-GPT has a linear emergent world model. Despite being trained only on move sequences, its activations linearly encode the board state: linear probes can read off the color of each square. Intervening on that representation changes behavior; flip the encoded color of G5, for example, and the model suddenly starts predicting H6. 
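
The pattern-completion behavior attributed to induction heads can be sketched without any neural network at all: find the most recent earlier occurrence of the current token and predict its successor. A toy sketch (the function name `induction_predict` is my own, not from the paper):

```python
def induction_predict(tokens):
    """Mimic an induction head's prefix-matching rule: find the most
    recent earlier occurrence of the current token and predict the
    token that followed it."""
    current = tokens[-1]
    # Scan backwards over earlier positions for a match.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current and i + 1 < len(tokens) - 1:
            return tokens[i + 1]  # the token that followed the match
    return None  # no earlier occurrence: no induction prediction

# Given "The cat sat . The", the rule predicts "cat".
print(induction_predict(["The", "cat", "sat", ".", "The"]))
```

Real induction heads implement this lookup with attention (a previous-token head composing with a prefix-matching head), but the input-output behavior is this simple copy rule.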
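Superposition itself is easy to demonstrate numerically: a layer with n neurons can assign m > n features to nearly-orthogonal random directions, so each feature can be read back with only modest interference. A toy numpy sketch (the dimensions are arbitrary, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_features = 64, 256  # more features than neurons

# One random unit direction in neuron space per feature.
W = rng.normal(size=(n_features, n_neurons))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Activate feature 0 alone; read every feature back by dot product.
activation = W[0]
readout = W @ activation

print(readout[0])                 # ~1.0: the active feature
print(np.abs(readout[1:]).max())  # interference: small but nonzero
```

The nonzero cross-terms are the price of superposition, and are one reason single-neuron analyses mislead.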
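The Othello-GPT result combines two ingredients: a linear probe that decodes a square's color from activations, and an intervention that pushes an activation along the probe direction to flip that color. A self-contained sketch on synthetic data standing in for real activations (all names, dimensions, and the noise scale here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, n_samples = 32, 500

# Hypothetical activations that linearly encode one square's
# color (+1 / -1) along a hidden direction, plus noise.
true_direction = rng.normal(size=d_model)
color = rng.choice([-1.0, 1.0], size=n_samples)
acts = np.outer(color, true_direction) \
    + 0.1 * rng.normal(size=(n_samples, d_model))

# Linear probe: least-squares fit of color from activations.
probe, *_ = np.linalg.lstsq(acts, color, rcond=None)
accuracy = np.mean(np.sign(acts @ probe) == color)
print(accuracy)  # the probe decodes the square's color

# The intervention described above: reflect the activation across
# the probe's hyperplane, reversing the decoded color.
x = acts[0].copy()
x -= 2 * (x @ probe) / (probe @ probe) * probe
print(np.sign(x @ probe) != np.sign(acts[0] @ probe))
```

That the probe is linear, and that a linear edit changes downstream move predictions, is what makes the world model "linear" and "emergent".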

Physics

  • The AdS/CFT correspondence draws from the holographic principle to postulate a mathematical duality between quantum and general relativistic formalisms. Susskind advocates ER=EPR: that quantum entanglement is a kind of Einstein-Rosen bridge (wormhole), and more generally that spatial proximity itself is created by webs of entanglement. Physicists have begun testing this empirically in the lab, using qubits to examine the properties of quantum teleportation. Recently one experiment purported to demonstrate ER=EPR, but while teleportation did in fact occur, the signature of gravitational teleportation (size winding) was less diagnostic than originally hoped. 
  • Strong evidence for a gravitational wave background (GWB) has been discovered in 15 years of data from pulsar timing arrays. There are two classes of potential sources: astrophysical explanations invoke mergers of supermassive black holes (SMBHs), while cosmological explanations can invoke new physics such as cosmic superstrings. SMBH mergers almost certainly contribute to the GWB, but the data are somewhat hard to explain with the standard model alone; modeling suggests many beyond-standard-model (BSM) theories may have the capacity to explain more. Stay tuned! 