Part Of: [Advanced Inference In Graphical Models] sequence
Table Of Contents
- Motivations
- Nested First-Order Semirings
- Recursive Operator Resolution
- Future Directions
- Conclusion
Motivations
In 2002, Jason Eisner published Parameter Estimation for Probabilistic Finite-State Transducers in which he announced the discovery of the expectation semiring. This algebraic template is effectively used in the context of inferring parameters in a finite-state transducer (FST) model in the light of training data. The expectation semiring has the following structure:
Why are expectation semirings cool?
The weights from the expectation semiring are used to compute first-order statistics (e.g., the expected hypothesis length, or feature counts).
In 2009, Zhifei Li and Jason Eisner jointsly published First- and Second-Order Expectation Semirings with Applications to Minimum-Risk Training on Translation Forests. This paper extended the previous paper by moving from a FST model to a hypergraph model, but also presented the variance semiring
Why are variance semirings cool?
The variance semiring computes second-order statistics (e.g., the variance of the hypothesis length or the gradient of entropy). The variance semiring is thus essential for many interesting training paradigms such as minimum risk, deterministic annealing, active learning, and semi-supervised learning.
Our question today:
Is there a connection between these semirings?
Nested First-Order Semirings
To create a nested first-order ring, simply duplicate the original, and concatenate the two semirings together:
Notice the remapping section, where we map each nested pair to a single argument to the semiring operators. This is also captured in my use of color.
Recursive Operator Resolution
Once you have these re-mapped the nested semiring, you must simply apply the original operators repeatedly. It is worth noting that the original semiring interpreted ⊗ in terms of multiplication, so we must recast multiplication back to ⊗ as many times as we recurse into the operator chain.
As you can see, after our algebra machine finishes, the second-order semiring is synonymous with our previously-discovered result of the variance semiring.
Future Directions
So we possess a mechanism by which we can transform the expectation semiring into the variance semiring. Here’s a “before & after”:
So what? Is this all just a post-hoc rationalization, an attempt to explain why the variance semiring is useful after the fact?
I believe not. Eisner is not the only theoretician using semirings, but he is the only one (to my knowledge) to propose second-order semirings.
How might other researchers go about discovering new second-order semirings? Well, one immediate first step would be to apply the lifting technique to known-viable semirings, and see what happens! 🙂
Conclusion
Let me conclude with my list of open questions:
- Do any viable semirings known today in fact support the lifting trick?
- Quarternions were derived from complex numbers, from (a+bi) to [(a+bi)+(c+di)j]. What explains this analogue?
- Could third-order semirings be productive?