jix
@jix

When you learn linear algebra, Gaussian elimination is among the very first things taught. Given some vectors b1, …, bk in Rn, you do some pre-computation to get a matrix in row-echelon form, and then you can quickly answer whether any given vector v in Rn is in the vector space spanned by b1, …, bk and, if so, express it as a linear combination of those vectors, as well as a bunch of related computations.
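To make that side concrete, here's a minimal Python sketch of just the membership question, skipping the row-echelon pre-computation and instead solving a least-squares system; the function name and setup are mine, purely for illustration:

```python
import numpy as np

def in_span(vectors, v, tol=1e-9):
    """Return coefficients x with B @ x == v, or None if v is not in the span."""
    B = np.column_stack(vectors)  # n x k matrix with b1, ..., bk as columns
    x, *_ = np.linalg.lstsq(B, v, rcond=None)
    return x if np.allclose(B @ x, v, atol=tol) else None

b1, b2 = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])
print(in_span([b1, b2], np.array([2.0, 3.0, 5.0])))  # [2. 3.], since v = 2*b1 + 3*b2
print(in_span([b1, b2], np.array([0.0, 0.0, 1.0])))  # None, not in the span
```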

On the other hand, you can learn quite a bit of group theory without learning how to do the equivalent computations for permutation groups. In fact it seems this isn't even mentioned unless you explicitly look for computational group theory material, even though it only requires group theory basics which are more than covered by an introductory abstract algebra course. I think that's a shame, because it's quite cool!

The equivalent problems are: Given some permutations b1, …, bk in Sn, you do some pre-computation to get a Base and Strong Generating Set (BSGS) and then you can quickly answer whether any given permutation v in Sn is in the group G generated by b1, …, bk and if so express it as a word over these generators, as well as a bunch of related computations.

One application for this is a program that can solve a Rubik's cube of any size, as well as any similar face-permutation puzzle, without having to implement any algorithm specific to a certain puzzle! Your generators b1, …, bk describe how each basic move permutes the faces, and given a puzzle configuration as a permutation v, the resulting word equal to v is a sequence of basic moves that solves it. It's not going to be the shortest way to do it, though!

The analogy here between linear algebra and group theory goes even further: in linear algebra, a basis matrix in row-echelon form corresponds to a chain of subspaces starting at the space spanned by b1, …, bk, strictly decreasing in dimension, ending in the trivial vector space {0}, and structured so that you can project the vectors of one subspace onto the next smaller one by looking at only a single component. If you do that repeatedly and end up with the zero vector, you started within your vector space; otherwise you didn't.
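Here's what that projection step could look like in Python, assuming you already have the rows of a row-echelon matrix together with their pivot indices (this representation is my own simplification):

```python
import numpy as np

def sift_vector(rows_and_pivots, v):
    """Eliminate one pivot component per row; the residue is zero iff v is in the row span."""
    v = v.astype(float)
    for row, pivot in rows_and_pivots:  # rows in echelon order, with row[pivot] != 0
        v = v - (v[pivot] / row[pivot]) * row
    return v
```

The permutation-group version of this loop is the sifting procedure below.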

For permutation groups, a BSGS corresponds to a chain of subgroups, called a stabilizer chain, starting with the group G generated by b1, …, bk, strictly decreasing in support (the set of moved points) and ending in the trivial group. Given a candidate permutation v for membership in G, a process called sifting generates a sequence of permutations, one for each subgroup in the chain, such that you end up with the identity permutation if and only if you started with a permutation v in G. At each step you only need to look at a single point, and where your current permutation sends that point, to figure out how to get to the next subgroup in the chain. That next subgroup also happens to be the subgroup stabilizing the point, hence the name stabilizer chain.
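A rough Python sketch of sifting, assuming the chain is already computed and each level is stored as a pair (base_point, transversal), where transversal[p] is a group element mapping base_point to p; permutations are tuples with perm[i] giving the image of i. The data layout and names here are my choice, not a standard API:

```python
def compose(p, q):
    """The permutation applying q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def sift(chain, v):
    """Strip v through the chain; for a complete chain, the residue is the identity iff v is in G."""
    for base_point, transversal in chain:
        image = v[base_point]
        if image not in transversal:
            break  # v can't be in the group
        # transversal[image] sends base_point to image, so this product fixes base_point
        v = compose(inverse(transversal[image]), v)
    return v
```

Collecting the transversal elements used along the way also gives you v as a product of known group elements, which is where the word over the generators comes from.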

The set of stabilized points is called a base for the permutation group and corresponds to the pivot columns in the linear algebra case; the combined set of generators for the subgroups in the chain is the strong generating set and corresponds to the rows of a row-echelon matrix.

More rambling on Rubik's cube algorithms

One fascinating fact is that a lot of the algorithms humans use to solve these puzzles are a form of sifting a permutation through a stabilizer chain, although usually in fewer, larger steps, for example by stabilizing (solving) all faces of one complete layer instead of just one face. This requires handling more cases to get from one subgroup to the next, but has two advantages for humans: 1) you get a shorter sequence of moves as a solution, and 2) because each step has a lot of symmetries that are easy for us to understand, it's overall still less to memorize than a lot of simpler steps with fewer symmetries each.

Another difference is that there isn't a single chain but usually multiple chains, or rather a DAG of subgroups, and you select the next smaller subgroup depending on which requires fewer moves or has moves that are easier to spot.

Computing a BSGS for a group given by a set of generators is a bit more involved than getting a matrix into row-echelon form, and I won't go into all the details here, but it's not too complicated either. Given the generators of G, you can get a set of generators for any point-stabilizer subgroup of G using Schreier's lemma[1], which is something you can compute directly using only basic group operations.
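A hedged sketch of how that computation could look, reusing compose and inverse from above (again, all naming is mine): first compute the orbit of a point b together with a transversal, then form one Schreier generator per pair of orbit point and generator:

```python
def orbit_transversal(gens, b, n):
    """Map each point p in the orbit of b to some permutation u with u[b] = p."""
    transversal = {b: tuple(range(n))}
    queue = [b]
    while queue:
        p = queue.pop()
        for s in gens:
            if s[p] not in transversal:
                transversal[s[p]] = compose(s, transversal[p])
                queue.append(s[p])
    return transversal

def schreier_generators(gens, b, n):
    """Generators for the stabilizer of b in the group generated by gens."""
    transversal = orbit_transversal(gens, b, n)
    identity = tuple(range(n))
    result = set()
    for p, u in transversal.items():
        for s in gens:
            # u_{s(p)}^-1 * s * u_p fixes b by construction of the transversal
            g = compose(inverse(transversal[s[p]]), compose(s, u))
            if g != identity:
                result.add(g)
    return result
```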

The problem is that naively applying Schreier's lemma repeatedly to build the whole chain increases the number of generators by a factor of 2 or more at every step[2]. The solution is the Schreier-Sims algorithm[1], which builds the BSGS incrementally and uses the current partially built one to immediately detect and discard redundant generators before things can blow up.
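The redundancy check itself is just sifting through the partial chain: if a candidate generator sifts to the identity, it's already a product of known transversal elements and can be dropped. Very roughly, and glossing over the part where surviving residues are pushed down and lower levels get rebuilt:

```python
def is_redundant(partial_chain, g, n):
    """A candidate generator adds nothing iff it already sifts to the identity."""
    return sift(partial_chain, g) == tuple(range(n))
```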

There are a lot of techniques to make this faster, including a fast randomized Monte Carlo variant and different ways to turn that into a Las Vegas algorithm so you're still guaranteed to get a correct result, but the basic idea and how you use the resulting BSGS stay the same. Computing a BSGS is also the first step for a lot of more advanced algorithms in computational group theory, and as such one of the core operations that a computer algebra system dealing with permutation groups (like GAP) has to implement.

If you want to learn enough about this to actually implement things, I can recommend two books that go into sufficient detail and also introduce everything required:

  • D. F. Holt, B. Eick, and E. A. O’Brien, Handbook of computational group theory, 1st ed. Chapman & Hall/CRC, 2005.
  • Á. Seress, Permutation group algorithms, 1st ed. Cambridge University Press, 2003.

The first aims at "a level suitable for a beginning postgraduate student" and the second at "advanced graduate courses", and they cover a lot more than this. Both have several chapters I skipped entirely, because it would have been just too much work to get through them 😵 and I had no immediate need for what they cover. I do think the basics I only sketched above would be accessible to a much wider audience, though, given appropriate material with more fully worked examples that spends some time explicitly highlighting parallels to already familiar ideas in other areas. Sadly, so far I haven't found any 😔

Should I find the time and motivation for it, I might write more about this myself (and I guess with this post, which got much longer than planned 😅, I already started). Let me know if you find this interesting!

What I already did a while ago was prototype several variations of the Schreier-Sims algorithm in Python to make sure I understood them correctly:

I added a lot of comments, so if you want to see code for this, it's probably not the worst thing to look at, but I've also learned about further implementation techniques since then, so it's not a complete reference for a production-grade implementation.

I do want to implement this in Rust with a larger focus on performance, but that might take a while.


  1. The Wikipedia articles on all this are not that great IMHO...

  2. Some might be duplicates of others, but that alone won't save you from things eventually exploding…


ess
@ess

this is very intelligent and i want to make sure more people see it. i can't make any comments on it myself though, as i am unfortunately extremely dumb at the moment

