i'm going to answer these two questions in reverse order. really just going maverick on this ask here.
(2) there are many things that i despise passionately. but few so much as the root-2-to-the-root-2 proof, reproduced begrudgingly below. you'll have to click to expand. i'll pretend like the reason is because the html typesetting i insisted i use is ugly, but really it's because the proof makes me mad.
proposition. there exist irrational numbers x and y such that x^y is rational.
proof. √2 is known to be irrational, so consider x = √2^√2. if x is rational, we're done, so suppose x is irrational. then

x^√2 = (√2^√2)^√2 = √2^(√2 · √2) = √2^2 = 2

is rational and we are again done. ∎
some of you out there may think this is cute. i'm sorry but i think it's just kind of inane. it's an arithmetical parlour trick. just look at it! the statement of the theorem is so cooked, dude. like was there any reason to doubt this?
no, really, if you're able to define real exponentiation (x, y) ↦ x^y then you already know it's continuous and monotonic in y for every x > 0 other than 1, so it's clearly surjective onto the positive reals. so in particular any transcendental base would need to be raised to an irrational exponent to give a rational: if you raised a number to a rational exponent and got a rational, then that number had to be algebraic.
you can even just come up with your own pairs of irrationals that work, like x = √2 and y = 2 log₂ 3. both of these numbers have easy and elementary proofs of irrationality.
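if you want a sanity check on that pair, here's a rough sympy sketch (the library and the precision threshold are incidental choices, not part of the argument): the exact algebra is (√2)^(2 log₂ 3) = 2^(log₂ 3) = 3, and the numerics agree.

```python
import sympy as sp

# x = sqrt(2), y = 2 * log_2(3); exactly, x**y = 2**(log_2 3) = 3
x = sp.sqrt(2)
y = 2 * sp.log(3, 2)
value = (x ** y).evalf(50)          # evaluate to 50 significant digits
assert abs(value - 3) < 1e-45       # numerically indistinguishable from 3

# for contrast, sqrt(2)**sqrt(2) itself is about 1.6325..., and (per kuzmin) irrational
print((sp.sqrt(2) ** sp.sqrt(2)).evalf(20))
```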
oh and, to calm your nerves, √2^√2 is irrational. kuzmin (1930) has a theorem about numbers of the form a^√b, while gelfond–schneider (1934) is the current last word on algebraic numbers raised to algebraic irrational powers.
(1) i'm not sure i have ever slapped a knee in satisfaction, but i have definitely seen proofs where reading them makes you feel like, now that's how it's done. i have written a few of them myself. i could never pick a favourite, but let me tell you the story of the cayley–hamilton theorem. content warning for linear algebra 1 and 2
waking up
so. linear algebra. square matrices. i'm gonna recite some notation to try to get you in the mood, but this is all quite basic linear algebra so feel free to zone out if you've seen this before. i won't even do anything silly with the field of scalars, we can stick to the good old complex numbers ℂ.
if you have a linear map L : V → V from a vector space V to itself, and you fix a basis B = {e_1, ..., e_n} on that space, then you can extract a square matrix A of scalars which characterizes L: each column A e_i records the coordinates, in the basis B, of the image L(e_i) of a basis vector under this linear map. changing the basis to B' = {e'_1, ..., e'_n} corresponds to passing to the conjugated matrix M⁻¹AM, where M is defined by M e_i = e'_i.
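here's a quick sympy sketch of that basis-change bookkeeping, with a made-up map on ℂ² (the particular L, the bases, and the library are all incidental choices):

```python
import sympy as sp

def L(v):                                    # some fixed linear map, written in B-coordinates
    return sp.Matrix([2*v[0] + v[1], 3*v[1]])

e = [sp.Matrix([1, 0]), sp.Matrix([0, 1])]   # the basis B
A = sp.Matrix.hstack(*[L(ei) for ei in e])   # columns are the images L(e_i)

e_new = [sp.Matrix([1, 1]), sp.Matrix([0, 1])]   # a new basis B'
M = sp.Matrix.hstack(*e_new)                     # M is defined by M e_i = e'_i

# the matrix of L in B': apply L to each e'_j and read off its B'-coordinates
A_new = sp.Matrix.hstack(*[M.inv() * L(ej) for ej in e_new])
assert A_new == M.inv() * A * M                  # the conjugation formula
```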
the determinant of order n is a scalar-valued function on the n×n square matrices. it has a lot of remarkable properties, but for our purposes:
- it is a polynomial expression in the entries of the matrix, which means it is made of Just Plain Old Algebra; and
- it detects invertibility of matrices, in that det A ≠ 0 if and only if A is invertible. equivalently, it is zero iff the linear map sends some nonzero vector to zero.
as a tool, the determinant is a cudgel, blunt and brutal but effective. let's use it now.
the theorem
suppose you were interested in the eigenvalues λ of A. these are scalars such that Av = λv for a nonzero vector v called a λ-eigenvector for A. eigenvalues are invariant under conjugation—indeed, you can verify Mv is a λ-eigenvector for MAM⁻¹—so really they belong to the linear map L and not just the matrix, which is a clue that they deserve study.
if v is a λ-eigenvector for A, then it will be sent to zero by the matrix λI − A. so we know det(λI − A) = 0, without even making reference to v. combining this observation with the fact that det is polynomial in the entries of the matrix, we find that the eigenvalues of A are exactly the roots of the characteristic polynomial of A,

χ(t) = det(tI − A),

where we have abstracted over the eigenvalue with the variable t. the cayley–hamilton theorem concerns χ and its relationship to A.
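before the theorem, here's χ for a made-up 3×3 matrix in sympy, just to see the roots-are-eigenvalues claim concretely (the matrix is arbitrary and sympy is incidental):

```python
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [1, 0, 5]])
chi = (t * sp.eye(3) - A).det()                       # chi(t) = det(tI - A)

assert sp.expand(chi - (t - 2)**2 * (t - 5)) == 0     # here chi(t) = (t-2)^2 (t-5)
assert sp.roots(chi, t) == A.eigenvals()              # roots with multiplicity = eigenvalues
```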
cayley–hamilton theorem. χ(A) = O, the zero matrix.
well that sure looks simple! you may be tempted to evaluate it thus:

χ(A) = det(AI − A) = det(A − A) = det O = 0,

but that would be unsound! χ(M) ≠ det(MI − A), because the lhs is a polynomial evaluated with a matrix argument (a linear combination of powers of a matrix, which you can check produces a matrix) and the rhs is a determinant (which is a scalar). you have to work around the notation here, which means you have to actually sort out the coefficients of χ, or else reason about it in a different way that is more nuanced than this.
consequently, most treatments put off this result until they can swing a giant hammer at it, such as jordan form. sometimes you will see it tackled a little more directly, like proving it first for diagonalizable matrices over the complex numbers, and then arguing that the diagonalizable matrices are dense and det is continuous. or maybe you will do a horrible, bashy computation of the coefficients.
but there is a better way! all you need is a bit of patience for commutative algebra, and a simple proof based on the intuition "A is a root of tI − A" can be recovered.
a place to work
the first thing you need is to find suitable rings for these computations to take place in. what is a ring? if you don't know, then i don't really have the time to explain why you ought to care, but in short, it's just the mathematically accurate word for scope, in this case. if a vector space is a place where you can add and subtract and rescale by a complex number, then a ring is a place where you can add and subtract and just plain old multiply. the complex numbers form a ring, as do the matrices and as do the polynomials.
let me explore this just a bit longer. the ring of polynomials, with coefficients in R and having the variable t, is called R[t]. you already know how arithmetic works in polynomials: rt^m · st^n = rs·t^(m+n), and so on.
polynomials have one interesting thing which makes them special, which is that for any element r ∈ R, you can evaluate a polynomial p(t) ∈ R[t] at r, usually denoted p(r). evaluation at r is a function ε_r : R[t] → R. when the multiplication of R is commutative—order doesn't matter, like the complex numbers but unlike the matrices—then evaluation is actually a special kind of function called a homomorphism, which means a lot of things, but the important bit for us is that ε_r(x · y) = ε_r(x) · ε_r(y).
however, matrix multiplication is famously not commutative, so even though we can evaluate χ at A, we don't get a ring homomorphism on all matrix polynomials. ...but hey, we don't need all matrix polynomials!
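here's the caveat in miniature (toy matrices, chosen only because they fail to commute): in the polynomial ring the variable t is central, so t·B and B·t are literally the same polynomial, which evaluates at A to B·A; but evaluating the factors separately and then multiplying gives A·B instead.

```python
import sympy as sp

t = sp.symbols('t')

# commutative coefficients: evaluation respects products
p, q, r = 1 + 2*t, 3 - t**2, sp.Rational(5, 7)
assert (p * q).subs(t, r) == p.subs(t, r) * q.subs(t, r)

# matrix coefficients: t*B = B*t as polynomials, and that evaluates at A to B*A,
# while evaluating the factors t and B first and then multiplying gives A*B
A = sp.Matrix([[0, 1], [0, 0]])
B = sp.Matrix([[1, 0], [0, 2]])
assert B * A != A * B            # so evaluation-at-A is not multiplicative out here
```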
we just want to plug A into this simple little polynomial χ with scalar coefficients. so the only matrices we're ever going to see in our calculations are linear combinations of powers of A. these form a ring called the principal subalgebra of A:

ℂ[A] = { c_0 I + c_1 A + c_2 A^2 + ⋯ + c_m A^m : c_i ∈ ℂ, m ≥ 0 }.

this is the least subring of the matrices that contains both ℂ (viewed as scalar multiples of the identity matrix I) and A, so we can do all the relevant arithmetic in here. and luckily the principal subalgebra is commutative, as A^k · A = A · A^k, so there is indeed a well-defined evaluation-by-A homomorphism ε : ℂ[A][t] → ℂ[A].
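the commutativity is easy to see in a toy example (the specific A and the two polynomials below are arbitrary): any two elements of ℂ[A] are polynomials in the same matrix, so they commute.

```python
import sympy as sp

A = sp.Matrix([[1, 2, 0],
               [0, 1, 3],
               [4, 0, 1]])
I = sp.eye(3)

p_of_A = 2*I - 3*A + A**2        # two arbitrary elements of C[A]:
q_of_A = 7*I + A**3              # linear combinations of powers of A
assert p_of_A * q_of_A == q_of_A * p_of_A
```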
to make it clearer what's going on, going forward i'll use the notation χ(t) ∈ ℂ[t] for the characteristic polynomial as originally defined, and the notation χ(It) = χ(t) · I ∈ ℂ[A][t] for its lift into whichever ring of matrix polynomials we're working in. a handy way i like to think of this stuff is that tI = It in these rings: the lhs is an (entrywise) scalar multiple of the identity by the complex indeterminate t, while the rhs is a "matrix indeterminate".
you could call it (and by it i mean It) T if you were so aesthetically inclined. imo, this stuff only feels weird to talk about because the way we like to talk about matrices and polynomials is highly typed. you do have to be careful here, but it's also easy to psych yourself out: as we observed, all the algebra works out just fine in ℂ[A][t].
an analytical lens
the second ingredient we'll use is the adjugate matrix. for every matrix M there exists a matrix adj M such that

M · adj M = (det M) · I,

where the lhs is a matrix product and the rhs is a scalar multiple of the identity matrix.
i haven't given you the definition of the adjugate, so i guess if you haven't heard of it before and/or wanted to accuse me of cheating and hiding the details, this would be a perfect time to do it. but really, it's no more complicated than the definition of the determinant (which i have also skipped for this chost) and in my professional opinion it's criminal not to teach it alongside the determinant.
if you haven't seen it, let me just leave you with a trail of breadcrumbs: if you remember the laplace expansion of the determinant, then you may observe it looks suspiciously like an inner product between two vectors, one of them being a row or column of M... combine that with the diagonal entries of the equation above to get a definition, as well as an exercise which you can solve using the alternating property of the determinant.
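and if you'd rather see the identity in action before chasing the breadcrumbs, sympy will happily compute adjugates for you (the 3×3 below is an arbitrary choice):

```python
import sympy as sp

M = sp.Matrix([[1, 2, 3],
               [0, 1, 4],
               [5, 6, 0]])
assert M * M.adjugate() == M.det() * sp.eye(3)   # M · adj M = (det M) · I
assert M.adjugate() * M == M.det() * sp.eye(3)   # (the other order works too)
```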
putting them together
so now we have the following analysis:

χ(It) = det(It − A) · I = (It − A) · adj(It − A).

evaluating the lhs under ε, we get ε(χ(It)) = χ(A). we'd like to evaluate the rhs as well, and use the homomorphic nature of ε to say ε(It − A) = A − A = O and absorb the other factor.
the one thing standing in our way is, we need to know that adj(It − A) actually belongs to ℂ[A][t]—equivalently, that it is a polynomial expression in A and It—or else it would not make sense to apply ε to it! this maybe seems like a weird place to split a hair, but it's exactly the kind of hair algebra cares about. fortunately, there are a number of elementary ways of doing this.
the most direct is by polynomial division of χ(It) by It − A. the divisor is monic in t with coefficients in the commutative ring ℂ[A], so the division can be carried out entirely within ℂ[A][t]. and because we already know that det(It − A) · I = (It − A) · adj(It − A), uniqueness of the quotient and remainder (when dividing by a monic polynomial) tells us that they are equal to the adjugate and zero, respectively—so adj(It − A) really does have coefficients in the principal subalgebra.
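here's a sketch of that factorization for the same toy matrix as before (again, the example and the use of sympy are incidental): the adjugate's entries are honest polynomials in t, and χ(It) − (It − A) · adj(It − A) vanishes identically.

```python
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [1, 0, 5]])
tI_minus_A = t * sp.eye(3) - A
adj = tI_minus_A.adjugate()
chi = tI_minus_A.det()

assert all(entry.is_polynomial(t) for entry in adj)                 # entries are polynomials in t
assert (chi * sp.eye(3) - tI_minus_A * adj).expand() == sp.zeros(3)  # the factorization holds
```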
and that somewhat anticlimactically concludes the proof! we simply hit the adjugate factorization of the characteristic polynomial with the evaluation homomorphism on the principal subalgebra, and the result falls out, like a maple seed from a tree.
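and, as a last sketch, the punchline verified directly: substitute A into its own characteristic polynomial, the honest way, as a linear combination of powers of A (same arbitrary 3×3 as above).

```python
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [1, 0, 5]])
chi = (t * sp.eye(3) - A).det()

coeffs = sp.Poly(chi, t).all_coeffs()          # leading coefficient first
chi_of_A = sp.zeros(3)
for c in coeffs:
    chi_of_A = chi_of_A * A + c * sp.eye(3)    # horner's scheme inside C[A]
assert chi_of_A == sp.zeros(3)                 # chi(A) = O, as promised
```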
i tried to make it look natural, but commutative algebra takes some getting used to, so it can still feel out of place for such an elementary theorem. that said, cayley–hamilton is equivalent to the minimal polynomial dividing the characteristic polynomial, which is a very powerful fact for linear algebra, so don't sleep on it! i'd also like to give a shoutout to the principal subalgebra: i think it's a brilliant application of principal objects, and a reminder to always pay attention and not generalize any further than you have to.
i'm sure that under a different perspective, this proof of cayley–hamilton is the complicated one and the root-2-to-the-root-2 proof was clever and fun. but i stand by my assessment. the theorem to be proven is an important part of the proof, and "counterexamples in arithmetic" is just not a very compelling theme for a result. sorry, number theorists.
meanwhile, cayley–hamilton is a sensibly motivated question with a tantalizing hook, and this proof pays that off in a viscerally satisfying way, by doing the thing you thought you couldn't do. all this despite-or-perhaps-because-of the fact that it's just a bunch of standard commutative algebra techniques. it's the sophomore's dream of commutative algebra, really
