Quelklef


Sometimes, if you want to perform a search, it helps to write the slowest searching algorithm possible


Say you have some property you'd like to test over all the natural numbers. You could have, say, the property [P(n) = n is less than 5], or [P(n) = either 2 divides n or 2 divides n+1], or perhaps [P(n) = if n is even and greater than 2 then we can write it as the sum of two prime numbers] (Goldbach's conjecture)

Anyway, say we have some property P and we want to know if it's true of all natural numbers (or else has a counter-example). If P is computable, then one way to test this is to just... try it on all natural numbers. Write the following program:

for n from 0 upwards
  test P(n). if P(n) is false, output n and terminate

This program works by performing a naive search for a counter-example to P. If one is found, it halts; if no counter-example exists, then it will run forever.
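In Python, this naive search might look like the sketch below. The `limit` parameter is my own addition (not part of the idea) so the demo can actually terminate; the real search described above would run unbounded.

```python
def find_counterexample(P, limit=None):
    # Naive search: test P(0), P(1), ... and return the first n where P fails.
    # `limit` is a hypothetical cap so demos terminate; pass None for the
    # true unbounded search (which runs forever if P always holds).
    n = 0
    while limit is None or n < limit:
        if not P(n):
            return n  # counter-example found: halt
        n += 1
    return None  # no counter-example below `limit`

# P(n) = "n is less than 5" has the counter-example n = 5
print(find_counterexample(lambda n: n < 5))  # 5

# P(n) = "2 divides n or 2 divides n+1" is always true; cap the demo
print(find_counterexample(lambda n: n % 2 == 0 or (n + 1) % 2 == 0, limit=1000))  # None
```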

Although this is the most obvious way to do such a search, it is far from the only way. You could perform a search by first testing the numbers 1 through 1000 in descending order and then continuing as normal, or you could do the search while also performing a near-perfect simulation of grass growing, whatever. Lots of options.

Some counter-example searches will be faster than others (and equivalently some slower). I think it's fair to say that if you're simulating grass growing while searching for a counter-example, your program is going to be pretty slow.

Given some counter-example-searching program C, we can test how "fast" it is by running it for some number of steps (≈ CPU operations) and looking at how far it was able to get in its search. For example: let A be the 'simple' counter-example-searching program, and let B be the grass-growing-simulation one. If I run both A and B for a billion steps, then A might have tested the natural numbers up through 100, but B, having to spend so many steps on grass-growing, perhaps only got as far as 10.

Also note that if after some number of steps a counter-example-searching program C has tested up through 100, then we know that P(0), P(1), ..., P(100) all hold. This is because C, well, searches for counter-examples: if it had found one at or below 100, it would have halted there rather than making it all the way to 100.

Using all this, let's give a formal notion of the "speed" of a counter-example-searching program. Calling it Tested, roughly we want Tested(C, n) to be "how far" C has gotten after n steps, or "how many numbers" C has tested in n steps. It is impossible for us to look 'inside' C and directly observe how far it's gotten, but we can use the fact that C searches for counter-examples to indirectly probe into how it's doing. We define:

Take a counter-example-searching program C and natural number n
Let Tested(C, n) be the largest number such that:
  for every k < Tested(C, n) we know that P(k) is true

For what it's worth, that definition is weird and subtle; no shame if it doesn't make sense without staring at it for a few minutes. (I mean, for me to understand this whole subject took hours of staring)

(It might be worth noting here that the shape of Tested is slightly odd. If a property P has no counter-example, then the value Tested(C, n) will increase without bound as n increases, as C continues in its search. And if P does have a counter-example, the least one being some c, then Tested(C, n) eventually tops out at the value c. What's weird is that we don't know which case we're in, since we don't know whether P is always true or not. All we can do by looking at the implementation of C is give an upper bound on Tested(C, n) by thinking about how far C would get assuming no termination.)

Alright. Keep Tested in the back of your mind, but we're going to switch topics for a bit. As it turns out, if you know the length of some program, you can place an upper bound on how many steps it can spend before halting (how "slow" it can be), assuming it halts at all. The details of this are interesting, but outside the scope of this post (if you want to find out more, look up the "Busy Beaver" function). For now it will be enough to know that there is a function BB which abides by the following:

If C is a program and len(C) is its length in characters, then either:
 1) C does not halt; or
 2) C halts in some number of steps <= BB(len(C))

In other words, given a program C, you need only run it for BB(len(C)) steps to find out if it will halt or not. (I say run it for "only" BB(len(C)) steps, but in reality the values that BB takes on are very large.)
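As a toy illustration of why such a bound is powerful (everything here is made up: a "program" is a Python generator, one `yield` = one step, and the bound is handed to us rather than computed), knowing it turns "run it and wait" into an actual halting test:

```python
def run_for(program, bound):
    # Run `program` (a zero-argument function returning a generator that
    # yields once per step) for at most `bound` steps.
    # If `bound` >= the program's worst-case halting time (like BB(len(C))),
    # the return value decides halting outright.
    gen = program()
    for _ in range(bound):
        try:
            next(gen)
        except StopIteration:
            return True   # halted within the bound => program halts
    return False          # still running at the bound => program never halts

# Two made-up example "programs":
def halts_in_7():
    for _ in range(7):
        yield

def loops_forever():
    while True:
        yield

print(run_for(halts_in_7, 100))    # True
print(run_for(loops_forever, 100)) # False
```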

Okay! Now for the punchline. Let C be some program searching for counter-examples to a property P over the natural numbers. We know by the above that if C halts (i.e., finds a counter-example), then it must do so in at most BB(len(C)) steps. And we also know by previous work that after running C for some number k of steps, C will only have tested P on the first Tested(C, k) natural numbers. Combining these facts, we get the following:

If a counter-example x exists to P,
Then x is at most Tested(C, BB(len(C)))

I'm assuming we're interested in actually finding a counter-example to P. Hence, it behooves us to try and minimize the bound Tested(C, BB(len(C))). There are two ways to do this:

  1. We can make C shorter. If len(C) is smaller, then BB(len(C)) is smaller, so Tested(C, BB(len(C))) is smaller, so the bound on x is tighter

  2. We can make C slower. Yes, slower. If C is slower, then in BB(len(C)) steps it will not have gotten as far in its search. But by the above, if a counter example to P exists then it must lie within the numbers tested by C in BB(len(C)) steps. Hence, a slower implementation makes a tighter bound.

Hence, in this case, it can be helpful to find a maximally inefficient program (as well as a maximally short one). Pretty funky!

(Post-script. This post is essentially a write-up from a conversation I had some time back with two co-workers. These are not solely my ideas.)

