aboutwritingtalksworkeducation

Formally Verifying PBS Kids with Lean4

Shadaj Laddad · February 2, 2026

Cyberchase is an educational TV show aimed at children ages 8 - 12 that has been airing on PBS Kids since 2002. As someone who grew up without cable, PBS Kids was the go-to channel after school. From elementary through middle school, Cyberchase became a weekly staple that I enjoyed personally and was approved by my parents for its mathematical content.

The show features three kids (Matt, Jackie, and Inez, known as the Cybersquad), who are recruited by a friendly AI (Motherboard) to protect the world (Cyberspace) from an evil villain (Hacker, voiced by Christopher Lloyd of Back to the Future fame). Hacker’s mayhem leads to some creative puzzles, which the Cybersquad solve by applying their math skills.

Cyberchase has a strong focus on discovering math concepts from first principles. Rather than being taught facts axiomatically, the Cybersquad must first derive a general principle before they can apply it to defeat Hacker. At least for me, this was extremely effective both in getting me interested in math and developing fundamental math skills. I attribute a lot of my success studying computer science to watching Cyberchase as a kid.

Software engineering faces a very similar set of problems. Modern systems are composed of several complex pieces of software, with axiomatic assumptions about the behavior of these building blocks. These assumptions range from trivial, such as handling integer overflow, to incredibly complex: transaction isolation in a distributed database. Incorrect assumptions about the provided behavior can lead to catastrophic consequences.

Interactive Theorem Proving aims to solve this problem. These programming languages allow us to write down proofs of math and code as a composition of mathematical facts, which themselves come with proofs—all the way down to fundamental axioms of mathematics. Such formal proofs can be easily checked by a computer (any mistakes show up as type errors) and the checker itself is very unlikely to have bugs because the proof captures all details down to the lowest level111. This is in contrast to SMT solvers, which are more autonomous but involve complex and sometimes-buggy heuristics. There is ongoing work for SMT solvers to emit their results in a formal proof language for verification. This is in contrast to SMT solvers, which are more autonomous but involve complex and sometimes-buggy heuristics. There is ongoing work for SMT solvers to emit their results in a formal proof language for verification. .

In this blog post, we’ll take a look at Lean, a popular theorem proving language used by mathematicians such as Terry Tao222. And created by my colleagues in the Automated Reasoning Group at AWS. And created by my colleagues in the Automated Reasoning Group at AWS. . We’ll use Lean to formally model one of my favorite episodes of Cyberchase: "Problem Solving in Shangri-La"333. Evidently, a favorite episode for many mathematicians. I’ve found comments on r/math discussing it Evidently, a favorite episode for many mathematicians. I’ve found comments on r/math discussing it . As we develop the proof, we’ll explore how the Lean language works, techniques for writing proofs in Lean, and why Lean is so powerful.

The Problem Setup

If you have 20 minutes to spare (or less if you watch at 2x speed), I highly recommend that you watch the original episode of Cyberchase. It’s a lot of fun, and will set up the intuition we’ll need for the formal proof.

Let’s get to the core puzzle at hand. The Cybersquad are competing against Hacker in a turn-based strategy game444. In fact, it is a variant of Nim, a well-studied combinatorial game. In fact, it is a variant of Nim, a well-studied combinatorial game. . Between the two teams, there is a pool of 15 dragons: 14 green and one red. Each team must take turns removing 1, 2, or 3 dragons from this shared pool. If a team takes the red dragon, they lose!

We’ll begin our formal proof by defining the behavior of the game. In Lean, we can define functions just like a regular language, but the types and syntax will be a bit more aligned with math. For example, we will use Nat to take natural numbers (n0n \geq 0) as input, and the single equals sign (=) refers to equality rather than mutable assignment (we use := when defining variables and declaring the body of a function).

Some math notation that will be useful throughout this blog post:

  • Nat / N\mathbb{N}: the natural numbers, all integers n0n \geq 0
  • ¬x\lnot x, boolean negation (¬False=True\lnot \mathit{False} = \mathit{True})
  • \lor, boolean OR (equivalent to || in most languages)
  • \land, boolean AND (equivalent to && in most languages)

First, let us define a strategy that the Cybersquad will follow during the game. To simplify our proof later, the input to the strategy will be the number of green dragons left (that way 0 is still a valid state). In their first game against Hacker, the squad doesn’t quite have a strategy; let’s write down a similarly naive strategy of always taking one dragon.

def squadStrategy (green_dragons: Nat): Nat :=
  1

Next, we need to write a simulator that determines if the squad can win with their strategy when playing against Hacker, starting with a given number of green dragons. There is an immediate issue: Hacker is our enemy and we do not know what his strategy will be. If we want the squad’s strategy to be foolproof, we have to assume the worst case for Hacker: that he will always make the best move.

We can model this with a non-deterministic policy for Hacker. We will simulate all possible decisions that Hacker may take at each step, and check if any of those decisions (taking one, two, or three dragons) lead to a victory. This branching structure captures all possible strategies for Hacker, since any sequence of decisions is one path down the tree.

We can implement the simulator as a pair of functions: one that simulates a step by the squad and one that simulates a step by Hacker. Each function will return whether the squad or Hacker (respectively) would win the game after making that move (given the current game state—the number of green dragons—as a parameter).

This approach results in a natural recursive definition: the squad wins if after making their move Hacker does not win from the resulting state, and vice-versa (with several for Hacker to capture the non-deterministic strategy). Because these functions are mutually recursive (each invokes the other), Lean asks us to wrap the pair in a mutual block.

mutual
def squadWins (green_dragons: Nat): Bool :=
  if green_dragons = 0 then
    False
  else
    ¬ hackerWins (green_dragons - (squadStrategy green_dragons))

def hackerWins (green_dragons: Nat): Bool :=
  if green_dragons = 0 then
    False
  else
    ¬(squadWins (green_dragons - 3)) ∨ ¬(squadWins (green_dragons - 2)) ∨ ¬(squadWins (green_dragons - 1))
end

If we add this code, Lean will raise a error! It is concerned that the mutual recursion between these two functions may never terminate, which is not allowed in Lean (since then they would not be well-formed mathematical functions).

fail to show termination for
  squadWins
  hackerWins

...

  failed to eliminate recursive application
    hackerWins (green_dragons - squadStrategy green_dragons)

...

green_dragons : ℕ
h✝ : ¬green_dragons = 0
⊢ green_dragons - squadStrategy green_dragons < green_dragons

Lean helpfully provides a proof objective that is sufficient to show termination. If we can show that the total number of green dragons reduces after the squad makes their move, we know that eventually the game will terminate since number of dragons will go to zero555. The eagle-eyed will notice this objective is actually unnecessarily strong. Even if the squadStrategy does not decrease the dragons, hackerWins always does so the function will terminate. But proving that requires a bit more trickery with setting up a measure. The eagle-eyed will notice this objective is actually unnecessarily strong. Even if the squadStrategy does not decrease the dragons, hackerWins always does so the function will terminate. But proving that requires a bit more trickery with setting up a measure. . In Lean, all proof objectives are displayed with the same notation:

var1: type1
var2: type2
...
⊢ logical expression that we want to prove

Variables in a proof can hold a value (such as green_dragons above, which is a value in N\mathbb{N}). But variables in Lean can also represent logical facts we know. For such a proof variable, its type is the fact itself. In this case, the variables include h✝ : ¬green_dragons = 0, which tells us that we are given that ¬green_dragons = 0 is true (because the recursive call only occurs in the else branch of squadWins).

This is a great time to write our first proof in Lean! Before we dive into the details, it is helpful to set up the proof outline and check that Lean is satisfied. To locally prove a fact, we use the have keyword, which defines a local variable of some type (which, as a reminder, will be the fact itself). The right-hand sign of the declaration will contain the proof of this fact, using the by keyword to enter proof-building mode.

mutual
def squadWins (green_dragons: Nat): Bool :=
  if green_dragons = 0 then
    False
  else
    -- proof of termination:
    have decreasing: (squadStrategy green_dragons) > 0 := by
      sorry

    -- recursive call:
    ¬ hackerWins (green_dragons - (squadStrategy green_dragons))

-- ...
end

The sorry keyword in Lean gives you a "proof by laziness" — it lets you prove anything and is helpful to scaffold a proof without filling in all the details. You’ll notice that I’ve actually adjusted the proof goal a bit to make it simpler. Instead of showing that recursive call has a smaller parameter, we will prove that the squad always removes at least one dragon (which implies the former). Lean is smart enough to know that:

green_dragons0x>0    green_dragonsx<green_dragons\mathit{green\_dragons} \neq 0 \land x > 0 \implies \mathit{green\_dragons} - x < \mathit{green\_dragons}

Now that we’ve "proven" that the game terminates, our program will typecheck. Of course, any proof that involves sorry is a bogus proof, since there are missing steps! Let’s fill in that proof now.

In Lean, proofs are written by repeatedly applying tactics, which transform the proof objective and inputs to the proof (like h✝). It is helpful to think of writing Lean proofs as a game: we want to keep applying tactics so that eventually the proof objective is a trivially true statement (such as True). We can do this by expanding the proof objective, using the facts we are already given, and applying theorems.

The simplest tactic in Lean is the rw (rewrite) tactic, which takes in equivalences of the form a = b (provided in brackets), and replaces all instances of a in the proof objective with b. Almost everything in Lean can be used as an equivalence:

  • Given an equivalence have eq: ABC = DEF := ..., rw [eq] replaces each instance of ABC with DEF
  • Given a known fact have fact: ABC = True := ..., rw [fact] replaces each instance of ABC with True
  • Given a function definition def xyz (...) := ..., rw [xyz] inlines each invocation of xyz

Our proof objective involves a call to squadStrategy, so we can inline that function by applying rw [squadStrategy]. Because our current strategy is to always take one dragon, this rewrite simply replaces (squadStrategy green_dragons) with 1, leaving 1 > 0 as the rewritten proof objective. When we have a mathematical expression that does not involve any variables, we can use the decide tactic, which transforms the expression into the result of evaluating it. This turns 1 > 0 into True, so the proof goal is now True and we are done!

have decreasing: (squadStrategy green_dragons) > 0 := by
  -- ⊢ (squadStrategy green_dragons) > 0
  rw [squadStrategy]

  -- ⊢ 1 > 0
  decide

  -- proof complete!

Our Lean program now typechecks and does not have any sorry statements, so we know all our proofs are valid! Before we move on to strategizing, let us test out the simulator we have written. Functions in Lean can be executed using the #eval statement. We can use this to check the behavior of our simulator in some basic cases:

#eval (squadStrategy 10) -- 1
#eval (squadWins 1) -- True
#eval (squadWins 2) -- False
#eval (hackerWins 1) -- True

-- there are 14 green dragons in the episode
#eval (squadWins 14) -- False

The last result is particularly worrying: the squad might not win the game if Hacker plays his best! It’s time to strategize...

Forming a Strategy

As the episode continues, the squad takes some time to think about a proper strategy for beating Hacker. The monk recommends that they first experiment with games involving a smaller number of dragons, a wise strategy for any math or computer science problem! As the squad explores the effect of different choices, they land on an exciting observation.

If one player is in a state green_dragonsmod4=0\mathit{green\_dragons} \bmod 4 = 0, then regardless of the move they make, the other player can force them back into that state. This pattern can then continue until the first player is in state green_dragons=0\mathit{green\_dragons} = 0, at which point they have no choice but to take the red dragon! The squad decides to call this state a "poison number"666. Note that in the episode, all these numbers are offset by one since the squad includes the red dragon. Note that in the episode, all these numbers are offset by one since the squad includes the red dragon. .

This points us to a simple strategy. The squad always makes a move that forces Hacker into a poison number by taking green_dragonsmod4\mathit{green\_dragons} \bmod{4} dragons. If the squad themselves are at a poison number, they just give up and make the arbitrary move of taking one dragon, since Hacker has already won the game. Let’s update our strategy in Lean:

def squadStrategy (green_dragons: Nat): Nat :=
  if green_dragons % 4 = 0 then
    1
  else
    (green_dragons % 4)

We will also have to update our proof for the decreasing clause, since the proof state after rw [squadStrategy] will be different. After that proof step, the goal will include the if-else from the function:

have decreasing: (squadStrategy green_dragons) > 0 := by
  -- ⊢ squadStrategy green_dragons > 0
  rw [squadStrategy]
  
  -- ⊢ (if green_dragons % 4 = 0 then 1 else green_dragons % 4) > 0

In Lean, one way to prove facts about an if-else is to use the split tactic. It splits the if-else into two subgoals, each of which gain the condition as a known fact and have the body as the proof objective. In the if branch, the proof is trivial since the body is just 1 (the decide tactic can then evaluate 1 > 0 => true).

In the else branch, we gain the fact that green_dragons % 4 ≠ 0 and need to prove that green_dragons % 4 > 0. To do this, we use the omega tactic, which can automatically solve proofs involving simple inequalities or modular arithmetic777. Because green_dragons is in N\mathbb{N}, when know that it must not be negative. The else branch tells us that green_dragons % 4 ≠ 0, so the only option left is for green_dragons % 4 > 0. Because green_dragons is in N\mathbb{N}, when know that it must not be negative. The else branch tells us that green_dragons % 4 ≠ 0, so the only option left is for green_dragons % 4 > 0. . Under the hood, omega simply generates more Lean using fancy algorithms to pick the right proof steps888. This is critical for correctness. The omega tactic is relatively new, and its algorithm could totally have bugs. But Lean does not assume that the internal proof steps generated by omega are valid; it still checks every line of (generated) Lean. This is critical for correctness. The omega tactic is relatively new, and its algorithm could totally have bugs. But Lean does not assume that the internal proof steps generated by omega are valid; it still checks every line of (generated) Lean. . While we could write out these steps by hand, doing so is quite tedious!

have decreasing: (squadStrategy green_dragons) > 0 := by
  -- ⊢ squadStrategy green_dragons > 0
  rw [squadStrategy]
  
  -- ⊢ (if green_dragons % 4 = 0 then 1 else green_dragons % 4) > 0
  split
  . -- h✝ : green_dragons % 4 = 0
    -- ⊢ 1 > 0
    decide
  . -- h✝ : ¬green_dragons % 4 = 0
    -- ⊢ green_dragons % 4 > 0
    omega

A Formal Proof

Now, we are ready to prove that the squad’s strategy works! In Lean, proofs are written using the theorem keyword, which acts like a function but takes in the premises of the proofs and returns a type capturing what is being proven.

It is helpful to work backwards when writing mathematical proofs, starting with a high-level strategy with holes for lower-level proof details. This is particularly powerful in Lean, since the type checker will make sure that our plan actually fits together999. Of course, the proof holes will be unchecked. Like regular math, it is possible to have holes we fail to prove and force us to backtrack. Of course, the proof holes will be unchecked. Like regular math, it is possible to have holes we fail to prove and force us to backtrack. . Let us begin with our final goal: if the squad starts from a state that is not a poison number, then with their strategy they will always defeat Hacker.

The first step is to formally write down the goal101010. This is a key part of AI proof systems like AlphaProof called autoformalization. This is a key part of AI proof systems like AlphaProof called autoformalization. . Since the proof will be general for any number of green_dragons, we take that as a parameter. We also take in the precondition (that the state is not a poison number) as a parameter named h. The return type of the theorem is the logical statement we will prove.

def isPoisonNumber (green_dragons: Nat): Bool :=
  green_dragons % 4 = 0

theorem non_poison_squad_win (green_dragons: Nat) (h: ¬ isPoisonNumber green_dragons): squadWins green_dragons := by
  sorry

The first proof step is to expand the function invocations in both the input and output expressions, so that we can manipulate the underlying math in our proof. By default, tactics apply to the proof goal (), but we can also apply them to known facts by adding at h at the end, where h is the fact we want to transform. So we can use rw tactic to inline the isPoisonNumber call in h:

-- h : ¬isPoisonNumber green_dragons = true
rw [isPoisonNumber] at h

-- h : ¬decide (green_dragons % 4 = 0) = true

Hmm, the resulting proof state is a bit messy. The rw tactic does what it says and nothing more: apply the equivalences. This often111111. You might be wondering why we didn’t run into this earlier. This time, h includes additional math around the function call, so the rewrite ends up exposing some Lean machinery. Earlier, we were only dealing with pure function calls which expand more cleanly. You might be wondering why we didn’t run into this earlier. This time, h includes additional math around the function call, so the rewrite ends up exposing some Lean machinery. Earlier, we were only dealing with pure function calls which expand more cleanly. results in some "cruft" at the edges, such as the decide and = true, which can get in the way of further proof steps. We could apply further rewrites to simplify the fact, but that would be quite tedious.

For such situations, Lean provides the simp tactic, which repeatedly applies rewrites from a pool of standard theorems to simplify the expression to a canonical form121212. Lean automatically makes sure that these repeated rewrites do not result in an infinite loop. The general idea of repeatedly applying rewrites is closely related to the e-graph, a data structure for efficiently capturing the space of rewritten expressions that I (briefly) worked on at Berkeley. Lean automatically makes sure that these repeated rewrites do not result in an infinite loop. The general idea of repeatedly applying rewrites is closely related to the e-graph, a data structure for efficiently capturing the space of rewritten expressions that I (briefly) worked on at Berkeley. . We can invoke simp as a tactic without any arguments, or pass in additional knowledge that should added to the set of available rewrites. A common pattern is to use simp [...] as a supercharged alternative to rw [...] since it applies the rewrites and simplifies using common theorems. If we replace the rw above with simp, we get a much cleaner result:

-- h : ¬isPoisonNumber green_dragons = true
simp [isPoisonNumber] at h

-- h : ¬green_dragons % 4 = 0

We can then similarly simplify the proof objective:

-- ⊢ squadWins green_dragons = true
simp [squadWins, squadStrategy]

-- ⊢ ¬green_dragons = 0 ∧ hackerWins (green_dragons - if green_dragons % 4 = 0 then 1 else green_dragons % 4) = false

Next, we notice that our proof objective has an if-else expression, but that our premise h means that we will always take the else branch. We can use the simp tactic again, this time passing in the locally known fact h to eliminate the conditional:

-- h : ¬green_dragons % 4 = 0
-- ⊢ ¬green_dragons = 0 ∧ hackerWins (green_dragons - if green_dragons % 4 = 0 then 1 else green_dragons % 4) = false
simp [h]

-- ⊢ ¬green_dragons = 0 ∧ hackerWins (green_dragons - green_dragons % 4) = false

After applying several tactics, it is easy to lose track of the meaning of the current proof goal. Let us take a step back and parse the objective. We need to prove two things:

  1. The squad is not already in a state where they have lost (¬green_dragons = 0)
  2. After taking their turn (green_dragons - green_dragons % 4), Hacker cannot win from that state (hackerWins ... = false)

Proving that Hacker cannot win is going to be quite involved, so let’s write down a theorem statement and move on. This theorem will prove that if Hacker is at a poison number, then he will not win. In Lean:

theorem poison_hacker_loses (green_dragons: Nat) (h: isPoisonNumber green_dragons): ¬ hackerWins green_dragons := by
  sorry

We can use this (not-yet-proved) theorem to complete non_poison_squad_win. Earlier, we used have to perform a local proof. You can also use have to store the result of applying a theorem! Invoking a theorem uses the same syntax as invoking a function. To fulfill the parameter for h in poison_hacker_loses, we need to prove that fact! To create an inline proof, we use the by keyword to open a subproof with a new objective (the type of the parameter with substitutions):

have hackerLoses := poison_hacker_loses (green_dragons - (green_dragons % 4)) (by
  -- ⊢ isPoisonNumber (green_dragons - green_dragons % 4) = true
  sorry
)

Let’s write out the inline proof. As usual, a good first step is to expand any definitions using the simp tactic. We can then finish the proof by using omega since the remaining objective is true by some simple modular arithmetic.

have hackerLoses := poison_hacker_loses (green_dragons - (green_dragons % 4)) (by
  -- ⊢ isPoisonNumber (green_dragons - green_dragons % 4) = true
  simp [isPoisonNumber]
  
  -- ⊢ (green_dragons - green_dragons % 4) % 4 = 0
  omega
)

After this proof is done, we will have added a fact hackerLoses: ¬hackerWins (green_dragons - green_dragons % 4) = true . This matches a the second half of our proof goal, so we can eliminate that half with simp [hackerLoses]. For the remaining objective, we know that h : ¬green_dragons % 4 = 0 so omega can automatically finish the proof.

-- h : ¬green_dragons % 4 = 0
-- hackerLoses : ¬hackerWins (green_dragons - green_dragons % 4) = true
-- ⊢ ¬green_dragons = 0 ∧ hackerWins (green_dragons - green_dragons % 4) = false
simp [hackerLoses]

-- ⊢ ¬green_dragons = 0
omega

Amazing! We have completed the entire proof for non_poison_squad_win, which is our top-level proof objective! But we are not done quite yet. Remember that we used sorry in poison_hacker_loses, so our proof is incomplete. We need to focus on that theorem next, but at least we know that the building blocks of the proof fit together!

Time For Induction

So far, our proof has been a bit of a nothing burger. We’ve strung together some building blocks and proofs for simple cases, but have not reasoned about the iterative nature of this game. Like most of computer science, reasoning about iteration requires induction, which sets up a recursive proof with a base case and an inductive step that uses knowledge of smaller cases to prove a larger one.

Before we apply induction, let’s first clean up the proof objective with simp:

theorem poison_hacker_loses (green_dragons: Nat) (h: isPoisonNumber green_dragons): ¬ hackerWins green_dragons := by
  simp
  -- green_dragons : ℕ
  -- h : isPoisonNumber green_dragons = true
  -- ⊢ hackerWins green_dragons = false

Now, we will use the induction tactic to set up an inductive proof. This tactic requires us to specify how we will induct; we use Nat.strongRecOn to set up strong induction, which lets us use the proofs for all smaller cases rather than just the previous one.

induction green_dragons using Nat.strongRecOn with
| _ green_dragons hd =>
  -- hd : ∀ (m : ℕ), m < green_dragons → isPoisonNumber m = true → hackerWins m = false

Applying (strong) induction gives us a new fact hd: the inductive hypothesis131313. Note that we can omit the base case because Lean can prove it automatically. Note that we can omit the base case because Lean can prove it automatically. . It says that for any smaller number of green dragons m (∀ (m : ℕ), m < green_dragons), we know that if h is true for that smaller number (isPoisonNumber m = true), then Hacker does not win starting from that smaller number (hackerWins m = false).

As always, we’ll start by expanding the facts and goals we’re dealing with. We use the simp tactic to expand isPoisonNumber in h and hackerWins in our proof objective. This gives us some lower-level expressions to manipulate:

-- h : isPoisonNumber green_dragons = true
-- ⊢ hackerWins green_dragons = false

simp [isPoisonNumber] at h
simp [hackerWins]

-- h : green_dragons % 4 = 0
-- ⊢ ¬green_dragons = 0 →
--     squadWins (green_dragons - 3) = true ∧ squadWins (green_dragons - 2) = true ∧ squadWins (green_dragons - 1) = true

You’ll notice that the proof objective is an implication — we need to prove that if ¬green_dragons = 0 then no matter how many dragons Hacker takes (3, 2, or 1), the squad still wins. When dealing with implications in Lean, a useful tactic is intro, which adds the left-hand side to our known facts (a✝) and changes the proof objective to the right hand side.

intro
-- a✝ : ¬green_dragons = 0
-- ⊢ squadWins (green_dragons - 3) = true ∧ squadWins (green_dragons - 2) = true ∧ squadWins (green_dragons - 1) = true

Recall that h says that green_dragons % 4 = 0 and our new fact a✝ says that green_dragons != 0. This means we can filter out the simple cases where green_dragons = 0 (conflicts with a✝) or green_dragons = 1 | 2 | 3 (conflicts with h). Then, we are left with the case that there must be some smaller number next_poison such that green_dragons = next_poison + 4 . This new variable next_poison represents the next poison number that the squad will force Hacker back to.

We can split the proof into these cases with the match tactic, which allows us to pattern match a variable into several variants. Lean will verify that these variants are exhaustive, which means that they cover all possible values for green_dragons.

match green_dragons with
  | 0 | 1 | 2 | 3 => contradiction -- can be elided by Lean, but nice to be explicit
  | next_poison + 4 =>

When we pattern match green_dragons with next_poison + 4, Lean will automatically rewrite all our facts and the proof objective to replace instances of green_dragons with this new pattern. We can immediately use simp to clean up some of the arithmetic:

| next_poison + 4 =>
  -- hd : ∀ (m : ℕ), m < next_poison + 4 → isPoisonNumber m = true → hackerWins m = false
  -- h : (next_poison + 4) % 4 = 0
  -- a✝ : ¬next_poison + 4 = 0
  -- ⊢ squadWins (next_poison + 4 - 3) = true ∧ squadWins (next_poison + 4 - 2) = true ∧ squadWins (next_poison + 4 - 1) = true
  simp

  -- ⊢ squadWins (next_poison + 1) = true ∧ squadWins (next_poison + 2) = true ∧ squadWins (next_poison + 3) = true

It is easy to get lost applying tactics towards a Lean goal. Let’s read this new proof objective to understand what it means. We defined next_poison as the next poison number that the squad wants to force Hacker to. First, Hacker will make an (unknown) choice of how many dragons to take, which will leave the squad with either next_poison + 1 (Hacker takes 3), next_poison + 2 (Hacker takes 2), or next_poison + 3 (Hacker takes 1). We need to prove that the squad will win when starting from this state.

Intuitively, we need to show that the squad’s strategy will result in Hacker being left with next_poison dragons in all three cases. The first step is to expand the definition of squadWins so that we see the invocation of squadStrategy, and then to expand squadStrategy so we can see the underlying math:

simp [squadWins, squadStrategy]
-- ⊢ hackerWins (next_poison + 1 - if (next_poison + 1) % 4 = 0 then 1 else (next_poison + 1) % 4) = false ∧
--     hackerWins (next_poison + 2 - if (next_poison + 2) % 4 = 0 then 1 else (next_poison + 2) % 4) = false ∧
--     hackerWins (next_poison + 3 - if (next_poison + 3) % 4 = 0 then 1 else (next_poison + 3) % 4) = false

Our objective now requires us to show that in all three cases of Hacker’s possible decision, after the squad takes dragons according to their strategy Hacker cannot win (note the sneaky = false at the end of each line).

Modular Arithmetic and a Theorem

If you’ve recently spent time doing discrete math, you’re probably shouting the answer at this point. We can finish our proof using modular arithmetic, which will help us reason about the various expressions involving % 4.

You’ll notice that there is a very common pattern throughout the proof objective. We see expressions of the form (next_poison + k) % 4, where k < 4. We also know from h that (next_poison + 4) % 4 = 0. We can simplify this with a few steps to next_poison % 4 = 0, and then show that (next_poison + k) % 4 = k . This would simplify the objective a lot!

After digging through Mathlib, Lean's massive library of theorems across all areas of math, you might come up disappointed that there are no matches for such a theorem141414. It’s possible I missed it! Searching Mathlib is pretty tough. It’s possible I missed it! Searching Mathlib is pretty tough. . But no worry, we can write our own!

We’ll start by writing out the inputs to the theorem. To make things a bit extra fun, instead of hardcoding the % 4, we’ll have the proof be general for any % n. In the curly braces, we place the numerical inputs to the theorem (this allows Lean to infer their values when we apply the theorem).

theorem mod_zero_plus_k { x k n: Nat } (k_lt_n: k < n) (x_congr_zero: x % n = 0): (x + k) % n = k := by

Let’s map the statements above to the theorem inputs:

  • k < 4k_lt_n: k < n
  • next_poison % 4 = 0x_congr_zero: x % n = 0
  • (next_poison + k) % 4 = k(x + k) % n = k

To prove this theorem, we can apply a few rewrites. First, we apply Nat.add_mod, a theorem from Lean’s standard library, to transform the x into the form x % n so that we can apply x_congr_zero: x % n = 0 to simplify it. Then, the proof goal exactly matches another standard library theorem, Nat.mod_eq_of_lt, which says that a < b => a % b = a. Step by step:

-- x k n : ℕ
-- k_lt_n : k < n
-- x_congr_zero : x % n = 0
-- ⊢ (x + k) % n = k

rw [
  Nat.add_mod,
  -- ⊢ (x % n + k % n) % n = k
  
  x_congr_zero
  -- ⊢ (0 % n + k % n) % n = k
]

simp
-- ⊢ k % n = k

exact Nat.mod_eq_of_lt k_lt_n

Finishing the Proof

With this theorem, we are very close to being done! First, we need to use h : (next_poison + 4) % 4 = 0 to prove that next_poison % 4 = 0. This is simple enough that omega can auto-magically solve the proof:

have next_poison_mod_4: next_poison % 4 = 0 := by
  -- h : (next_poison + 4) % 4 = 0
  -- ⊢ next_poison % 4 = 0
  omega

Then, we can apply three instances of our earlier theorem: one for each of next_poison + 1, next_poison + 2 and next_poison + 3. When applying a rewrite that has some inferred (curly braces) parameters, Lean will pick the first place the rewrite can be applied and infer the arguments based on that. Each application of the theorem also requires the k < 4 proof, which we can do inline using by decide since k will be a constant 1, 2, 3 in each instance:

-- ⊢ hackerWins (next_poison + 1 - if (next_poison + 1) % 4 = 0 then 1 else (next_poison + 1) % 4) = false ∧
--     hackerWins (next_poison + 2 - if (next_poison + 2) % 4 = 0 then 1 else (next_poison + 2) % 4) = false ∧
--     hackerWins (next_poison + 3 - if (next_poison + 3) % 4 = 0 then 1 else (next_poison + 3) % 4) = false

rw [
  mod_zero_plus_k (by decide) next_poison_mod_4,
  -- ⊢ hackerWins (next_poison + 1 - if 1 = 0 then 1 else 1) = false ∧
  --     hackerWins (next_poison + 2 - if (next_poison + 2) % 4 = 0 then 1 else (next_poison + 2) % 4) = false ∧
  --     hackerWins (next_poison + 3 - if (next_poison + 3) % 4 = 0 then 1 else (next_poison + 3) % 4) = false
  
  mod_zero_plus_k (by decide) next_poison_mod_4,
  -- ⊢ hackerWins (next_poison + 1 - if 1 = 0 then 1 else 1) = false ∧
  --     hackerWins (next_poison + 2 - if 2 = 0 then 1 else 2) = false ∧
  --     hackerWins (next_poison + 3 - if (next_poison + 3) % 4 = 0 then 1 else (next_poison + 3) % 4) = false

  mod_zero_plus_k (by decide) next_poison_mod_4
  -- ⊢ hackerWins (next_poison + 1 - if 1 = 0 then 1 else 1) = false ∧
  --     hackerWins (next_poison + 2 - if 2 = 0 then 1 else 2) = false ∧
  --     hackerWins (next_poison + 3 - if 3 = 0 then 1 else 3) = false
]

A single simp dramatically cleans up this proof objective:

simp
-- ⊢ hackerWins next_poison = false

And now we are ready to close the proof. First we need to establish that this value next_poison is indeed a poison number. The next_poison_mod_4 fact helps here:

have nextIsPoison: isPoisonNumber next_poison := by
  -- ⊢ isPoisonNumber next_poison = true
  simp [isPoisonNumber]

  -- ⊢ next_poison % 4 = 0
  exact next_poison_mod_4

And now, we can apply our inductive hypothesis from way back! It tells us that for some poison number that is lower than what we started with (green_dragons), that Hacker cannot win.

exact hd next_poison (by
  -- ⊢ next_poison < next_poison + 4
  simp -- trivial proof using standard theorems
) nextIsPoison

With this final statement, our Lean program compiles with no type errors and more importantly, no sorry. The proof is complete and formally verified! If you’d like to take a look at the complete proof, you can check it out on GitHub: https://github.com/shadaj/cyberchase-lean.

Reflections and Next Steps

If you’ve made it this far, thank you very much for reading through this post! While Lean has a bit of a steep learning curve, its strong guarantees and the elegance of manipulating proofs with code has an awesome payoff. The same tactics and techniques we applied here can be used across a plethora of domains: reasoning about distributed systems (my day job), verifying compiler passes, and even proving Fermat’s Last Theorem151515. Well, not yet. But we’re getting there! Well, not yet. But we’re getting there! !

If there’s one thing you take away from this post, I hope it’s an appreciation for how much math we elide in natural language as "trivial". I don’t mean to say that we should make natural language proofs as verbose as Lean; their only goal is to communicate the interesting ideas. But we take so many facts for granted, and it is easy to make mistakes when applying them. Systems like Lean let us focus on the creativity of math, and reduce the cognitive burden of checking our work.

If you want to learn more Lean, I recommend starting with the Natural Numbers Game, an interactive series of Lean puzzles that you can play in the browser. While documentation on more advanced usage of Lean is a bit sparse, there is a lot of information across the internet and a very friendly community of developers and mathematicians. And practice makes perfect — the first version of this proof was over 400 lines of Lean that has now been whittled down to just 75.

All the best with your adventures in Lean!

Footnotes

  1. This is in contrast to SMT solvers, which are more autonomous but involve complex and sometimes-buggy heuristics. There is ongoing work for SMT solvers to emit their results in a formal proof language for verification.

  2. And created by my colleagues in the Automated Reasoning Group at AWS.

  3. Evidently, a favorite episode for many mathematicians. I’ve found comments on r/math discussing it

  4. In fact, it is a variant of Nim, a well-studied combinatorial game.

  5. The eagle-eyed will notice this objective is actually unnecessarily strong. Even if the squadStrategy does not decrease the dragons, hackerWins always does so the function will terminate. But proving that requires a bit more trickery with setting up a measure.

  6. Note that in the episode, all these numbers are offset by one since the squad includes the red dragon.

  7. Because green_dragons is in N\mathbb{N}, when know that it must not be negative. The else branch tells us that green_dragons % 4 ≠ 0, so the only option left is for green_dragons % 4 > 0.

  8. This is critical for correctness. The omega tactic is relatively new, and its algorithm could totally have bugs. But Lean does not assume that the internal proof steps generated by omega are valid; it still checks every line of (generated) Lean.

  9. Of course, the proof holes will be unchecked. Like regular math, it is possible to have holes we fail to prove and force us to backtrack.

  10. This is a key part of AI proof systems like AlphaProof called autoformalization.

  11. You might be wondering why we didn’t run into this earlier. This time, h includes additional math around the function call, so the rewrite ends up exposing some Lean machinery. Earlier, we were only dealing with pure function calls which expand more cleanly.

  12. Lean automatically makes sure that these repeated rewrites do not result in an infinite loop. The general idea of repeatedly applying rewrites is closely related to the e-graph, a data structure for efficiently capturing the space of rewritten expressions that I (briefly) worked on at Berkeley.

  13. Note that we can omit the base case because Lean can prove it automatically.

  14. It’s possible I missed it! Searching Mathlib is pretty tough.

  15. Well, not yet. But we’re getting there!