形式化定理证明,又有新范式!
阶跃星辰正式发布并开源了形式化定理证明大模型:StepFun-Prover-Preview-7B和StepFun-Prover-Preview-32B。
StepFun-Prover采用基于环境反馈的强化学习训练流程,能像人类一样在推理过程中通过与环境的实时交互逐步修正和完善形式化证明。
△不同模型在基准测试集 miniF2F-test 上的表现
两阶段监督微调(Two-stage SFT)
阶跃星辰团队采用了分阶段微调策略,具体可分为两步:
这一步训练出的模型已经获得了使用工具的基本能力。
工具集成强化学习(Tool-integrated RL)
阶跃星辰团队使用GRPO进行后续强化学习训练,让模型在生成输出过程中获得如下能力:
这一套流程,允许模型像程序员一样调试代码——发现错误,反思,修正错误,直到证明成立。
奖励函数也设计得非常直观:对于最终模型确定的答案,REPL验证通过就给1分,否则0分。
RL与SFT的迭代循环优化
阶跃星辰团队采用了“RL-SFT-RL”的迭代优化方法,其关键步骤如下:
这一方法让模型逐步攻克更难的推理任务,真正实现越练越强。
StepFun-Prover-Preview-32B在基准测试集miniF2F-test上的pass@1通过率达70.0%,超越所有已知模型4%+!
StepFun-Prover-Preview-7B的pass@1通过率也已超过DeepSeek-Prover-V2-671B和Kimina-Prover-72B。
△不同模型在miniF2F-test上的表现
案例 1:主动去除形式化证明中的冗余步骤
<sketch>
import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
theorem mathd_numbertheory_247 (n : ℕ) (h₀ : 3 * n % 11 = 2) : n % 11 = 8 := by
have h1 : n % 11 = 0 ∨ n % 11 = 1 ∨ n % 11 = 2 ∨ n % 11 = 3 ∨ n % 11 = 4 ∨ n % 11 = 5 ∨ n % 11 = 6 ∨ n % 11 = 7 ∨ n % 11 = 8 ∨ n % 11 = 9 ∨ n % 11 = 10 := by
omega
rcases h1 with (h | h | h | h | h | h | h | h | h | h | h) <;> simp [Nat.mul_mod, h] at h₀ ⊢
<;> omega
</sketch>
<REPL>
{"messages": [{"severity": "warning", "pos": {"line": 11, "column": 6}, "endPos": {"line": 11, "column": 11}, "data": "this tactic is never executed\nnote: this linter can be disabled with `set_option linter.unreachableTactic false`"}, {"severity": "warning", "pos": {"line": 11, "column": 6}, "endPos": {"line": 11, "column": 11}, "data": "'omega' tactic does nothing\nnote: this linter can be disabled with `set_option linter.unusedTactic false`"}]}
</REPL>
Let me review what the REPL is telling us and modify my strategy.
The warning says that the `omega` tactic is never executed and does nothing. This suggests that the `simp` tactic with the given hypotheses and `omega` was able to solve the goals completely, so the `omega` tactic wasn't needed.
Let me try a simpler approach:
<sketch>
import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
theorem mathd_numbertheory_247 (n : ℕ) (h₀ : 3 * n % 11 = 2) : n % 11 = 8 := by
have h1 : n % 11 = 0 ∨ n % 11 = 1 ∨ n % 11 = 2 ∨ n % 11 = 3 ∨ n % 11 = 4 ∨ n % 11 = 5 ∨ n % 11 = 6 ∨ n % 11 = 7 ∨ n % 11 = 8 ∨ n % 11 = 9 ∨ n % 11 = 10 := by
omega
rcases h1 with (h | h | h | h | h | h | h | h | h | h | h) <;> simp [Nat.mul_mod, h] at h₀ ⊢
</sketch>
案例 2:根据超时反馈调整形式化证明结构
<sketch>
import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
theorem mathd_numbertheory_277 (m n : ℕ) (h₀ : Nat.gcd m n = 6) (h₁ : Nat.lcm m n = 126) :
60 ≤ m + n := by
have h2 : m * n = 756 := by
calc
m * n = Nat.gcd m n * Nat.lcm m n := by rw [Nat.gcd_mul_lcm]
_ = 6 * 126 := by rw [h₀, h₁]
_ = 756 := by norm_num
have h3 : m ∣ 756 := by
use n
all_goals nlinarith
have h4 : n ∣ 756 := by
use m
all_goals nlinarith
have h5 : m ≤ 756 := by
exact Nat.le_of_dvd (by norm_num) h3
have h6 : n ≤ 756 := by
exact Nat.le_of_dvd (by norm_num) h4
interval_cases m <;> try {
interval_cases n <;> norm_num [Nat.gcd, Nat.lcm] at h₀ h₁ ⊢
<;> omega
}
</sketch>
<REPL>
"repl cannot return status in 60.47944927215576 seconds!"
</REPL>
The REPL feedback provides valuable insights - let me incorporate these and revise.
The error "repl cannot return status in 60.47944927215576 seconds!" indicates that the proof is taking too long. The `interval_cases` tactic is generating many cases, and we need a more efficient approach.
Let's use a more mathematical approach. We know that $m \cdot n = 756$ and $\gcd(m, n) = 6$. We can use the fact that if $\gcd(m, n) = 6$, then $m = 6a$ and $n = 6b$ where $\gcd(a, b) = 1$ and $a \cdot b = 21$.
The factor pairs of 21 with coprime factors are $(1, 21)$ and $(3, 7)$. The smallest sum is $1 + 21 = 22$, but we need to check if this gives us the minimal $m + n$.
Wait, actually, we need to consider all factor pairs and find the minimal sum. Let's compute:
For $(1, 21)$: $m = 6 \cdot 1 = 6$, $n = 6 \cdot 21 = 126$, sum = 132
For $(3, 7)$: $m = 6 \cdot 3 = 18$, $n = 6 \cdot 7 = 42$, sum = 60
For $(7, 3)$: $m = 6 \cdot 7 = 42$, $n = 6 \cdot 3 = 18$, sum = 60
For $(21, 1)$: $m = 6 \cdot 21 = 126$, $n = 6 \cdot 1 = 6$, sum = 132
So the minimal sum is 60. Therefore, we need to show that $m + n \geq 60$.
Let's try a more direct approach using the properties of gcd and lcm:
<sketch>
import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
theorem mathd_numbertheory_277 (m n : ℕ) (h₀ : Nat.gcd m n = 6) (h₁ : Nat.lcm m n = 126) :
60 ≤ m + n := by
have h2 : m * n = 756 := by
calc
m * n = Nat.gcd m n * Nat.lcm m n := by rw [Nat.gcd_mul_lcm]
_ = 6 * 126 := by rw [h₀, h₁]
_ = 756 := by norm_num
have h3 : m ≤ 756 := by
have : m ∣ 756 := by
use n
all_goals nlinarith
exact Nat.le_of_dvd (by norm_num) this
have h4 : n ≤ 756 := by
have : n ∣ 756 := by
use m
all_goals nlinarith
exact Nat.le_of_dvd (by norm_num) this
-- We know that m and n are at most 756, but we need a better bound
-- Since gcd(m, n) = 6, both m and n are multiples of 6
have h5 : 6 ∣ m := by
have : 6 ∣ m := by
have h6 : Nat.gcd m n ∣ m := by
exact Nat.gcd_dvd_left m n
rw [h₀] at h6
exact h6
exact this
have h6 : 6 ∣ n := by
have : 6 ∣ n := by
have h7 : Nat.gcd m n ∣ n := by
exact Nat.gcd_dvd_right m n
rw [h₀] at h7
exact h7
exact this
-- So m = 6k and n = 6l for some k, l
-- We also know that lcm(m, n) = 126
-- Since m = 6k and n = 6l, we have lcm(6k, 6l) = 6 * lcm(k, l) = 126
-- Therefore, lcm(k, l) = 21
-- We need to find the minimal m + n = 6(k + l) where lcm(k, l) = 21 and gcd(k, l) = 1
-- The factor pairs of 21 with coprime factors are (1, 21) and (3, 7)
-- The minimal sum is 1 + 21 = 22, so m + n = 6 * 22 = 132
-- But wait, that's not right. Let's recalculate...
-- Actually, we need pairs (k, l) such that lcm(k, l) = 21
-- The pairs are (1, 21), (21, 1), (3, 7), (7, 3)
-- The sums are 22, 22, 10, 10
-- So the minimal sum is 10, giving m + n = 6 * 10 = 60
-- Therefore, we need to show that m + n ≥ 60
-- Let's proceed by contradiction: assume m + n < 60
by_contra h
push_neg at h
-- We need to show this leads to a contradiction
-- Since m and n are both ≥ 6 (as they are multiples of 6 and positive)
have hm : m ≥ 6 := by
have : m > 0 := by
by_contra h'
push_neg at h'
have : m = 0 := by omega
rw [this] at h2
omega
have : m ≥ 6 := by
omega
exact this
have hn : n ≥ 6 := by
have : n > 0 := by
by_contra h'
push_neg at h'
have : n = 0 := by omega
rw [this] at h2
omega
have : n ≥ 6 := by
omega
exact this
-- So m + n ≥ 12, but we need a stronger bound
-- Let's use the fact that if m + n < 60, then both m and n must be less than 60
have h7 : m < 60 := by omega
have h8 : n < 60 := by omega
-- But since m and n are multiples of 6, the possible values are 6, 12, 18, 24, 30, 36, 42, 48
interval_cases m <;> try {
interval_cases n <;> norm_num [Nat.gcd, Nat.lcm] at h₀ h₁ ⊢
<;> omega
}
</sketch>
案例 3:基于环境反馈有效改错
<sketch>
import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
theorem mathd_algebra_452 (a : ℕ → ℝ) (h₀ : ∀ n, a (n + 2) - a (n + 1) = a (n + 1) - a n)
(h₁ : a 1 = 2 / 3) (h₂ : a 9 = 4 / 5) : a 5 = 11 / 15 := by
have eq4 : a 3 = 2 * a 2 - a 1 := by
specialize h₀ 1
norm_num at h₀ ⊢
linarith
have eq5 : a 4 = 3 * a 2 - 2 * a 1 := by
have eq3 : a 3 = 2 * a 2 - a 1 := eq4
have eq6 : a 4 = 2 * a 3 - a 2 := by
specialize h₀ 2
norm_num at h₀ ⊢
linarith
rw [eq6, eq3]
all_goals linarith
have eq6 : a 5 = 4 * a 2 - 3 * a 1 := by
have eq3 : a 3 = 2 * a 2 - a 1 := eq4
have eq4 : a 4 = 3 * a 2 - 2 * a 1 := eq5
have eq7 : a 5 = 2 * a 4 - a 3 := by
specialize h₀ 3
norm_num at h₀ ⊢
linarith
rw [eq7, eq4, eq3]
all_goals linarith
have eq7 : a 6 = 5 * a 2 - 4 * a 1 := by
have eq5 : a 4 = 3 * a 2 - 2 * a 1 := eq5
have eq8 : a 5 = 4 * a 2 - 3 * a 1 := eq6
have eq9 : a 6 = 2 * a 5 - a 4 := by
specialize h₀ 4
norm_num at h₀ ⊢
linarith
rw [eq9, eq8, eq5]
all_goals linarith
have eq8 : a 7 = 6 * a 2 - 5 * a 1 := by
have eq7 : a 6 = 5 * a 2 - 4 * a 1 := eq7
have eq9 : a 7 = 2 * a 6 - a 5 := by
specialize h₀ 5
norm_num at h₀ ⊢
linarith
rw [eq9, eq7, eq6]
all_goals linarith
have eq9 : a 8 = 7 * a 2 - 6 * a 1 := by
have eq8 : a 7 = 6 * a 2 - 5 * a 1 := eq8
have eq10 : a 8 = 2 * a 7 - a 6 := by
specialize h₀ 6
norm_num at h₀ ⊢
linarith
rw [eq10, eq8, eq7]
all_goals linarith
have eq10 : a 9 = 8 * a 2 - 7 * a 1 := by
have eq9 : a 8 = 7 * a 2 - 6 * a 1 := eq9
have eq11 : a 9 = 2 * a 8 - a 7 := by
specialize h₀ 7
norm_num at h₀ ⊢
linarith
rw [eq11, eq9, eq8]
all_goals linarith
have eq11 : a 2 = 17 / 60 := by
linarith [h₁, h₂, eq10]
have eq12 : a 5 = 11 / 15 := by
rw [eq6]
rw [eq11, h₁]
norm_num
exact eq12
</sketch>
<REPL>
{"messages": [{"severity": "error", "pos": {"line": 64, "column": 4}, "endPos": {"line": 64, "column": 27}, "data": "linarith failed to find a contradiction\ncase h2.h\na : ℕ → ℝ\nh₀ : ∀ (n : ℕ), a (n + 2) - a (n + 1) = a (n + 1) - a n\nh₁ : a 1 = 2 / 3\nh₂ : a 9 = 4 / 5\neq4 : a 3 = 2 * a 2 - a 1\neq5 : a 4 = 3 * a 2 - 2 * a 1\neq6 : a 5 = 4 * a 2 - 3 * a 1\neq7 : a 6 = 5 * a 2 - 4 * a 1\neq8 : a 7 = 6 * a 2 - 5 * a 1\neq9 : a 8 = 7 * a 2 - 6 * a 1\neq10 : a 9 = 8 * a 2 - 7 * a 1\na✝ : 17 / 60 < a 2\n⊢ False failed"}, {"severity": "error", "pos": {"line": 65, "column": 31}, "endPos": {"line": 68, "column": 12}, "data": "unsolved goals\na : ℕ → ℝ\nh₀ : ∀ (n : ℕ), a (n + 2) - a (n + 1) = a (n + 1) - a n\nh₁ : a 1 = 2 / 3\nh₂ : a 9 = 4 / 5\neq4 : a 3 = 2 * a 2 - a 1\neq5 : a 4 = 3 * a 2 - 2 * a 1\neq6 : a 5 = 4 * a 2 - 3 * a 1\neq7 : a 6 = 5 * a 2 - 4 * a 1\neq8 : a 7 = 6 * a 2 - 5 * a 1\neq9 : a 8 = 7 * a 2 - 6 * a 1\neq10 : a 9 = 8 * a 2 - 7 * a 1\neq11 : a 2 = 17 / 60\n⊢ False"}]}
</REPL>
Let's review what the REPL is telling us and modify our strategy.
The error message shows that `linarith` failed to find a contradiction when trying to prove `a 2 = 17 / 60`. This is strange because our mathematical derivation showed that $a_2 = \frac{17}{60}$.
Let's check our mathematical derivation:
From $a_9 = 8a_2 - 7a_1$ and $a_9 = \frac{4}{5}$, $a_1 = \frac{2}{3}$:
$\frac{4}{5} = 8a_2 - 7 \cdot \frac{2}{3}$
$\frac{4}{5} = 8a_2 - \frac{14}{3}$
$8a_2 = \frac{4}{5} + \frac{14}{3} = \frac{12 + 70}{15} = \frac{82}{15}$
$a_2 = \frac{82}{15 \cdot 8} = \frac{41}{60}$
Wait! That contradicts what I said earlier. Let me recalculate:
Wait: $\frac{4}{5} = 8a_2 - \frac{14}{3}$
So $8a_2 = \frac{4}{5} + \frac{14}{3} = \frac{12 + 70}{15} = \frac{82}{15}$
Therefore $a_2 = \frac{82}{15 \cdot 8} = \frac{41}{60}$
But then $a_5 = 4a_2 - 3a_1 = 4 \cdot \frac{41}{60} - 3 \cdot \frac{2}{3} = \frac{164}{60} - 2 = \frac{164 - 120}{60} = \frac{44}{60} = \frac{11}{15}$
So my previous calculation was wrong. The correct value is $a_2 = \frac{41}{60}$, not $\frac{17}{60}$.
Let's fix the proof:
<sketch>
import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
theorem mathd_algebra_452 (a : ℕ → ℝ) (h₀ : ∀ n, a (n + 2) - a (n + 1) = a (n + 1) - a n)
(h₁ : a 1 = 2 / 3) (h₂ : a 9 = 4 / 5) : a 5 = 11 / 15 := by
have eq4 : a 3 = 2 * a 2 - a 1 := by
specialize h₀ 1
norm_num at h₀ ⊢
linarith
have eq5 : a 4 = 3 * a 2 - 2 * a 1 := by
have eq3 : a 3 = 2 * a 2 - a 1 := eq4
have eq6 : a 4 = 2 * a 3 - a 2 := by
specialize h₀ 2
norm_num at h₀ ⊢
linarith
rw [eq6, eq3]
all_goals linarith
have eq6 : a 5 = 4 * a 2 - 3 * a 1 := by
have eq3 : a 3 = 2 * a 2 - a 1 := eq4
have eq4 : a 4 = 3 * a 2 - 2 * a 1 := eq5
have eq7 : a 5 = 2 * a 4 - a 3 := by
specialize h₀ 3
norm_num at h₀ ⊢
linarith
rw [eq7, eq4, eq3]
all_goals linarith
have eq7 : a 6 = 5 * a 2 - 4 * a 1 := by
have eq5 : a 4 = 3 * a 2 - 2 * a 1 := eq5
have eq8 : a 5 = 4 * a 2 - 3 * a 1 := eq6
have eq9 : a 6 = 2 * a 5 - a 4 := by
specialize h₀ 4
norm_num at h₀ ⊢
linarith
rw [eq9, eq8, eq5]
all_goals linarith
have eq8 : a 7 = 6 * a 2 - 5 * a 1 := by
have eq7 : a 6 = 5 * a 2 - 4 * a 1 := eq7
have eq9 : a 7 = 2 * a 6 - a 5 := by
specialize h₀ 5
norm_num at h₀ ⊢
linarith
rw [eq9, eq7, eq6]
all_goals linarith
have eq9 : a 8 = 7 * a 2 - 6 * a 1 := by
have eq8 : a 7 = 6 * a 2 - 5 * a 1 := eq8
have eq10 : a 8 = 2 * a 7 - a 6 := by
specialize h₀ 6
norm_num at h₀ ⊢
linarith
rw [eq10, eq8, eq7]
all_goals linarith
have eq10 : a 9 = 8 * a 2 - 7 * a 1 := by
have eq9 : a 8 = 7 * a 2 - 6 * a 1 := eq9
have eq11 : a 9 = 2 * a 8 - a 7 := by
specialize h₀ 7
norm_num at h₀ ⊢
linarith
rw [eq11, eq9, eq8]
all_goals linarith
have eq11 : a 2 = 41 / 60 := by
linarith [h₁, h₂, eq10]
have eq12 : a 5 = 11 / 15 := by
rw [eq6]
rw [eq11, h₁]
norm_num
exact eq12
</sketch>
<REPL>
{}
</REPL>
Perfect! The proof now passes without errors. Let's verify our solution once more:
We have $a_1 = \frac{2}{3}$ and $a_9 = \frac{4}{5}$. We've shown that $a_2 = \frac{41}{60}$.
Then, $a_5 = 4a_2 - 3a_1 = 4 \cdot \frac{41}{60} - 3 \cdot \frac{2}{3} = \frac{164}{60} - 2 = \frac{164 - 120}{60} = \frac{44}{60} = \frac{11}{15}$
StepFun-Prover Preview是阶跃星辰团队在形式化证明方向的阶段性成果,团队也将在形式化推理模型方向持续探索。
Github:
https://github.com/stepfun-ai/StepFun-Prover-Preview
Huggingface:
https://huggingface.co/stepfun-ai/StepFun-Prover-Preview-32B
技术报告:
https://arxiv.org/abs/2507.20199
文章来自于微信公众号“量子位”。
【开源免费】DeepBI是一款AI原生的数据分析平台。DeepBI充分利用大语言模型的能力来探索、查询、可视化和共享来自任何数据源的数据。用户可以使用DeepBI洞察数据并做出数据驱动的决策。
项目地址:https://github.com/DeepInsight-AI/DeepBI?tab=readme-ov-file
本地安装:https://www.deepbi.com/
【开源免费】airda(Air Data Agent)是面向数据分析的AI智能体,能够理解数据开发和数据分析需求、根据用户需要让数据可视化。
项目地址:https://github.com/hitsz-ids/airda
【开源免费】XTuner 是一个高效、灵活、全能的轻量化大模型微调工具库。它帮助开发者提供一个简单易用的平台,可以对大语言模型(LLM)和多模态图文模型(VLM)进行预训练和轻量级微调。XTuner 支持多种微调算法,如 QLoRA、LoRA 和全量参数微调。
项目地址:https://github.com/InternLM/xtuner