jimmy-yang.dev

Rust, mathematics, cryptography, zero-knowledge proofs

FPT-Sized Zero-Knowledge Proofs for Table Edit Distance

The previous posts established a complete picture: table edit distance is NP-hard, treewidth controls the boundary, polymorphisms explain why, and coordinate descent is the practical workaround. This final post asks a different question: can the hardness be used constructively?

Zero-knowledge proofs let a prover convince a verifier that a statement is true without revealing why. “These two medical databases have edit distance at most 50” — provable without revealing a single patient record. “Two financial reports differ in fewer than 10 cells” — verifiable without exposing the numbers.

The challenge: generic ZKP compilation of NP-hard problems produces exponential-size proofs. The opportunity: bounded treewidth — the same structural property that enables FPT algorithms — also bounds the proof size. This post sketches how.

Coordinate Descent: Engineering Around NP-Hardness

Previous posts established that table edit distance is NP-hard, that treewidth controls the boundary, and that the algebraic root cause is the absence of compositional structure. Theory is clean; reality is messy. Users of tate need to diff tables today, and they need the result in milliseconds, not in the exponential time that exact computation would require.

This post covers tate’s practical answer: coordinate descent — an alternating optimization scheme that exploits the one structural property the NP-hardness result leaves intact: for any fixed column alignment, the row alignment problem is polynomial (it’s 1D LCS).

We prove convergence, characterize the optimality gap, and show how the algorithm maps to tate’s Rust implementation.

Tensor Rank and Higher Dimensions

Post 1 established that table edit distance is NP-hard and that treewidth controls the boundary. Post 2 traced the boundary to its algebraic root: 1D solutions compose (lattice structure), 2D solutions don’t.

Now we ask: what about three dimensions? Four? $d$?

The answer involves a beautiful parallel. The complexity boundary $d = 1$ vs $d \geq 2$ in edit distance maps exactly onto the boundary between matrix rank (polynomial) and tensor rank (NP-hard). This is not a surface analogy — the coupling structure that makes 2D edit distance hard is the same structure that makes tensor decomposition hard.

Polymorphisms: Why Dynamic Programming Works

In the first post, we established that table edit distance is NP-hard for any positive cost model, and that treewidth controls the boundary: treewidth 1 (sequences) is polynomial, unbounded treewidth (tables) is NP-hard, and bounded treewidth (few columns) is FPT.

But treewidth only tells us when the problem is tractable. It doesn’t explain why. Why does the $O(nm)$ DP fill its table correctly for sequences, but no analogous DP exists for tables? What is the structural property that makes one case solvable and the other not?

The answer comes from universal algebra: polymorphisms — operations that combine solutions into new solutions. When polymorphisms exist, the solution space has compositional structure, and dynamic programming can exploit it. When they don’t, no amount of clever algorithm design can compensate.

This is the same principle that underlies the CSP dichotomy theorem (Bulatov 2017, Zhuk 2017), one of the deepest results in theoretical computer science.

On Edit Distance and Treewidth

Edit distance for strings is a solved problem: $O(nm)$ dynamic programming, textbook material. Edit distance for tables is NP-hard. The jump from polynomial to intractable happens the moment you go from one dimension to two.

Why? The answer is a single number: treewidth. The constraint graph underlying a 1D edit distance computation is a path (treewidth 1). The constraint graph for a 2D table is a grid (treewidth $\min(\text{rows}, \text{cols})$). Every complexity result in this post — polynomial for sequences, NP-hard for tables, fixed-parameter tractable when columns are bounded — falls out of this one graph invariant.

This post accompanies tate, a self-contained structured diff library in Rust. I’ll formalize table edit distance, prove it’s a metric, prove it’s NP-hard for any positive cost model (not just a specific one), and identify the treewidth boundary that separates tractable from intractable.