A reset button that takes two seconds to reset feels broken. You press it, the board sits there, you press it again, maybe you swear at your phone. This is the problem I spent the last week of development on.

This is a post about performance work on the constraint solver that powers my nonogram app. The short version: three optimizations turned a sluggish generation loop into something imperceptible. The long version is about why that mattered — and what I’d tell anyone approaching a similar problem.

Why there’s a solver at all

Randomly generated nonograms have a nasty property: their solutions aren’t always unique. You can generate a 10×10 board by filling cells at random, compute its hints, and ship it to a player — but the player can deduce a different valid picture that also satisfies the hints. They’ll play “correctly” and the game will tell them they’re wrong.

The fix is to generate candidates and filter them: reject any board whose hints admit more than one valid solution. That filter is a constraint solver. It’s the core of puzzle generation, not a nice-to-have.
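The loop itself is simple; the solver is the expensive part. Here's a minimal sketch of the shape, where `randomBoard` and `countSolutions` are hypothetical stand-ins for the app's real generator and solver, not its actual API:

```dart
// Sketch of the generate-and-filter loop. randomBoard and
// countSolutions are illustrative stand-ins, not the app's real API.
List<int> generateUnique(
  List<int> Function() randomBoard,
  int Function(List<int> board, int cap) countSolutions,
) {
  while (true) {
    final candidate = randomBoard();
    // Unique ⇔ the solver finds exactly one solution (capped at two).
    if (countSolutions(candidate, 2) == 1) return candidate;
  }
}

void main() {
  var i = 0;
  final boards = [
    [1, 1, 0, 1], // pretend this candidate admits two solutions
    [1, 0, 0, 1], // pretend this one is unique
  ];
  final picked = generateUnique(
    () => boards[i++],
    (board, cap) => board[1] == 1 ? 2 : 1, // fake solver for the demo
  );
  assert(picked[1] == 0); // the ambiguous candidate was rejected
}
```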

My solver uses line-wise constraint propagation. Each row and column has a list of runs; the solver enumerates the valid placements for each line and intersects them to learn which cells are definitely filled or definitely empty. It repeats this to a fixed point, then branches on the first unknown cell and recurses. It counts solutions rather than finding one, because we only need to distinguish "exactly one" from "more than one" — so it short-circuits at two.
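To make the propagation step concrete, here's a self-contained sketch of a single line pass, assuming a cell encoding of -1 unknown, 0 empty, 1 filled. The real solver's representation and buffer handling differ; this just shows the enumerate-then-intersect idea:

```dart
// One propagation pass over a single line (row or column).
// Cells: -1 unknown, 0 empty, 1 filled. Enumerate every placement of
// the runs consistent with the known cells, then intersect: a cell is
// forced only if every valid placement agrees on it.
List<int> propagateLine(List<int> line, List<int> runs) {
  List<int>? merged;

  void tryPlace(int pos, int runIdx, List<int> acc) {
    if (runIdx == runs.length) {
      final candidate = List<int>.from(acc);
      for (var i = pos; i < line.length; i++) {
        candidate[i] = 0; // remaining cells must be empty
      }
      // Reject placements that contradict already-known cells.
      for (var i = 0; i < line.length; i++) {
        if (line[i] != -1 && line[i] != candidate[i]) return;
      }
      if (merged == null) {
        merged = candidate;
      } else {
        final m = merged!;
        for (var i = 0; i < line.length; i++) {
          if (m[i] != candidate[i]) m[i] = -1; // placements disagree
        }
      }
      return;
    }
    final run = runs[runIdx];
    for (var start = pos; start + run <= line.length; start++) {
      final acc2 = List<int>.from(acc);
      for (var i = pos; i < start; i++) acc2[i] = 0;
      for (var i = start; i < start + run; i++) acc2[i] = 1;
      var next = start + run;
      if (runIdx < runs.length - 1) {
        if (next >= line.length) continue;
        acc2[next] = 0; // mandatory gap between consecutive runs
        next++;
      }
      tryPlace(next, runIdx + 1, acc2);
    }
  }

  tryPlace(0, 0, List<int>.filled(line.length, -1));
  return merged ?? line; // no valid placement: caller detects contradiction
}

void main() {
  // Classic example: a 3-run in a 5-cell line forces the center cell.
  final result = propagateLine(List<int>.filled(5, -1), [3]);
  assert(result.toString() == '[-1, -1, 1, -1, -1]');
}
```

This leaf-level contradiction check is deliberately naive; the point is the intersection, which is exactly the information the solver writes back into the grid.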

The problem: generating hard puzzles was slow

The solver worked correctly from day one. It was also annoying. On a 15×15 hard puzzle — high fill density, hence more ambiguity — the generation loop could chew through a few seconds of browser main-thread time before finding a board that passed the uniqueness check. The UI thread was blocked. The reset button turned into a lie.

First move, before anything fancy: async yielding. The generator wraps each candidate attempt in await Future.delayed(Duration.zero), which hands the event loop back to the browser long enough to paint a frame. That fixed the visual freeze but not the waiting. You’d press reset, see a spinner, and still wait two seconds. Time to actually make the solver faster.
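The pattern, sketched with a stand-in `tryCandidate` in place of the real generate-and-check attempt:

```dart
import 'dart:async';

// Sketch of the yielding pattern: one await per candidate attempt.
// tryCandidate stands in for "generate a board, check uniqueness".
Future<int> generateWithYield(bool Function() tryCandidate) async {
  var attempts = 1;
  while (!tryCandidate()) {
    attempts++;
    // Hands the event loop back to the browser so a frame can paint.
    // This fixes the freeze, not the total latency.
    await Future.delayed(Duration.zero);
  }
  return attempts;
}

void main() async {
  var n = 0;
  final attempts = await generateWithYield(() => ++n >= 3);
  assert(attempts == 3);
}
```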

Optimization 1: cache the text measurements

Here’s an embarrassing one. Every layout pass was re-running TextPainter measurement on the hint regions — the rows and columns of numbers beside the grid. TextPainter is not cheap. Caching that measurement, keyed by hint content and text scaler, saved a few milliseconds per frame on hint-heavy boards.
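The cache has a simple shape. In the real code the expensive call is TextPainter layout; here it's replaced with a stand-in computation so the sketch runs anywhere:

```dart
// Shape of the measurement cache: keyed by hint content plus text
// scale, computed once, reused. The measurement body is a stand-in
// for the real TextPainter.layout() work.
class HintMeasureCache {
  final _widths = <String, double>{};
  int computeCalls = 0; // exposed here only to demonstrate cache hits

  double measure(List<int> hints, double textScale) {
    final key = '${hints.join(",")}@$textScale';
    return _widths.putIfAbsent(key, () {
      computeCalls++;
      // Stand-in measurement: pretend width tracks rendered length.
      return hints.join(' ').length * 8.0 * textScale;
    });
  }
}

void main() {
  final cache = HintMeasureCache();
  final first = cache.measure([3, 1], 1.0);
  final second = cache.measure([3, 1], 1.0); // cache hit, no recompute
  assert(first == second);
  assert(cache.computeCalls == 1);
  cache.measure([3, 1], 1.3); // different text scale → new entry
  assert(cache.computeCalls == 2);
}
```

Including the text scaler in the key matters: the same hints measure differently when the user changes system font size, and a stale width there is a layout bug waiting to happen.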

This isn’t technically solver work, but it’s in the same user-visible chain. If your perf fix makes the solver faster while the layout is spending 5ms re-measuring the same text every build, you haven’t fixed the problem. Profile the whole path, not just the interesting part.

Optimization 2: reuse line buffers

The solver’s hot loop propagates constraints line by line. Each iteration enumerates valid placements for a row or column, finds their intersection, and writes the result back. The old code allocated a fresh List<CellState> for every line, every iteration. On a 15×15 grid, with a dozen propagation passes and backtracking, that’s tens of thousands of list allocations inside a single generation attempt.

The fix was straightforward: pre-allocate three buffers at solver construction time — input, output, and an enumeration scratch — sized to the maximum of rows and columns. Reuse them everywhere. The scratch buffer tracks its own logical length with an int rather than Dart’s List.removeLast() bookkeeping, because that shows up in profiles too.
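A sketch of the scratch-buffer idea — pre-allocated once, logical length tracked as a plain int (names are illustrative, not the app's real types):

```dart
// Pre-allocated scratch line with a manual logical length, instead of
// allocating a fresh List per line per propagation pass.
class LineScratch {
  final List<int> cells; // reused for every row/column pass
  int length = 0;        // logical length tracked as a plain int

  LineScratch(int maxLineLength)
      : cells = List<int>.filled(maxLineLength, -1);

  void reset(int lineLength) {
    length = lineLength;
    for (var i = 0; i < length; i++) {
      cells[i] = -1; // clear only the logical prefix
    }
  }
}

void main() {
  // Sized once to max(rows, cols) at solver construction time.
  final scratch = LineScratch(15);
  scratch.reset(10); // reused for a 10-cell line — no new allocation
  assert(scratch.length == 10);
  assert(scratch.cells.length == 15); // backing list never shrinks
}
```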

Benchmark on a 10×10 solver, repeated 500 times: ~8ms → ~4.3ms. A 46.5% reduction.

The caveat: buffer reuse is only safe if you’re careful about reentrancy. My solver branches recursively. That means a buffer in use at depth three can’t be touched by depth four. I handled this by scoping the enumeration buffer strictly per call — snapshot, recurse, restore — which segues nicely into the next optimization.

Optimization 3: in-place branching with snapshot/restore

The really juicy win was at the recursive layer. When the propagator runs out of deductions, the solver picks an unknown cell, tries it as filled, recurses, then tries it as empty. The old implementation deep-copied the entire grid — 225 cells on a 15×15, as nested lists — for each branch. On deep searches you’d allocate hundreds of grids just to throw them away when the branch unwound.

The new implementation mutates the grid in place. Before each branch, it copies the grid’s flat cell array into a pre-allocated snapshot buffer. After the branch returns, it restores from the snapshot. No nested-list allocation. No garbage. The grid has one lifetime.
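The discipline, sketched with `explore` standing in for propagate-and-recurse (and remembering the reentrancy caveat: a real recursive solver needs one snapshot buffer per depth):

```dart
// In-place branching: save the flat cell array into a reused snapshot
// buffer, mutate, recurse, restore. Cells: -1 unknown, 0 empty,
// 1 filled. explore() is a stand-in for propagate-and-recurse.
int branchInPlace(
  List<int> grid,
  List<int> snapshot,
  int cell,
  int cap,
  int Function(List<int>) explore,
) {
  var count = 0;
  for (final guess in [1, 0]) {    // try filled, then empty
    snapshot.setAll(0, grid);      // copy flat cells into reused buffer
    grid[cell] = guess;            // mutate in place — no grid copy
    count += explore(grid);        // recurse on the very same grid
    grid.setAll(0, snapshot);      // restore exactly what was saved
    if (count >= cap) return cap;  // short-circuit at "more than one"
  }
  return count;
}

void main() {
  final grid = List<int>.filled(4, -1);
  final snapshot = List<int>.filled(4, 0);
  // Fake explore: pretend only the "filled" branch yields a solution.
  final n = branchInPlace(grid, snapshot, 0, 2, (g) => g[0] == 1 ? 1 : 0);
  assert(n == 1);
  assert(grid.every((c) => c == -1)); // grid fully restored afterwards
}
```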

On 15×15 generation: ~5.4s → ~4.2s. A 23.7% reduction.

Combined with buffer reuse, the net effect is that generating a hard 15×15 now feels instant. The reset button responds. The lie is gone.

The deadline sentinel

One detail worth sharing. Candidates that genuinely can’t be solved in reasonable time need a timeout. The naive implementation would throw a SolverTimeoutException and let the caller catch it. I did something different: the solver carries a _SolverDeadline object with a sticky timedOut boolean. Every recursion point checks it and, if set, returns the solution cap (two) — which naturally causes the uniqueness check (count == 1) to fail and the candidate to be rejected.

No exception overhead. No stack unwinding. The timeout is a sentinel value that short-circuits the recursion on the way out. This is the kind of detail you only land on if you’re thinking about the solver’s control flow, not just its output.
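A sketch of the pattern — names mirror the description above, but the details are illustrative, not the app's actual code:

```dart
// Sticky-deadline sentinel: a flag checked at each recursion point,
// no exceptions. Names are illustrative.
class SolverDeadline {
  final DateTime _deadline;
  bool timedOut = false; // sticky: once set, it stays set

  SolverDeadline(Duration budget) : _deadline = DateTime.now().add(budget);

  bool check() {
    if (!timedOut && DateTime.now().isAfter(_deadline)) timedOut = true;
    return timedOut;
  }
}

const solutionCap = 2;

int countSolutions(SolverDeadline deadline) {
  // At every recursion point: if the deadline tripped, return the cap.
  // The caller's uniqueness check (count == 1) then fails naturally,
  // rejecting the candidate — no throw, no stack unwinding.
  if (deadline.check()) return solutionCap;
  // ... propagate, branch, recurse (elided) ...
  return 1;
}

void main() {
  // A deadline already in the past trips on the first check.
  final expired = SolverDeadline(const Duration(milliseconds: -1));
  assert(countSolutions(expired) == solutionCap);
}
```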

Product thinking on an engineering problem

The point of this work isn’t the percentages. It’s that “the puzzle generator was slow” is a product problem disguised as an engineering problem. The player doesn’t see the solver. They see a button that doesn’t respond.

The right question isn’t “can I make the solver faster?” It’s “what does the player see while the solver is running?” Those are the same question when the fix is easy. When the fix is hard, sometimes the right answer is a better loading state instead of a faster solver. I did both — async yielding plus a dimmed board and a spinner for the cases where generation is still measurable — and then I made the solver fast enough that the loading state barely appears.

There’s a version of this work that stops at async yielding. “It doesn’t block the UI anymore.” That’s a failure to finish the job. The UI keeps painting, but the reset button still feels wrong. You haven’t shipped the feeling.

A note on collaborating with Claude on this

Both of the perf commits that made the difference are co-authored with Claude Opus 4.6. I want to describe the collaboration, because it’s easy to round off to “AI wrote the code.”

I framed the problem: the generator is slow, profiling says allocation, I want buffer reuse and in-place branching. Claude wrote most of the buffer management. I reviewed the diffs, ran the benchmarks, argued about the deadline sentinel pattern — I wanted it, the first draft used exceptions, I pushed back — and checked for reentrancy bugs in the shared buffers.

What I kept: the problem definition, the solver contract, the design pattern choices, the benchmark review, and the “no, don’t do it that way, here’s why” moments. What I delegated: the typing and the mechanical transformation of one data layout into another.

This is what AI-assisted senior engineering actually looks like in practice. Not “please generate my code.” Not “I write everything myself and use AI for autocomplete.” Something in between, where the human stays in the loop on architecture and design and lets the model carry the mechanical load. The output is faster than either alone, and the engineer ends up spending their attention on the decisions that need it.

What’s next

The solver is fast enough that I’ve stopped thinking about it, which is exactly where you want performance work to land — the point where it disappears from your attention. If I needed the next win I’d look at parallelizing candidate generation across web workers. I don’t need the next win. On to other bugs.