<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://sandboxsinkhole.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://sandboxsinkhole.com/" rel="alternate" type="text/html" /><updated>2026-04-12T16:37:56+00:00</updated><id>https://sandboxsinkhole.com/feed.xml</id><title type="html">Sandbox Sinkhole</title><subtitle>Apps, games, and experiments from the sandbox.</subtitle><entry><title type="html">Building a Nonogram, From 3×3 to TestFlight</title><link href="https://sandboxsinkhole.com/blog/2026/04/11/building-a-nonogram-from-3x3-to-testflight/" rel="alternate" type="text/html" title="Building a Nonogram, From 3×3 to TestFlight" /><published>2026-04-11T00:00:00+00:00</published><updated>2026-04-11T00:00:00+00:00</updated><id>https://sandboxsinkhole.com/blog/2026/04/11/building-a-nonogram-from-3x3-to-testflight</id><content type="html" xml:base="https://sandboxsinkhole.com/blog/2026/04/11/building-a-nonogram-from-3x3-to-testflight/"><![CDATA[<p>I started this project on November 4, 2023 with a 3×3 grid of colored squares. The first commit isn’t a hello-world Flutter app; it already has cell selection, drag-to-fill, a peek button, and the ghost of a game loop. Not because I was showing off. Because I already knew what a nonogram was and didn’t feel like pretending I didn’t.</p>

<p>For those who don’t: a nonogram is a logic puzzle where a grid’s rows and columns each carry a list of numbers telling you how long the runs of filled cells are. You deduce the picture. Think “paint-by-numbers for programmers.”</p>
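<p>For intuition, here is how a line’s hints fall out of its cells. This is a minimal sketch, not the app’s code; the function name is illustrative:</p>

```dart
// Illustrative sketch: derive a line's hint numbers (the run lengths
// of consecutive filled cells) from a row or column of booleans.
List<int> hintsForLine(List<bool> line) {
  final hints = <int>[];
  var run = 0;
  for (final filled in line) {
    if (filled) {
      run++;
    } else if (run > 0) {
      hints.add(run);
      run = 0;
    }
  }
  if (run > 0) hints.add(run);
  return hints;
}

void main() {
  // filled, filled, empty, filled  ->  hints [2, 1]
  print(hintsForLine([true, true, false, true])); // [2, 1]
}
```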

<p>What followed was two-and-a-half years of an on-and-off solo project. Almost everything interesting happened in the last sixty days. Today the 1.1.0 build went to TestFlight, and I want to write the arc down while it’s still fresh.</p>

<h2 id="the-long-quiet">The long quiet</h2>

<p>Between November 2023 and early 2026, the project lived in a comfortable hole. I added row and column hints, a “five errors and you’re out” lives system, random board generation, a CI pipeline, a victory overlay. The whole game ran out of a ~290-line <code class="language-plaintext highlighter-rouge">main.dart</code>. It worked on my phone. That was enough, because nobody else was playing it.</p>

<p>This is how a lot of side projects die. They achieve “works on my machine” and then they stop.</p>

<h2 id="responsive-layout-was-the-wall">Responsive layout was the wall</h2>

<p>In February 2026 I decided to actually ship this thing, and within a week I was staring at a UI that looked great on a Pixel 6 and collapsed on a tablet. Fixing it one viewport at a time wasn’t going to work.</p>

<p>So I did the move that makes side projects survive: I wrote golden tests for multiple viewports — iPhone SE, iPhone 14 Pro, landscape, iPad, desktop — and let them fail. That’s the moment the project stopped being a toy. Tests aren’t how I prove the code works. They’re how I stop myself from being clever about things I can’t see.</p>

<p>The layout rewrite that followed was the first commit I co-authored with Claude. Not because I couldn’t do it. Because Claude was going to be faster at threading <code class="language-plaintext highlighter-rouge">LayoutBuilder</code> logic through side panels and hint regions, and I wanted to spend my time on decisions, not plumbing.</p>

<h2 id="extracting-widgets-finally">Extracting widgets, finally</h2>

<p>With responsive layout working, I pulled the widgets out of <code class="language-plaintext highlighter-rouge">main.dart</code>. Row hints, column hints, grid cell, toolbar, victory overlay, game-over overlay — each got its own file. This is the kind of refactor that goes on the “I’ll do it when I have time” list and never comes off. Getting it done in February meant the next two months of changes were actually tractable.</p>

<p>Moral: if a refactor is blocking other work, it isn’t “tech debt.” It’s in your critical path, and you should treat it that way.</p>

<h2 id="the-uniqueness-bug">The uniqueness bug</h2>

<p>Here’s one I didn’t see coming. Random boards can have multiple valid solutions. A player could look at a hint, deduce something that felt correct, and be wrong — not because their logic was bad, but because the puzzle was genuinely ambiguous. You play a nonogram trusting that the puzzle has one answer, and this was breaking that contract.</p>

<p>The fix was a real constraint solver. Line-wise propagation, enumeration of valid placements, backtracking with solution counting. The generator now produces a candidate board, runs the solver, and only accepts the board if the solution is unique. If it isn’t, throw it out and try again.</p>
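<p>The loop itself is simple once the solver exists. Here is a runnable miniature on a 3×3 board, with brute force standing in for the real solver (every name here is illustrative, not the app’s actual API):</p>

```dart
import 'dart:math';

// Compute run-length hints for one line of cells.
List<int> lineHints(List<bool> line) {
  final hints = <int>[];
  var run = 0;
  for (final filled in line) {
    if (filled) {
      run++;
    } else if (run > 0) {
      hints.add(run);
      run = 0;
    }
  }
  if (run > 0) hints.add(run);
  return hints;
}

// Serialize all row and column hints of a board into a comparable key.
String hintsKey(List<List<bool>> b) {
  final rows = [for (final r in b) lineHints(r)];
  final cols = [
    for (var c = 0; c < 3; c++)
      lineHints([for (var r = 0; r < 3; r++) b[r][c]])
  ];
  return '$rows|$cols';
}

List<List<bool>> boardFromMask(int mask) => [
      for (var r = 0; r < 3; r++)
        [for (var c = 0; c < 3; c++) (mask >> (r * 3 + c)) & 1 == 1]
    ];

// Count boards matching the hints, short-circuiting at two: we only
// care about "exactly one" versus "more than one".
int countSolutions(String hints) {
  var count = 0;
  for (var mask = 0; mask < 512 && count < 2; mask++) {
    if (hintsKey(boardFromMask(mask)) == hints) count++;
  }
  return count;
}

// Generate candidates and reject any whose hints are ambiguous.
List<List<bool>> generateUnique(Random rng) {
  while (true) {
    final candidate = boardFromMask(rng.nextInt(512));
    if (countSolutions(hintsKey(candidate)) == 1) return candidate;
  }
}

void main() {
  final board = generateUnique(Random(7));
  print(countSolutions(hintsKey(board))); // 1, by construction
}
```

The real generator swaps brute force for constraint propagation with backtracking, but the accept/reject shape is the same.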

<p>This is the first place in the project where the word “algorithm” felt earned. It’s not tutorial code. There’s branch pruning, cycle detection on propagation, placement caps, and a deadline timeout for candidates that take too long. I wrote a whole <a href="/blog/2026/04/09/making-a-puzzle-solver-fast-enough-to-disappear/">separate post</a> on the performance work the solver needed once it existed.</p>

<h2 id="product-touches-that-arent-in-the-rules">Product touches that aren’t in the rules</h2>

<p>Things I care about that aren’t in a nonogram rule book:</p>

<ul>
  <li>Double-tap to toggle fill/clear mode, with the active mode visible in the unknown cells. No separate fill/clear buttons.</li>
  <li>Ripple animation and a heart-shake when you make a mistake. Games feel different when they punish you with texture.</li>
  <li>A progress spinner and dimmed board while a hard puzzle generates. Nothing worse than a button that silently does nothing for two seconds.</li>
  <li><code class="language-plaintext highlighter-rouge">TextScaler</code> threaded through hint measurement so OS-level font scaling doesn’t clip the numbers. Accessibility is not a feature you bolt on.</li>
  <li>A live timer, because I wanted stats at the end.</li>
</ul>

<p>None of these are senior-engineer showpieces. They’re ten-minute changes that make the game feel like somebody cared. Side projects fail at the last ten percent of polish more often than they fail anywhere else, and the last ten percent is made of things that individually don’t matter.</p>

<h2 id="how-the-ai-collaboration-actually-looks">How the AI collaboration actually looks</h2>

<p>I started co-authoring commits with Claude in February 2026 and the velocity shifted. I want to be specific about what that looked like, because “I use AI” is the least useful thing anyone can say right now.</p>

<ul>
  <li>I kept the design decisions. What the game looks like, what the difficulty tuning means, what the solver’s contract is, what the UI should feel like. Nobody delegates that.</li>
  <li>Claude was faster than me at mechanical refactors, performance micro-optimizations, golden test maintenance, and anything that involved threading a parameter through seven layers.</li>
  <li>I wrote a <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> that tells the model how to work in this repo: TDD, red-green, list all test cases in PR descriptions, don’t commit without running <code class="language-plaintext highlighter-rouge">dart format</code>, prefer extraction over comments. It’s a contract, not a ritual.</li>
  <li>The structured commit messages and co-authored tags are real. Every perf commit has a benchmark, because Claude doesn’t forget and I do.</li>
</ul>

<p>This isn’t a person being replaced by a model. It’s a person with a force multiplier who has more time to think and less time to type. The visible output is more commits and tighter work. The invisible output is that I’m making better decisions, because I’m not exhausted from writing the boring parts of them.</p>

<h2 id="what-id-tell-november-2023-me">What I’d tell November 2023 me</h2>

<ol>
  <li>The thing blocking you from shipping isn’t the feature list. It’s the test story.</li>
  <li>Write a <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> on day one. Future-you and future-Claude both need it.</li>
  <li>A 3×3 grid is a perfectly good starting point if you already know where it’s going.</li>
  <li>“Just ship it” is advice for people who haven’t fixed their responsive layout yet.</li>
  <li>The refactor that feels like tech debt is probably in the critical path. Stop treating it like a luxury.</li>
</ol>

<p>Version 1.1.0 went to TestFlight today. That’s two and a half years from the first commit. Most of the work happened in the last two months. None of it happened accidentally.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[I started this project on November 4, 2023 with a 3×3 grid of colored squares. The first commit isn’t a hello-world Flutter app; it already has cell selection, drag-to-fill, a peek button, and the ghost of a game loop. Not because I was showing off. Because I already knew what a nonogram was and didn’t feel like pretending I didn’t.]]></summary></entry><entry><title type="html">Making a Puzzle Solver Fast Enough to Disappear</title><link href="https://sandboxsinkhole.com/blog/2026/04/09/making-a-puzzle-solver-fast-enough-to-disappear/" rel="alternate" type="text/html" title="Making a Puzzle Solver Fast Enough to Disappear" /><published>2026-04-09T00:00:00+00:00</published><updated>2026-04-09T00:00:00+00:00</updated><id>https://sandboxsinkhole.com/blog/2026/04/09/making-a-puzzle-solver-fast-enough-to-disappear</id><content type="html" xml:base="https://sandboxsinkhole.com/blog/2026/04/09/making-a-puzzle-solver-fast-enough-to-disappear/"><![CDATA[<p>A reset button that takes two seconds to reset feels broken. You press it, the board sits there, you press it again, maybe you swear at your phone. This is the problem I spent the last week of development on.</p>

<p>This is a post about performance work on the constraint solver that powers my nonogram app. The short version: three optimizations turned a sluggish generation loop into something imperceptible. The long version is about why that mattered — and what I’d tell anyone approaching a similar problem.</p>

<h2 id="why-theres-a-solver-at-all">Why there’s a solver at all</h2>

<p>Randomly generated nonograms have a nasty property: their solutions aren’t always unique. You can generate a 10×10 board by filling cells at random, compute its hints, and ship it to a player — but the player can deduce a <em>different</em> valid picture that also satisfies the hints. They’ll play “correctly” and the game will tell them they’re wrong.</p>

<p>The fix is to generate candidates and filter them: reject any board whose hints admit more than one valid solution. That filter is a constraint solver. It’s the core of puzzle generation, not a nice-to-have.</p>

<p>My solver uses line-wise constraint propagation: each row and column carries a list of runs, so the solver enumerates the valid placements for each line and intersects them to learn which cells are definitely filled or definitely empty. It repeats that to a fixed point, then branches on the first unknown cell and recurses. It counts solutions rather than finding one, because we only need to know “exactly one” versus “more than one.” It short-circuits at two.</p>
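<p>One propagation step looks roughly like this. A runnable sketch with illustrative names; it assumes the line is solvable, so at least one placement survives the filter:</p>

```dart
// Cells: -1 unknown, 0 empty, 1 filled.
const unknown = -1, empty = 0, filled = 1;

// Enumerate every way to place the clue runs in a line of length n.
List<List<int>> placements(List<int> clues, int n) {
  if (clues.isEmpty) return [List.filled(n, empty)];
  final first = clues.first, rest = clues.sublist(1);
  final out = <List<int>>[];
  // Minimum space the remaining runs need (each preceded by a gap).
  final restMin = rest.fold<int>(0, (s, c) => s + c + 1);
  final gap = rest.isEmpty ? 0 : 1;
  for (var start = 0; start + first + restMin <= n; start++) {
    for (final tail in placements(rest, n - start - first - gap)) {
      out.add([
        ...List.filled(start, empty),
        ...List.filled(first, filled),
        ...List.filled(gap, empty),
        ...tail,
      ]);
    }
  }
  return out;
}

bool consistent(List<int> line, List<int> known) {
  for (var i = 0; i < known.length; i++) {
    if (known[i] != unknown && known[i] != line[i]) return false;
  }
  return true;
}

// Intersect all placements consistent with current knowledge:
// cells that agree everywhere become forced.
List<int> propagateLine(List<int> clues, List<int> known) {
  final valid =
      placements(clues, known.length).where((p) => consistent(p, known));
  final result = List<int>.from(valid.first);
  for (final p in valid.skip(1)) {
    for (var i = 0; i < result.length; i++) {
      if (result[i] != p[i]) result[i] = unknown; // disagreement stays unknown
    }
  }
  return result;
}

void main() {
  // Clue [3] in a length-5 line: the middle cell is forced filled.
  print(propagateLine([3], List.filled(5, unknown)));
  // [-1, -1, 1, -1, -1]
}
```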

<h2 id="the-problem-generating-hard-puzzles-was-slow">The problem: generating hard puzzles was slow</h2>

<p>The solver worked correctly from day one. It was also annoying. On a 15×15 hard puzzle — high fill density, more ambiguous — the generation loop could chew through a few seconds of browser thread time before finding a board that passed uniqueness. The UI thread blocked. The reset button turned into a lie.</p>

<p>First move, before anything fancy: async yielding. The generator wraps each candidate attempt in <code class="language-plaintext highlighter-rouge">await Future.delayed(Duration.zero)</code>, which hands the event loop back to the browser long enough to paint a frame. That fixed the visual freeze but not the waiting. You’d press reset, see a spinner, and still wait two seconds. Time to actually make the solver faster.</p>
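<p>The yielding pattern, as a runnable sketch (the callback is a stand-in for one real candidate attempt):</p>

```dart
import 'dart:async';

// Awaiting Future.delayed(Duration.zero) between candidate attempts
// hands control back to the event loop so a frame can paint.
Future<int> generateWithYield(bool Function(int attempt) tryCandidate) async {
  var attempt = 0;
  while (!tryCandidate(attempt)) {
    attempt++;
    await Future.delayed(Duration.zero); // let the UI breathe
  }
  return attempt;
}

Future<void> main() async {
  final found = await generateWithYield((n) => n == 3);
  print(found); // 3
}
```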

<h2 id="optimization-1-cache-the-text-measurements">Optimization 1: cache the text measurements</h2>

<p>Here’s an embarrassing one. Every layout pass was re-running <code class="language-plaintext highlighter-rouge">TextPainter</code> measurement on the hint regions — the rows and columns of numbers beside the grid. <code class="language-plaintext highlighter-rouge">TextPainter</code> is not cheap. Caching that measurement, keyed by hint content and text scaler, saved a few milliseconds per frame on hint-heavy boards.</p>

<p>This isn’t technically solver work, but it’s in the same user-visible chain. If your perf fix makes the solver faster while the layout is spending 5ms re-measuring the same text every build, you haven’t fixed the problem. Profile the whole path, not just the interesting part.</p>
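<p>The caching shape, sketched in plain Dart with a counter standing in for the expensive <code class="language-plaintext highlighter-rouge">TextPainter</code> layout call (class and field names are mine, not the app’s):</p>

```dart
// Memoize an expensive measurement, keyed by hint content and text
// scale. The fake arithmetic stands in for TextPainter.layout so the
// sketch is runnable outside Flutter.
class HintMeasureCache {
  final _cache = <String, double>{};
  int computeCalls = 0;

  double measure(List<int> hints, double textScale) {
    final key = '${hints.join(",")}@$textScale';
    return _cache.putIfAbsent(key, () {
      computeCalls++; // stands in for an expensive layout pass
      return hints.length * 12.0 * textScale; // fake measurement
    });
  }
}

void main() {
  final cache = HintMeasureCache();
  cache.measure([3, 1], 1.0);
  cache.measure([3, 1], 1.0); // cache hit: no recompute
  cache.measure([3, 1], 1.3); // different scale: recompute
  print(cache.computeCalls); // 2
}
```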

<h2 id="optimization-2-reuse-line-buffers">Optimization 2: reuse line buffers</h2>

<p>The solver’s hot loop propagates constraints line by line. Each iteration enumerates valid placements for a row or column, finds their intersection, and writes the result back. The old code allocated a fresh <code class="language-plaintext highlighter-rouge">List&lt;CellState&gt;</code> for every line, every iteration. On a 15×15 grid, with a dozen propagation passes and backtracking, that’s tens of thousands of list allocations inside a single generation attempt.</p>

<p>The fix was straightforward: pre-allocate three buffers at solver construction time — input, output, and an enumeration scratch — sized to the maximum of rows and columns. Reuse them everywhere. The scratch buffer tracks its own logical length with an int rather than Dart’s <code class="language-plaintext highlighter-rouge">List.removeLast()</code> bookkeeping, because that shows up in profiles too.</p>
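<p>A sketch of the shape, with illustrative names. The point is that the buffers exist once, and the scratch length is just an int:</p>

```dart
// Line buffers allocated once at construction, sized to the larger
// grid dimension, then reused for every propagation pass.
class LineBuffers {
  final List<int> input;
  final List<int> output;
  final List<int> scratch;
  int scratchLen = 0; // logical length; no removeLast() bookkeeping

  LineBuffers(int rows, int cols)
      : input = List.filled(rows > cols ? rows : cols, 0),
        output = List.filled(rows > cols ? rows : cols, 0),
        scratch = List.filled(rows > cols ? rows : cols, 0);

  void pushScratch(int v) => scratch[scratchLen++] = v;
  void resetScratch() => scratchLen = 0; // O(1) "clear", no shrinking
}

void main() {
  final buf = LineBuffers(15, 10);
  buf.pushScratch(7);
  buf.pushScratch(9);
  print('${buf.scratch.length} ${buf.scratchLen}'); // 15 2
  buf.resetScratch();
  print(buf.scratchLen); // 0
}
```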

<p>Benchmark on a 10×10 solver, repeated 500 times: ~8ms → ~4.3ms. A 46.5% reduction.</p>

<p>The caveat: buffer reuse is only safe if you’re careful about reentrancy. My solver branches recursively. That means a buffer in use at depth three can’t be touched by depth four. I handled this by scoping the enumeration buffer strictly per call — snapshot, recurse, restore — which segues nicely into the next optimization.</p>

<h2 id="optimization-3-in-place-branching-with-snapshotrestore">Optimization 3: in-place branching with snapshot/restore</h2>

<p>The really juicy win was at the recursive layer. When the propagator runs out of deductions, the solver picks an unknown cell, tries it as filled, recurses, then tries it as empty. The old implementation deep-copied the entire grid — 225 cells on a 15×15, as nested lists — for each branch. On deep searches you’d allocate hundreds of grids just to throw them away when the branch unwound.</p>

<p>The new implementation mutates the grid in place. Before each branch, it copies the grid’s flat cell array into a pre-allocated snapshot buffer. After the branch returns, it restores from the snapshot. No nested-list allocation. No garbage. The grid has one lifetime.</p>
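<p>Here is a runnable miniature of the pattern, with one assumption worth flagging: the real solver also re-runs propagation inside each branch, which this sketch omits. Per-depth snapshot buffers keep the reuse safe under recursion:</p>

```dart
const unknown = -1, empty = 0, filled = 1;

class Search {
  final List<int> grid; // flat cell array, mutated in place
  final List<List<int>> _snapshots; // one buffer per depth, pre-allocated

  Search(int cells)
      : grid = List.filled(cells, unknown),
        _snapshots = List.generate(cells, (_) => List.filled(cells, 0));

  int countSolutions(bool Function(List<int>) isValid,
      {int cap = 2, int depth = 0}) {
    final i = grid.indexOf(unknown);
    if (i == -1) return isValid(grid) ? 1 : 0;
    // This depth's buffer; deeper calls use their own, so reuse is safe.
    final snap = _snapshots[depth]..setAll(0, grid);
    var count = 0;
    for (final guess in [filled, empty]) {
      grid[i] = guess; // mutate in place, no grid copy
      count += countSolutions(isValid, cap: cap, depth: depth + 1);
      grid.setAll(0, snap); // restore on unwind
      if (count >= cap) break; // short-circuit at the cap
    }
    return count;
  }
}

void main() {
  // Two cells, constraint "exactly one filled": exactly two solutions.
  final search = Search(2);
  print(search.countSolutions(
      (g) => g.where((c) => c == filled).length == 1)); // 2
}
```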

<p>On 15×15 generation: ~5.4s → ~4.2s. A 23.7% reduction.</p>

<p>Combined with buffer reuse, the net effect is that generating a hard 15×15 now feels instant. The reset button responds. The lie is gone.</p>

<h2 id="the-deadline-sentinel">The deadline sentinel</h2>

<p>One detail worth sharing. Candidates that genuinely can’t be solved in reasonable time need a timeout. The naive implementation would throw a <code class="language-plaintext highlighter-rouge">SolverTimeoutException</code> and let the caller catch it. I did something different: the solver carries a <code class="language-plaintext highlighter-rouge">_SolverDeadline</code> object with a sticky <code class="language-plaintext highlighter-rouge">timedOut</code> boolean. Every recursion point checks it and, if set, returns the solution cap (two) — which naturally causes the uniqueness check (<code class="language-plaintext highlighter-rouge">count == 1</code>) to fail and the candidate to be rejected.</p>

<p>No exception overhead. No stack unwinding. The timeout is a sentinel value that short-circuits the recursion on the way out. This is the kind of detail you only land on if you’re thinking about the solver’s control flow, not just its output.</p>
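<p>A minimal sketch of the sentinel, with illustrative names mirroring the <code class="language-plaintext highlighter-rouge">_SolverDeadline</code> idea:</p>

```dart
// Sticky timeout flag checked at every recursion point. On timeout the
// solver returns the solution cap, so the caller's count == 1 check
// fails and the candidate is rejected without any exception.
class SolverDeadline {
  final Stopwatch _watch = Stopwatch()..start();
  final Duration limit;
  bool timedOut = false; // sticky: once set, stays set
  SolverDeadline(this.limit);

  bool check() {
    if (!timedOut && _watch.elapsed > limit) timedOut = true;
    return timedOut;
  }
}

const cap = 2;

int countSolutions(int depth, SolverDeadline deadline) {
  if (deadline.check()) return cap; // sentinel short-circuits the recursion
  if (depth == 0) return 1;
  // ...real branching would happen here; this sketch just recurses...
  return countSolutions(depth - 1, deadline);
}

void main() {
  final expired = SolverDeadline(Duration.zero)..timedOut = true;
  final count = countSolutions(10, expired);
  print(count == 1); // false: candidate rejected, no exception thrown
}
```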

<h2 id="product-thinking-on-an-engineering-problem">Product thinking on an engineering problem</h2>

<p>The point of this work isn’t the percentages. It’s that “the puzzle generator was slow” is a product problem disguised as an engineering problem. The player doesn’t see the solver. They see a button that doesn’t respond.</p>

<p>The right question isn’t “can I make the solver faster?” It’s “what does the player see while the solver is running?” Those are the same question when the fix is easy. When the fix is hard, sometimes the right answer is a better loading state instead of a faster solver. I did both — async yielding plus a dimmed board and a spinner for the cases where generation is still measurable — and then I made the solver fast enough that the loading state barely appears.</p>

<p>There’s a version of this work that stops at async yielding. “It doesn’t block the UI anymore.” That’s a failure to finish the job. The reset button still feels wrong, even though the UI keeps painting. You haven’t shipped the feeling.</p>

<h2 id="a-note-on-collaborating-with-claude-on-this">A note on collaborating with Claude on this</h2>

<p>Both of the perf commits that made the difference are co-authored with Claude Opus 4.6. I want to describe the collaboration, because it’s easy to round off to “AI wrote the code.”</p>

<p>I framed the problem: the generator is slow, profiling says allocation, I want buffer reuse and in-place branching. Claude wrote most of the buffer management. I reviewed the diffs, ran the benchmarks, argued about the deadline sentinel pattern — I wanted it, the first draft used exceptions, I pushed back — and checked for reentrancy bugs in the shared buffers.</p>

<p>What I kept: the problem definition, the solver contract, the design pattern choices, the benchmark review, and the “no, don’t do it that way, here’s why” moments. What I delegated: the typing and the mechanical transformation of one data layout into another.</p>

<p>This is what AI-assisted senior engineering actually looks like in practice. Not “please generate my code.” Not “I write everything myself and use AI for autocomplete.” Something in between, where the human stays in the loop on architecture and design and lets the model carry the mechanical load. The output is faster than either alone, and the engineer ends up spending their attention on the decisions that need it.</p>

<h2 id="whats-next">What’s next</h2>

<p>The solver is fast enough that I’ve stopped thinking about it, which is exactly where you want performance work to land — the point where it disappears from your attention. If I needed the next win I’d look at parallelizing candidate generation across web workers. I don’t need the next win. On to other bugs.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[A reset button that takes two seconds to reset feels broken. You press it, the board sits there, you press it again, maybe you swear at your phone. This is the problem I spent the last week of development on.]]></summary></entry><entry><title type="html">Shipping a Co-op Card Game in 17 Days</title><link href="https://sandboxsinkhole.com/blog/2026/04/08/shipping-a-coop-card-game-in-17-days/" rel="alternate" type="text/html" title="Shipping a Co-op Card Game in 17 Days" /><published>2026-04-08T00:00:00+00:00</published><updated>2026-04-08T00:00:00+00:00</updated><id>https://sandboxsinkhole.com/blog/2026/04/08/shipping-a-coop-card-game-in-17-days</id><content type="html" xml:base="https://sandboxsinkhole.com/blog/2026/04/08/shipping-a-coop-card-game-in-17-days/"><![CDATA[<p>Four browser windows arranged in a 2×2 grid, each logged in as a different player, silently passing cards to a Cloud Run server and watching the state snap back. No chat. No signaling. Just four Playwright-driven browsers and a hundred cards to get rid of in descending order. When that test passes, the game works.</p>

<p>This is a post about the architecture and shipping of Countdown v2 — a cooperative card game in the vein of <em>The Mind</em>, built as a Dart monorepo with a WebSocket server and a Flutter client. I wrote about <a href="/blog/2026/04/07/why-i-rewrote-countdown/">why I rewrote it from v1</a> in the last post. This one is about what’s in the box.</p>

<h2 id="the-game">The game</h2>

<p><em>The Mind</em> is a card game where you win by silently playing your cards in ascending order, without talking or signaling. Countdown flips the direction — cards play <em>descending</em>, 100 down to 1 — but the feel is the same. You hold a few cards. You stare at your teammates. Someone plays. You breathe. You wait for the tension in your hand that tells you it’s your turn. Either the card plays and the round continues, or it doesn’t, and somebody loses a life.</p>

<p>It’s a game about the texture of cooperative silence. That makes the engineering problem interesting, because the thing I’m shipping isn’t “a working card game.” It’s the feel of four people trusting each other through a browser.</p>

<h2 id="the-monorepo-and-where-logic-lives">The monorepo and where logic lives</h2>

<p>Four packages:</p>

<ul>
  <li><strong><code class="language-plaintext highlighter-rouge">countdown_core</code></strong> — pure Dart game engine. Deck, hand, player, rules, play results. No I/O, no Flutter, no sockets. Deterministic functions on immutable state.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">countdown_console</code></strong> — CLI bot simulator. Exists to exercise the core without any UI or networking.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">countdown_server</code></strong> — Dart + shelf WebSocket server. Wraps the engine with rooms, connections, and broadcasts.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">countdown_flutter</code></strong> — Flutter client. iOS, Android, macOS, Windows, Linux, web. A dumb renderer of server state.</li>
</ul>

<p>The pivot point is <code class="language-plaintext highlighter-rouge">countdown_core</code>. The server imports it as a path dependency. The client imports it as a path dependency too, but only for types and enums — the actual rule logic stays server-side. There is exactly one place where the rules of the game are expressed, and everything else is a consumer.</p>
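<p>In pubspec terms, a path dependency looks like this (paths here are illustrative):</p>

```yaml
# countdown_server/pubspec.yaml (the client declares the same
# dependency; only the rule logic it calls differs)
dependencies:
  countdown_core:
    path: ../countdown_core
```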

<p>This is the architectural insight v1 didn’t have. Game is engine, transport is detail, UI is projection. Hold onto that shape and the code stays honest.</p>

<h2 id="server-authoritative-state">Server-authoritative state</h2>

<p>The server is authoritative. Clients send intents — <code class="language-plaintext highlighter-rouge">create_room</code>, <code class="language-plaintext highlighter-rouge">join_room</code>, <code class="language-plaintext highlighter-rouge">play_card</code>, <code class="language-plaintext highlighter-rouge">vote_card_count</code> — and the server validates them against the engine, mutates state, and broadcasts a <code class="language-plaintext highlighter-rouge">state_update</code> containing the entire relevant snapshot to every connected client. No deltas. No diffs. No ordering guarantees required.</p>

<p>This is a deliberate simplification. Diff-based protocols are smaller but have to solve a lot of hard problems — out-of-order messages, dropped messages, replay logic. Snapshot-based protocols send more bytes and solve none of them. For a 100-card game with fewer than ten players per room, the bytes don’t matter and the simplicity is load-bearing.</p>

<p>Per-player hand visibility is handled at the serialization boundary: each player receives their own hand values; other players’ hands are broadcast as empty arrays. That means a client can’t snoop another player’s hand even if they tamper with the wire — the server never sent it to them in the first place. The trust boundary and the privacy boundary are the same line of code.</p>
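<p>A runnable sketch of that boundary (field names are illustrative, not the real wire protocol):</p>

```dart
import 'dart:convert';

// Redaction at the serialization boundary: each recipient gets their
// own hand values; every other player's hand serializes as empty.
Map<String, dynamic> stateFor(String viewerId, Map<String, List<int>> hands) {
  return {
    'players': [
      for (final entry in hands.entries)
        {
          'id': entry.key,
          'hand': entry.key == viewerId ? entry.value : <int>[],
        }
    ],
  };
}

void main() {
  final hands = {
    'alice': [97, 54],
    'bob': [88],
  };
  print(jsonEncode(stateFor('bob', hands)));
  // {"players":[{"id":"alice","hand":[]},{"id":"bob","hand":[88]}]}
}
```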

<h2 id="reconnection-without-message-queues">Reconnection without message queues</h2>

<p>One problem WebSocket games are notorious for: what happens when a player drops connection mid-game? Do you kill the round? Hold a seat? Queue messages for them? Reconstruct their state?</p>

<p>My answer was: when a player reconnects with their room code and player ID, the server replaces their old sink with the new one. The next broadcast — which always contains the full state — catches them up automatically. No queue. No “missed messages” handling. No state reconstruction.</p>

<p>This works because the snapshot protocol is already idempotent. If I’d chosen a diff protocol, reconnection would be a serious project. Because I chose snapshots, reconnection is a few lines of code. This is the payoff for picking the boring option.</p>
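<p>The whole mechanism fits in a sketch, using <code class="language-plaintext highlighter-rouge">StringSink</code> as a stand-in for the WebSocket sink (names are illustrative):</p>

```dart
// Reconnection is just overwriting the sink. The next broadcast always
// carries the full state, so the rejoining client catches up for free.
class Room {
  final Map<String, StringSink> _sinks = {};

  void connect(String playerId, StringSink sink) {
    _sinks[playerId] = sink; // reconnection = overwrite, nothing else
  }

  void broadcast(String fullStateJson) {
    for (final sink in _sinks.values) {
      sink.write(fullStateJson); // snapshot protocol: the whole state
    }
  }
}

void main() {
  final room = Room();
  final oldSink = StringBuffer(), newSink = StringBuffer();
  room.connect('alice', oldSink);
  room.connect('alice', newSink); // alice reconnects
  room.broadcast('{"round":3}');
  print('${oldSink.isEmpty} $newSink'); // true {"round":3}
}
```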

<h2 id="spectator-mode">Spectator mode</h2>

<p>Entering the room code <code class="language-plaintext highlighter-rouge">PILE_VIEWER</code> on a device puts it into spectator mode. It joins the room as a read-only connection — no hand, no controls, just a view of the discard pile, player list, and round number. Put it on a table. Shove a phone in the middle. Four people with phones can all see their hands, and one shared screen shows the common state. It turns a mobile-only game into a table game.</p>

<p>Spectators are a parallel list of connections on the server, separate from players. The broadcast loop iterates players and spectators independently. Adding the feature was small, because the server was already per-connection-aware. That’s what “designed for the right boundary” looks like in practice — a feature you didn’t plan for costs an afternoon instead of a week.</p>

<h2 id="tests-deeper-than-they-look">Tests: deeper than they look</h2>

<p>The test pyramid for a networked game is weird. You need unit tests for rules, but you also need to know the wire protocol is honest, and you also need to know the UI doesn’t deadlock when a round ends, and you also need to know four players can finish a game in real time without the server eating itself.</p>

<p>Here’s what actually runs on every PR:</p>

<table>
  <thead>
    <tr>
      <th>Layer</th>
      <th>Tool</th>
      <th>Coverage</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Game engine</td>
      <td><code class="language-plaintext highlighter-rouge">dart test</code></td>
      <td>Deck, hand, rules, play outcomes</td>
    </tr>
    <tr>
      <td>Bots</td>
      <td><code class="language-plaintext highlighter-rouge">dart test</code></td>
      <td>Optimal and fallible bot policies</td>
    </tr>
    <tr>
      <td>Server</td>
      <td><code class="language-plaintext highlighter-rouge">dart test</code></td>
      <td>Rooms, protocol, reconnection, voting, full games</td>
    </tr>
    <tr>
      <td>Client</td>
      <td><code class="language-plaintext highlighter-rouge">flutter test</code></td>
      <td>Widgets and state machine</td>
    </tr>
    <tr>
      <td>Visual regression</td>
      <td>Golden tests</td>
      <td>Screenshots across viewports and themes</td>
    </tr>
    <tr>
      <td>End to end</td>
      <td>Playwright</td>
      <td>Four real browser windows, full game flow</td>
    </tr>
  </tbody>
</table>

<p>The Playwright suite is the expensive one. It opens four Chromium windows, has them create a room, join it, vote on card counts, play a round, lose a life, play another round, win. Each test runs the real client against the real server over a real WebSocket. When that suite is green, the game works for real.</p>

<p>I wouldn’t recommend this for every project. For a cooperative multiplayer game where the whole point is “does it actually synchronize across clients,” it’s worth every penny.</p>

<h2 id="golden-tests-on-linux-developed-on-macos">Golden tests on Linux, developed on macOS</h2>

<p>One gotcha worth sharing. Flutter’s golden tests are sensitive to platform font rendering. Goldens generated on macOS don’t match goldens generated on Linux, and GitHub Actions runs Linux. So CI is the source of truth. Locally, I keep the goldens marked <code class="language-plaintext highlighter-rouge">assume-unchanged</code> in git so they don’t pollute diffs. There’s an <code class="language-plaintext highlighter-rouge">/approve-goldens</code> comment workflow on PRs that regenerates goldens on Linux CI and commits them back to the branch.</p>

<p>This is dumb. It shouldn’t require this much infrastructure. But visual regression tests are non-negotiable for a game where the state <em>is</em> the screen, so the infrastructure got built.</p>

<h2 id="product-decisions-that-arent-in-the-rules">Product decisions that aren’t in the rules</h2>

<p>Things the rulebook doesn’t care about, but players do:</p>

<ul>
  <li>A tutorial overlay you can toggle on and off from the lobby, for new players.</li>
  <li>Sound and haptics, with a mute toggle for public play.</li>
  <li>Card-play animations and a confetti celebration when you win.</li>
  <li>Round-transition interstitials, so the game breathes between rounds.</li>
  <li>A “Play Again” flow that keeps the room intact, so you can rematch without resetting.</li>
  <li>A dark theme, because playing at night shouldn’t blind you.</li>
  <li>Reconnection that makes dropped connections invisible.</li>
</ul>

<p>None of these are engineering showpieces. They’re the difference between software that works and software people want to play.</p>

<h2 id="deployment">Deployment</h2>

<p>Server runs on Google Cloud Run. Cloud Build compiles the Dart server into a multi-stage Docker image and deploys it on push to main. The Flutter web client is built per PR and deployed to GitHub Pages with preview URLs, each pointing at the live Cloud Run server via <code class="language-plaintext highlighter-rouge">--dart-define=WS_URL</code>. Every PR is a playable link. You can open two browser windows on a PR preview and actually play the branch.</p>

<p>This is the part that made me take the project seriously. “Works on my machine” is not the same thing as “open this URL and play with a friend.” Closing that gap is half the job.</p>

<h2 id="ai-assisted-explicitly">AI-assisted, explicitly</h2>

<p>The co-authored-by tags in the commit history aren’t cosmetic. Roughly a fifth of the commits have Claude as a co-author. The <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> in the repo is a working contract: how to run tests, what the WebSocket protocol looks like, where the gotchas are. It’s written for an agent, not a human. Future-me and future-Claude both need it.</p>

<p>The pattern across the project looked like this:</p>

<ul>
  <li>I described the next feature, the boundary it should respect, and how it would be tested.</li>
  <li>Claude drafted the implementation, wrote the tests, and proposed a PR description.</li>
  <li>I reviewed, argued with the design where I disagreed, ran the tests, and merged.</li>
</ul>

<p>The result is 228 commits in 17 days on a non-trivial piece of networked software with a real test suite and real deployment. That’s not a velocity number I could hit alone. It’s also not a velocity number I’d trust without the test suite telling me the game still works at the end.</p>

<h2 id="whats-next">What’s next</h2>

<p>The game is playable. You can open it in a browser and actually play it. The next things on my list are shorter: tighter onboarding, a way to play solo against bots, and an eventual native iOS build. The architecture is built for all of that. The hard decisions are behind me.</p>

<p>That’s what a seventeen-day rewrite buys you. You land in the right shape, and the next six months of work are just additions.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Four browser windows arranged in a 2×2 grid, each logged in as a different player, silently passing cards to a Cloud Run server and watching the state snap back. No chat. No signaling. Just four Playwright-driven browsers and a hundred cards to get rid of in descending order. When that test passes, the game works.]]></summary></entry><entry><title type="html">Why I Rewrote Countdown</title><link href="https://sandboxsinkhole.com/blog/2026/04/07/why-i-rewrote-countdown/" rel="alternate" type="text/html" title="Why I Rewrote Countdown" /><published>2026-04-07T00:00:00+00:00</published><updated>2026-04-07T00:00:00+00:00</updated><id>https://sandboxsinkhole.com/blog/2026/04/07/why-i-rewrote-countdown</id><content type="html" xml:base="https://sandboxsinkhole.com/blog/2026/04/07/why-i-rewrote-countdown/"><![CDATA[<p>Joel Spolsky has a famous essay called “Things You Should Never Do, Part I” where he argues that rewriting software from scratch is the single worst strategic mistake a company can make. You throw away years of bug fixes. You underestimate the complexity of the parts you don’t remember. You deliver nothing for a year.</p>

<p>In March 2026 I threw away my cooperative card game and rewrote it from scratch. Seventeen days later I had something I was proud of.</p>

<p>This is the story of why that worked — and why it works less often than you think.</p>

<h2 id="what-v1-was">What v1 was</h2>

<p>The first Countdown was a Flutter-only implementation of a cooperative card game in the vein of <em>The Mind</em>. Players try to play cards in descending order, silently, without signaling. It was playable. It had the game loop. It had a UI. It also had problems I didn’t know how to fix without pulling everything apart.</p>

<p>The biggest one was that game logic lived inside the Flutter app. That meant:</p>

<ul>
  <li>The server was an afterthought, or there wasn’t one.</li>
  <li>Porting to other platforms meant re-implementing rules in parallel.</li>
  <li>The “source of truth” for what the game <em>was</em> lived inside a widget tree.</li>
</ul>

<p>You can ship a game this way. You can’t evolve one this way. Every change touches everything, because nothing is separated from anything.</p>

<h2 id="the-spolsky-test">The Spolsky test</h2>

<p>Spolsky’s argument applies most strongly to software that’s been in production for years, has thousands of edge-case fixes invisible in the code, and serves real users. Countdown v1 was zero-for-three. It had been in development for a few weeks. Its bug fixes were recent and recoverable. Nobody was playing it.</p>

<p>The other half of the Spolsky rule is underestimating the rewrite. A rewrite is cheap when:</p>

<ol>
  <li>The scope is small enough that you can hold the whole thing in your head.</li>
  <li>The lessons from v1 are clear, and they’re architectural rather than cosmetic.</li>
  <li>You have a working v1 to reference for the tricky parts.</li>
  <li>You have the velocity to do it in days, not months.</li>
</ol>

<p>Countdown passed four-for-four on rewrite economics. So I rewrote it.</p>

<h2 id="the-plan-three-phases-each-testable">The plan: three phases, each testable</h2>

<p>The hardest thing about a rewrite isn’t the code. It’s sequencing. You can’t ship a pure game engine — nobody plays a pure game engine. You can’t ship just a server either. The thing that’s playable is the full stack, which means the full stack has to come up together.</p>

<p>My answer was to build in three phases, each independently testable:</p>

<p><strong>Phase 1: Pure Dart game engine.</strong> No Flutter, no sockets, no I/O. Just the rules of the game expressed as deterministic functions on immutable state. A console bot simulator to exercise it. A baseline suite of unit tests covering deck, hand, player, engine, play results. If the engine is wrong, everything downstream is wrong. So the engine went first.</p>
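<p>To make “deterministic functions on immutable state” concrete, here’s a toy sketch of the shape. The names (<code class="language-plaintext highlighter-rouge">GameState</code>, <code class="language-plaintext highlighter-rouge">playCard</code>) are my illustration, not the engine’s actual API:</p>

```dart
/// Illustrative sketch only: the real engine's types aren't shown in this
/// post, so GameState, PlayResult, and playCard are assumed names. The
/// point is the shape: immutable state in, new state out, no I/O.

/// Immutable snapshot of a game: one hand per player, plus the last card
/// played (null until the first card lands).
class GameState {
  final List<List<int>> hands;
  final int? lastPlayed;
  const GameState(this.hands, this.lastPlayed);
}

/// What the engine returns for an attempted play.
class PlayResult {
  final GameState state;
  final bool accepted;
  const PlayResult(this.state, this.accepted);
}

/// Pure rule function. Cards must go down: a play is legal only if the
/// player holds the card and it is lower than the last card played.
PlayResult playCard(GameState s, int player, int card) {
  final legal = s.hands[player].contains(card) &&
      (s.lastPlayed == null || card < s.lastPlayed!);
  if (!legal) return PlayResult(s, false);
  // Rebuild the hands list rather than mutating it: the acting player's
  // hand is copied without the played card, everyone else's is shared.
  final hands = [
    for (var i = 0; i < s.hands.length; i++)
      i == player ? (List<int>.of(s.hands[i])..remove(card)) : s.hands[i]
  ];
  return PlayResult(GameState(hands, card), true);
}
```

<p>Because nothing here touches a socket or a clock, a test is just function calls and equality checks.</p>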

<p><strong>Phase 2: WebSocket server wrapping the engine.</strong> The server is a thin wrapper over the engine, adding rooms, player ↔ engine ID mapping, message routing, and broadcast logic. More server-side tests on top of a fake sink that records what was sent. A room test doesn’t need a real WebSocket — it needs a boundary you can drive.</p>
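<p>The fake-sink pattern looks roughly like this; the interface and class names are illustrative, not the server’s real API:</p>

```dart
// Sketch of the fake-sink idea. MessageSink, RecordingSink, and Room are
// illustrative names, not the server's actual types.

/// The only thing a room needs from the transport layer.
abstract class MessageSink {
  void send(String playerId, String message);
}

/// Test double: records every outgoing message instead of using a socket.
class RecordingSink implements MessageSink {
  final List<(String, String)> sent = [];
  @override
  void send(String playerId, String message) => sent.add((playerId, message));
}

/// A room broadcasts state to every connected player through the sink.
class Room {
  final MessageSink sink;
  final List<String> players;
  Room(this.sink, this.players);

  void broadcast(String state) {
    for (final p in players) {
      sink.send(p, state);
    }
  }
}
```

<p>A room test constructs a <code class="language-plaintext highlighter-rouge">RecordingSink</code>, drives the room, and asserts on <code class="language-plaintext highlighter-rouge">sent</code>. No ports, no async plumbing.</p>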

<p><strong>Phase 3: Flutter client.</strong> A dumb client that renders server state and sends player intents. No game logic on the client. Widget tests on top. The client gets its state from the server, not from its own reducers.</p>

<p>The point of this phasing was independence. Each phase had a coherent test suite, a clear contract, and could evolve separately. It also meant I was never more than a day away from something that worked end-to-end: phase 1 was already a playable game, with bots running in a console, before phase 2 was even written.</p>

<h2 id="the-architectural-insight-that-made-it-worth-doing">The architectural insight that made it worth doing</h2>

<p>The single most important decision was putting the game engine in its own package — <code class="language-plaintext highlighter-rouge">countdown_core</code> — and importing it as a path dependency from both the server and the client. The server imports it for logic and types. The client imports it for types only. The rules exist once, in pure Dart, deterministic and testable in isolation.</p>
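<p>Wiring that up is a few lines of <code class="language-plaintext highlighter-rouge">pubspec.yaml</code> in each consumer; the directory layout here is my guess at the structure, not the actual repo:</p>

```yaml
# server/pubspec.yaml (sketch) — the relative path assumes the engine
# package sits alongside the server and client in one repo.
dependencies:
  countdown_core:
    path: ../countdown_core
```

<p>The client carries the same two lines. Pub resolves the path dependency locally, so engine changes show up in both consumers without publishing anything.</p>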

<p>This is the lesson v1 taught me. The game is the engine. The transport is a detail. The UI is a projection. When those three things live in the same codebase undifferentiated, the game is nowhere and the code is everywhere.</p>

<p>In v2, if I want to port to a native iOS SwiftUI client, the engine comes with me. If I want to build an AI that plays the game, it runs against the engine directly. If I want to write a deterministic replay test, I feed the engine a sequence of moves and check the output. I can do things in v2 that were architecturally impossible in v1 — and I didn’t have to invent any of it. I just had to start from the right place.</p>

<h2 id="seventeen-days">Seventeen days</h2>

<p>The velocity number is more interesting than I expected. Averaging a dozen-plus commits a day for seventeen days is not a pace I could have maintained in 2022. It’s the product of three things:</p>

<ol>
  <li><strong>Scope discipline.</strong> I knew what I was building. The rewrite wasn’t a “let me reconsider the design” project. It was “execute the design I already learned from v1.”</li>
  <li><strong>Test-first structure.</strong> Every phase shipped with its tests. When I hit a confusing bug two weeks in, the test suite told me where to look. I was never spelunking.</li>
  <li><strong>AI collaboration at velocity.</strong> Roughly a fifth of the commits are co-authored with Claude. The split was: I wrote the architecture and most of the tests; Claude wrote a lot of the implementation. I spent my energy on structural thinking, Claude spent its cycles on typing, and the project didn’t stall in the “I know what I want but I don’t feel like writing it” valley where my side projects usually die.</li>
</ol>

<p>On that last point: the Spolsky essay is from 2000. He didn’t have a tireless pair programmer who would cheerfully write the boring half of a rewrite. The economics of “throwing away code” shift when the labor of writing replacement code is structurally cheaper. I don’t think Spolsky is wrong. I think the threshold at which rewriting is reasonable has moved, and moved a lot.</p>

<h2 id="when-to-rewrite-when-to-retrofit">When to rewrite, when to retrofit</h2>

<p>I’m not writing this to sell rewrites. I’m writing it because I had to make the call and I want to show my work.</p>

<p>Here’s my rule of thumb after v2:</p>

<p><strong>Rewrite when</strong> the scope fits in your head, the lessons are architectural rather than cosmetic, you have a working v1 to reference, and your velocity is high enough that the rewrite lands in days or weeks. Bonus: you have one design decision that a rewrite unlocks and no retrofit can reach.</p>

<p><strong>Retrofit when</strong> users are on the old version, the codebase carries non-obvious invariants, the lessons from v1 are “we’d do this differently” rather than “this design is wrong,” or the estimated rewrite is measured in months. Most of the time, this is the right call. That’s why Spolsky wrote the essay.</p>

<p>Countdown v2 passed the rewrite test. Most projects don’t. The mistake isn’t to rewrite when it’s right. It’s to call it right when it isn’t. Be honest about which one you’re doing.</p>

<h2 id="a-word-of-caution">A word of caution</h2>

<p>Nothing about this generalizes if the project is larger, has users, or has hidden complexity. The Spolsky rule is the default for a reason. I’m writing a card game, not replacing a payment system. If you’re thinking about rewriting a production system, the bar is much, much higher than “I have cleaner ideas now.”</p>

<p>But if you’re thinking about rewriting a side project that’s been sitting in the same shape for a year because the shape is wrong, and you can see the right shape, and it fits in your head: go.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Joel Spolsky has a famous essay called “Things You Should Never Do, Part I” where he argues that rewriting software from scratch is the single worst strategic mistake a company can make. You throw away years of bug fixes. You underestimate the complexity of the parts you don’t remember. You deliver nothing for a year.]]></summary></entry><entry><title type="html">Welcome to Sandbox Sinkhole</title><link href="https://sandboxsinkhole.com/blog/2026/04/06/welcome-to-sandbox-sinkhole/" rel="alternate" type="text/html" title="Welcome to Sandbox Sinkhole" /><published>2026-04-06T00:00:00+00:00</published><updated>2026-04-06T00:00:00+00:00</updated><id>https://sandboxsinkhole.com/blog/2026/04/06/welcome-to-sandbox-sinkhole</id><content type="html" xml:base="https://sandboxsinkhole.com/blog/2026/04/06/welcome-to-sandbox-sinkhole/"><![CDATA[<p>This is the first post on Sandbox Sinkhole. I’ll be using this blog to share the apps and games I’m building, along with links so you can try them out.</p>

<p>Each post will cover what the app does, how it works, and what I learned along the way. Stay tuned for the first real project post coming soon.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[This is the first post on Sandbox Sinkhole. I’ll be using this blog to share the apps and games I’m building, along with links so you can try them out.]]></summary></entry></feed>