We had an engineer spend a week building a batch job that ran nightly when what we actually needed was a real-time webhook handler. The spec said "sync order status." I meant immediately on status change; he reasonably read it as "keep them in sync." A nightly batch job does keep them in sync. It also means a customer's order shows as "processing" for 23 hours after it shipped. That's a support ticket, not a feature.
The spec wasn't ambiguous in the way I thought specs were ambiguous - missing requirements, unclear scope. It was ambiguous about timing semantics. I wrote "sync" without specifying whether I meant eventual consistency or near-real-time. Those are completely different system designs. One is a cron job. The other is an event-driven architecture with a message queue. I didn't know I needed to specify that until I saw what got built.
That's the thing about specs. The gaps you don't know about are more dangerous than the gaps you do.
What I wrote: "Sync order status from the logistics provider."

What I should have written: "Update order status within 30 seconds of the logistics provider webhook firing. Use a webhook receiver, not polling. Status must be visible to the customer before they can reasonably call support."
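The gap between those two versions is the gap between a cron job and an event handler. A minimal sketch of the event-driven path, with hypothetical names (`handle_logistics_webhook`, `ORDERS`) standing in for a real HTTP receiver and datastore:

```python
# Illustrative only: handle_logistics_webhook and ORDERS stand in
# for a real webhook endpoint and order store.

ORDERS = {"A-1001": "processing"}  # customer-visible order status

def handle_logistics_webhook(payload: dict) -> bool:
    """Apply a status update the moment the provider's webhook fires."""
    order_id = payload.get("order_id")
    if order_id not in ORDERS:
        return False  # unknown order: acknowledge, but don't update
    ORDERS[order_id] = payload["status"]
    return True

# The provider fires a webhook when the order ships; the status the
# customer sees changes in the same request, not in tomorrow's batch.
handle_logistics_webhook({"order_id": "A-1001", "status": "shipped"})
```

A nightly batch job has the same end state; the difference is entirely in when the customer sees it, which is exactly the timing semantics the original spec line failed to pin down.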
What engineers actually need from a spec
The typical PM spec has a problem statement, requirements, wireframes, and acceptance criteria. It's structured like a legal document. Engineers read the parts relevant to their work and ignore the rest.
The problem is that the most important parts of a spec aren't the requirements - they're the reasoning and the constraints. Why does this feature exist? What's the expected load? What are the latency requirements? What happens when a dependency fails? Without that context, engineers make reasonable assumptions that turn out to be wrong, and they make them in week three when you're deep into implementation and changing course is expensive.
I started adding a "technical context" section to every spec. Not a system design - I'm not the one designing the system. But a set of constraints and expectations that affect the design: "This endpoint will be called on every page load for logged-in users, so it needs to be fast - under 100ms p99." Or: "The payment provider sends webhooks for status changes, so we don't need to poll." Or: "This data is user-specific and can't be cached at the CDN layer." These aren't implementation decisions. They're product requirements that have implementation implications, and they belong in the spec.
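A constraint like "under 100ms p99" is only useful if engineers can check it against measurements. A minimal sketch of that check, using hypothetical latency samples and a nearest-rank percentile:

```python
# Sketch: turning "under 100ms p99" into a checkable number.
# The sample data is made up; the point is that the spec constrains
# the tail, not the average.
import math

def p99(samples_ms: list[float]) -> float:
    """Nearest-rank 99th percentile of a list of latency samples."""
    ranked = sorted(samples_ms)
    idx = max(0, math.ceil(0.99 * len(ranked)) - 1)
    return ranked[idx]

# 98 fast requests plus two slow ones: the average looks fine,
# but the tail violates the constraint in the spec.
samples = [10.0] * 98 + [200.0, 500.0]
assert p99(samples) > 100
```

Writing the constraint this concretely in the spec means nobody has to argue later about whether "fast" was met.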
The questions that slow down sprints
Before I write a spec, I try to figure out what questions the team will have in the first planning meeting. Not what questions I think they should have - what questions they'll actually ask.
Usually it's things like: What happens when the user does X in an unexpected order? What's the fallback if the third-party service is down? Do we need to handle this edge case or is it out of scope for this version? These are the questions that turn a two-hour planning meeting into a four-hour one, and they're almost always answerable if you've thought about the feature carefully enough to write a good spec.
So I answer them upfront. Not all of them - I can't predict everything - but the obvious ones. And I make the out-of-scope decisions explicit. "We are not handling partial refunds in this version" is one of the most useful sentences you can write in a spec. It prevents scope creep, it sets expectations, and it gives engineers permission to not solve problems you haven't asked them to solve. Without that sentence, an engineer who hits the partial refund case has to make a call: implement it anyway, or stop and ask. Either option is worse than having the answer in the spec.
The timing and state machine problem
The class of spec ambiguity I've learned to watch for most carefully is anything involving state transitions and timing. "The order status updates when the seller confirms" - when exactly? Synchronously in the same API call? Asynchronously after the seller's system sends a callback? What if the callback never comes? What's the timeout? What's the retry behavior?
These questions feel like implementation details, but they're not. They're product decisions. If the status update is synchronous, the buyer sees it immediately but the seller's confirmation step has to be fast and reliable. If it's asynchronous, you need to handle the intermediate state where the seller has confirmed but the buyer's view hasn't updated yet. That intermediate state is a user experience decision, not just a technical one.
I've started drawing state diagrams for any feature that involves status changes. Not fancy UML - just boxes and arrows. "Order created -> Payment pending -> Payment confirmed -> Seller notified -> Seller confirmed -> Shipped." Every arrow is a transition. Every transition has a trigger, a timing expectation, and a failure mode. Writing that out forces me to think through the cases I'd otherwise leave implicit.
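That kind of diagram translates almost directly into code, which is one reason it's worth drawing. A sketch with hypothetical status names, where the allowed transitions live in one dict so an illegal jump fails loudly instead of silently:

```python
# Sketch of the order state machine above. Status names are
# illustrative; the structure is the point: every legal arrow
# is listed, so anything else raises.

TRANSITIONS = {
    "created": {"payment_pending"},
    "payment_pending": {"payment_confirmed"},
    "payment_confirmed": {"seller_notified"},
    "seller_notified": {"seller_confirmed"},
    "seller_confirmed": {"shipped"},
    "shipped": set(),  # terminal state
}

def advance(current: str, new: str) -> str:
    """Move to a new status only if the diagram allows the transition."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new

status = "created"
status = advance(status, "payment_pending")  # legal
# advance(status, "shipped") would raise: the spec forbids skipping states
```

Each missing arrow is a question the spec has to answer: what happens if the seller never confirms, or the payment provider's callback never arrives. The dict makes those gaps visible before an engineer hits them in week three.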
The real test
The real test of a spec is what happens in the first sprint planning meeting. If the team spends most of the time clarifying requirements, the spec failed. If the meeting is mostly about sequencing and estimation, the spec worked.
I've started asking engineers directly after planning: "Was the spec useful? What was missing?" The feedback is usually specific. "You didn't say what happens if the user closes the browser during checkout." "You didn't specify whether the search needs to be real-time or if a few seconds of lag is acceptable." Those are things I can fix in the next spec.
Nobody writes a perfect spec. But you can write one that's good enough that engineers can make good decisions when they hit edge cases - which they always do - without having to stop and find you. That's the goal. Not documentation. Decision-enabling.