Using LLMs to Write PRDs: Honest Review

@safarslife · February 14, 2025

I've been using LLMs to help write product requirements documents for about a year. Not to replace the thinking - the thinking is still mine - but to help with the writing. Here's an honest assessment of where they actually help and where they fail in ways that aren't obvious until you've been burned.

What works, and why

The best use I've found is turning rough notes into structured drafts. After a discovery session I'll have a page of bullet points - observations, half-formed hypotheses, things users said in support tickets - and I'll ask the LLM to help me turn that into a coherent problem statement. It's genuinely good at this. It can take messy input and produce clean prose, and the clean prose is usually close enough to what I want that editing it is faster than writing from scratch.

Edge case generation is also useful. When I'm writing acceptance criteria, I'll describe the feature and ask what edge cases I'm missing. For something like "user can update their payment method," a good LLM will surface things like: what happens if there's a pending transaction on the card being removed? What if the new card fails verification? What if the user has a subscription that's mid-cycle? These aren't always relevant to my specific context, but they're often enough to catch something I overlooked. It's a useful forcing function.
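To make that concrete, here's a minimal sketch of the kind of edge-case prompt I mean. The function name and the exact wording are illustrative, not a canonical template - the point is that the prompt includes both the feature and the criteria you've already drafted, so the model looks for gaps rather than restating the obvious:

```python
def edge_case_prompt(feature: str, criteria: list[str]) -> str:
    """Assemble a prompt asking the model for edge cases the drafted
    acceptance criteria don't cover. Wording is illustrative only."""
    drafted = "\n".join(f"- {c}" for c in criteria)
    return (
        f"Feature: {feature}\n"
        f"Acceptance criteria drafted so far:\n{drafted}\n"
        "List edge cases these criteria do not cover. "
        "Focus on failure states, concurrency, and partial completion."
    )

prompt = edge_case_prompt(
    "user can update their payment method",
    ["new card is validated before the old one is removed"],
)
print(prompt)
```

Including the drafted criteria matters: without them, the model tends to return the happy-path cases you already wrote.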

The third thing that works is tone editing. I write in a direct style that's fine for internal docs but sometimes too blunt for stakeholder-facing materials. "Make this more diplomatic without losing the substance" is a prompt that works surprisingly well.

Where LLMs fail for PRD work

Here's what I've learned about the failure modes, and they're more specific than "it doesn't know your users."

LLMs have a knowledge cutoff and no access to your production data. When I'm writing a PRD for a feature that touches our order management system, the LLM doesn't know that our orders table has a specific schema, that we use eventual consistency between our order service and our inventory service, or that our payment provider has a quirk where refunds take 3-5 business days to reflect in our system. It will generate plausible-sounding acceptance criteria that ignore all of this. The criteria will look complete. They won't be.

I've seen PMs use LLM-generated acceptance criteria as a substitute for actually talking to engineers about technical constraints. The criteria sound reasonable. They're often wrong in ways that only become visible during implementation, when an engineer has to come back and say "this assumes we can do X synchronously, but X goes through an async job queue and can take up to 30 seconds." That's a PRD problem, not an engineering problem.

⚠️
LLM-generated acceptance criteria look complete. They're often wrong in ways that only surface during implementation. The LLM doesn't know your system. You do. Read the output against what you actually know about your architecture.

The other failure mode is more subtle. LLMs are trained to produce confident, fluent text. Confident, fluent text feels authoritative. I've caught myself accepting LLM-generated language that was technically correct but missed the nuance of what I was trying to say - and I only caught it because I read it carefully. If you're using an LLM to speed up your writing and you're not reading the output critically, you're going to ship specs with subtle errors that look fine on the surface.

The hedging problem is real too. LLMs are tuned to produce balanced, even-handed answers by default. "Should we build X or Y?" gets you a response that thoughtfully considers both sides. That's useless when you need to make a call. The strategic judgment has to come from you, and the LLM's tendency toward balance can actually make it harder to think clearly about tradeoffs.

The workflow I've settled on

I write the first draft myself. Always. The first draft is where I figure out what I actually think, and outsourcing that to an LLM means I never really understand my own spec. I've tried skipping this step and the resulting documents are technically coherent but feel hollow - they don't reflect any real thinking about the problem.

Then I use the LLM for editing passes. "Make this clearer." "Is there anything ambiguous here?" "What questions would an engineer have after reading this?" Those are useful prompts that improve the document without replacing my thinking.

For edge cases, I do the exercise after I've written the acceptance criteria, not before. I want my own thinking first, then the LLM's suggestions as a check. If I do it before, I anchor on the LLM's suggestions and stop thinking independently.

One thing I've started doing: after the LLM suggests edge cases, I filter them against what I know about our actual system. "What happens if the network request times out?" is a valid edge case. Whether it's relevant depends on whether the operation is synchronous or goes through a queue with retry logic. The LLM doesn't know which one we're using. I do.
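If you want to be disciplined about this filtering step, it helps to write the system facts down before reviewing the suggestions. Here's a toy sketch of the idea; `SYSTEM_FACTS` and the keyword check are invented for illustration - the real filter is the judgment described above, not a lookup table:

```python
# Facts only we know about our system; a real list would come from
# talking to engineers, not from the LLM. Invented for illustration.
SYSTEM_FACTS = {
    "payment_update_goes_through_retry_queue": True,
}

def keep_edge_case(suggestion: str) -> bool:
    """Drop LLM suggestions that contradict known system behavior."""
    # A request-timeout case assumes a blocking call; ours retries in
    # a queue, so the user never sees a timeout directly.
    if ("times out" in suggestion
            and SYSTEM_FACTS["payment_update_goes_through_retry_queue"]):
        return False
    return True

suggestions = [
    "network request times out mid-update",
    "new card fails verification",
]
relevant = [s for s in suggestions if keep_edge_case(s)]
print(relevant)  # only the verification case survives
```

Writing the facts down first also makes the review auditable: when a suggestion gets dropped, the reason is on record instead of in someone's head.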

The honest verdict

LLMs make me faster at writing PRDs. They don't make me better at product thinking. Those are different things, and it's easy to confuse them because the output looks the same - a well-written document. A well-written document with bad thinking is still a bad PRD. The LLM can polish the prose. It can't fix the reasoning, and it can't substitute for knowing your system, your users, and your technical constraints.

If you're using LLMs to avoid the hard thinking, you're going to ship the wrong things faster. That's not an improvement.