The Difference Between Output and Outcome

@safarslife · July 28, 2025

We shipped 47 features last year. I know this because someone put it in a slide deck for the all-hands. Forty-seven features. The number was presented as evidence of productivity.

Nobody mentioned whether any of them moved the metrics that matter.

This is the output vs outcome problem, and it's everywhere in product development. Teams measure what they ship - features, tickets closed, story points completed - because those things are easy to count. They measure outcomes - user behavior changed, revenue increased, problem solved - less often, because those things are harder to attribute and slower to appear.

Why output is seductive

Output is immediate and visible. You shipped a feature. You can see it in the product. You can demo it. You can put it in the release notes. The team feels good. The stakeholders feel good. Something happened.

Outcome is delayed and ambiguous. Did the feature actually help users? Did it change behavior in the way you hoped? Did it contribute to the metric you care about, or did the metric move for some other reason? These questions take weeks or months to answer, and the answers are often "we're not sure."

There's also a technical reason teams optimize for output: it's much easier to instrument. You can count deploys, count tickets, count story points. Measuring whether a feature actually changed user behavior requires setting up the right events before you ship, defining what "success" looks like in advance, and then waiting long enough for the data to be meaningful. Most teams don't do this consistently, so they fall back to counting what they can count.
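As a sketch of what "setting up the right events before you ship" can look like, here's a hypothetical measurement plan written down at spec time - the class, field names, and event names are all illustrative, not a real analytics API:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class MeasurementPlan:
    """What to instrument and how to judge success - written before launch."""
    feature: str
    events: list[str]       # events that must be firing before the feature ships
    success_metric: str     # the outcome metric we expect to move
    target_delta: float     # how much we expect it to move (relative)
    review_after_days: int  # how long to wait before the data is meaningful

    def review_date(self, ship_date: date) -> date:
        # Commit to a review date up front, so "wait for the data"
        # actually ends with a decision instead of drifting forever.
        return ship_date + timedelta(days=self.review_after_days)

plan = MeasurementPlan(
    feature="push order-status updates",
    events=["push_enabled", "order_status_viewed", "support_contacted"],
    success_metric="support_contacts_about_order_status",
    target_delta=-0.30,     # we predict a 30% drop
    review_after_days=28,
)
print(plan.review_date(date(2025, 7, 28)))  # 2025-08-25
```

The point isn't the data structure; it's that the events list and the review date exist before the first line of feature code, so you can't discover after launch that the instrumentation was never added.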

Output mindset

We shipped 47 features. Roadmap is on track. Team velocity is up.

Outcome mindset

3 features moved our core retention metric. 2 reduced support volume by 18%. The other 42 are in maintenance mode.

At Uzum's scale, this problem compounds. When you're processing millions of orders, a feature that looks successful in aggregate can be masking a failure for a specific user segment. A new checkout flow might increase overall conversion by 2% while making things worse for users on slower connections - and if you're only looking at the aggregate number, you'll never see it. Output metrics don't show you this. Outcome metrics, properly defined, do.
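The masking effect is easy to demonstrate with invented numbers - a small sketch where the aggregate conversion rate goes up while one segment quietly gets worse:

```python
# Hypothetical segment data: (user count, conversion rate).
# All figures are made up for illustration.
before = {"fast_connection": (90_000, 0.050), "slow_connection": (10_000, 0.050)}
after  = {"fast_connection": (90_000, 0.052), "slow_connection": (10_000, 0.040)}

def overall_rate(segments: dict) -> float:
    """Blended conversion rate across all segments."""
    users = sum(n for n, _ in segments.values())
    conversions = sum(n * r for n, r in segments.values())
    return conversions / users

print(f"aggregate before: {overall_rate(before):.4f}")  # 0.0500
print(f"aggregate after:  {overall_rate(after):.4f}")   # 0.0508 - looks like a win

# The segment breakdown tells a different story:
for seg in after:
    delta = after[seg][1] - before[seg][1]
    print(f"{seg}: {delta:+.4f}")  # slow_connection is down a full point
```

The aggregate number improves because the large segment's small gain outweighs the small segment's large loss. An outcome metric defined per segment catches this; a single top-line number never will.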

The feature factory problem

The worst version of this is a team that ships features continuously without ever asking whether the features are working. The roadmap is a list of things to build. You build them. You move to the next thing. The backlog never gets shorter because stakeholders keep adding to it.

I've worked in a feature factory. It feels productive from the inside. You're always busy, always shipping, always in planning for the next thing. But when you step back and ask "are users better off than they were six months ago?" the answer is often "we don't know" or, worse, "not really."

The technical debt version of this is even more insidious. Every feature you ship without a clear outcome hypothesis is a feature you might need to maintain forever. Database tables get created, indexes get added, API endpoints get exposed. The codebase grows. The system gets more complex. And if the feature didn't actually solve the problem it was supposed to solve, you've added complexity without adding value. You can't easily remove it because you don't know what depends on it. So it stays, and the next feature gets built on top of it, and the system gets harder to change.

What outcome-first looks like in practice

For every significant feature I work on now, I write down the outcome hypothesis before I write the spec. "We believe that if we add real-time order status updates via push notification, users who have push enabled will have a 30% lower rate of contacting support about order status." That's the bet. The feature is how we're testing it.

This forces two things. First, it forces me to define what success looks like before I've built anything, which means I can't retroactively declare success based on whatever the data shows. Second, it forces me to think about whether the feature is actually the right solution to the problem. If the outcome I care about is "fewer support contacts about order status," maybe the right solution isn't push notifications - maybe it's making the order status page easier to find, or improving the accuracy of our delivery estimates so users aren't anxious in the first place.

💡
Write the outcome hypothesis before you write the spec. "We believe X will cause Y" forces you to define success before you've built anything, so you can't retroactively declare victory.

After we ship, we check. Did the outcome happen? If yes, great - we learned something and we can build on it. If no, we need to understand why. Did we build the wrong thing? Did we build the right thing badly? Did our hypothesis about user behavior turn out to be wrong? The answers to those questions are more valuable than the feature itself. A feature that fails but teaches you something is more valuable than a feature that ships and gets ignored.
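The "did the outcome happen?" check can be mechanical if the hypothesis committed to a number up front. A minimal sketch, with invented baseline and observed figures:

```python
def hypothesis_held(baseline: float, observed: float, target_delta: float) -> bool:
    """Did the metric move at least as much as the hypothesis predicted?"""
    actual_delta = (observed - baseline) / baseline
    if target_delta < 0:
        # We predicted a drop: "held" means the actual drop is at least that large.
        return actual_delta <= target_delta
    return actual_delta >= target_delta

# Hypothesis: push notifications cut order-status support contacts by 30%.
# Baseline: 200 contacts/week. Observed values below are made up.
print(hypothesis_held(baseline=200, observed=130, target_delta=-0.30))  # True: -35%
print(hypothesis_held(baseline=200, observed=170, target_delta=-0.30))  # False: only -15%
```

A `False` here isn't a failure report; it's the prompt for the questions above - wrong thing, right thing built badly, or wrong model of user behavior.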

The honest version

I don't always do this perfectly. There are features I've shipped without a clear outcome hypothesis, usually because the timeline was tight or the stakeholder pressure was high. Those features end up in the "we shipped it, we don't know if it helped" category. They're also the features that are hardest to prioritize for improvement later, because you don't have a baseline to compare against.

The goal isn't perfection. The goal is to make outcome thinking the default, not the exception. To ask "what are we trying to change in user behavior?" before "what are we going to build?" More often than not, that question leads to better features and fewer of them.

Forty-seven features is a lot of output. I'd rather have ten features that moved the needle and the data to prove it.