Building in Public: What I Learned Shipping Fast

@safarslife·January 22, 2026

"Ship fast" is advice that sounds simple until you try to actually do it at scale. At Uzum, we're processing millions of orders. A bad deploy doesn't affect a hundred users - it can affect hundreds of thousands of active sessions simultaneously. Shipping fast in that environment isn't about being reckless. It's about building the infrastructure that makes fast shipping safe.

Most teams that say they ship fast don't. They ship infrequently and call it fast because they move quickly within a sprint. Real fast shipping means you can go from merged PR to production in under an hour, you can roll back in under five minutes without a full redeploy, and you know within 15 minutes whether something you shipped is causing a regression. That's a different thing entirely.

What fast shipping actually requires

The first thing you need is feature flags. Not just for big features - for almost everything. A feature flag means you can deploy code to production without activating it for users. You can activate it for 1% of traffic, watch your error rates and latency metrics, and expand to 100% if everything looks clean. Or you can flip it off instantly if something goes wrong, without touching the codebase.

💡 Feature flags change the blast radius of a bad deploy from "your entire user base" to "whatever percentage of traffic you've enabled." That changes the risk calculus completely.

Without feature flags, every deploy is a binary event. It's either on or off for everyone, which forces you to be conservative about what you ship and when. With flags, exposure becomes a dial rather than a switch: you choose how much traffic sees a change, and you can dial it back the moment the metrics say so.
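The percentage-rollout mechanic is simple to sketch. This is a minimal illustration, not how any particular flag service works - real systems (LaunchDarkly, an in-house config service) add targeting rules, overrides, and audit trails, and the flag and user names here are made up. The key property is determinism: hashing the flag name together with the user ID gives each user a stable decision for that flag, and independent bucketing across flags.

```python
import hashlib

def flag_enabled(flag_name: str, user_id: str, rollout_percent: float) -> bool:
    """Deterministically bucket a user into a flag's rollout percentage.

    Hashing flag_name + user_id means a given user gets the same answer
    every time for this flag, and different flags bucket independently.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # 0..9999, i.e. 0.01% granularity
    return bucket < rollout_percent * 100

# Rolling out to 1% of traffic: roughly 1 in 100 users sees the feature,
# and the same user keeps seeing (or not seeing) it across requests.
enabled = flag_enabled("new_checkout", "user-12345", 1.0)
```

Flipping the flag off is then a config change, not a deploy - which is exactly what makes the 30-second rollback possible.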

The second thing you need is observability that's actually useful. I've worked with monitoring setups that technically tracked everything but were so noisy that nobody looked at them. Useful observability means you have dashboards that show you the metrics that matter - error rates, p95 latency, conversion rates on key flows - and those dashboards update in near real-time. It means you have alerts that fire when something actually breaks, not alerts that fire constantly because the thresholds are wrong.

At Uzum, when we ship something that touches the checkout flow, I'm watching three things: payment success rate, order creation latency, and the error rate on our order service. If any of those move more than a few percentage points in the wrong direction within 10 minutes of a deploy, we roll back. That's the protocol. It's not a judgment call in the moment - it's a pre-agreed threshold that removes the pressure to rationalize a bad deploy.
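That kind of pre-agreed protocol can be made mechanical: snapshot the metrics before the deploy, read them again after, and roll back if anything moved too far the wrong way. This is a sketch of the idea - the metric names, values, and the 3-point threshold below are illustrative, not the actual production numbers.

```python
def should_roll_back(metrics: dict, max_move_pp: float = 3.0):
    """Pre-agreed rollback check against a recorded baseline.

    metrics maps a metric name to (baseline_pct, current_pct, higher_is_better).
    Any metric that moved more than max_move_pp percentage points in the
    wrong direction triggers a rollback; no in-the-moment judgment call.
    """
    for name, (baseline, current, higher_is_better) in metrics.items():
        move = current - baseline if higher_is_better else baseline - current
        if move < -max_move_pp:  # moved the wrong way by more than the limit
            return True, name
    return False, None

# Snapshot taken before the deploy vs. readings 10 minutes after:
metrics = {
    "payment_success_rate": (99.2, 95.5, True),   # dropped 3.7 points
    "order_service_error_rate": (0.4, 0.6, False),  # rose 0.2 points
}
roll_back, offending_metric = should_roll_back(metrics)
```

The point of encoding it is the last line of the protocol above: the threshold was agreed on before the deploy, so nobody has to argue under pressure about whether a drop is "real."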

The third thing is the ability to roll back without a full redeploy. This sounds obvious but it's not trivial. If your rollback procedure is "revert the commit, push, wait for CI, wait for deploy," you're looking at 20-40 minutes minimum. That's 20-40 minutes of a broken experience for your users. If your rollback is "flip the feature flag off," it's 30 seconds.

The culture problem is real

Infrastructure is the easier part. The harder part is the culture.

Engineers who care about their work - which is most good engineers - find it uncomfortable to ship something they know isn't finished. I've had engineers push back on shipping a feature because the error handling wasn't complete, or because they wanted to add one more edge case. Sometimes they're right. Often they're optimizing for a scenario that will never happen while delaying feedback on the scenario that will.

The framing I've found useful is: "we're not shipping a finished product, we're running an experiment." An experiment doesn't have to be perfect. It has to be instrumented well enough that you can learn from it. If the feature is behind a flag, affects 5% of users, and you're watching the metrics, you're running an experiment. You'll know within 48 hours whether your assumptions were right.

The teams I've seen struggle with this are the ones where "we'll fix it next sprint" is treated as a failure rather than a strategy. If you ship something imperfect and then fix it based on real user behavior, that's not a failure. That's the process working correctly.

What I've learned about stakeholder management

Shipping fast creates a specific stakeholder problem: people see imperfect things and conclude the team doesn't know what they're doing. I've had this happen. We shipped a feature with a UI that was clearly a first pass - functional but rough - and a senior stakeholder saw it and sent a message asking if we were sure we were ready to launch.

The answer to this is to set expectations before you ship, not after. "We're shipping a v1 of this feature to 10% of users to validate the core flow. The UI will be rough. We'll iterate based on what we learn." If you say that before the feature goes out, the rough UI is expected. If you say it after, it looks like an excuse.

The teams that do this well treat every ship as a communication event, not just a technical event. The deploy is the easy part. Making sure everyone who needs to know understands what's going out, why it's going out now, and what you're watching for - that's the work.