Article 4 of 6

Platform Engineering: When to Build, What to Build, and for Whom

The principles behind internal platforms that actually get adopted vs. platforms that become expensive mandates.

12 min · Advanced


Key Takeaway

Platform engineering is the practice of building and maintaining internal products that reduce the cognitive load and operational overhead for product teams. Done well, a platform team makes the right way to deploy, monitor, and operate services so easy that product teams choose it voluntarily — because it's genuinely better than the alternative. Done poorly, it becomes an expensive mandate that product teams route around. The difference lies almost entirely in whether the platform team thinks of itself as running an internal product or running a shared infrastructure service.

Let me start with what platform engineering is not, because the confusion about its identity is responsible for most of the failed platform initiatives I've seen.

It is not a DevOps team with a new name. DevOps is a philosophy about breaking down walls between development and operations, about continuous delivery, about shared responsibility for production systems. Platform engineering builds infrastructure that enables that philosophy at scale. These are related but distinct.

It is not an infrastructure team that now writes Terraform instead of scripts. Infrastructure teams manage the underlying cloud resources — compute, networking, storage. A platform team builds the developer-facing layer above that: the golden path for deploying a service, the observability toolkit that product engineers actually use, the developer portal that surfaces documentation and service catalog information.

It is not a shared services team that other teams file tickets to. That model, where product teams open tickets to provision a database, request an SSL certificate, or enable a feature flag, is a coordination bottleneck that slows delivery. The platform team that works like a shared services desk is a platform team that hasn't learned to build products.

Platform engineering is the practice of building internal products for software engineers, where the customers are your own product teams and the product's success is measured by whether those customers voluntarily use it to move faster and with less cognitive overhead than they could without it.

Why This Is Different From What Most Teams Do

When I talk about platform engineering with heads of engineering at Indian product companies, the conversation often starts with some version of: "We already have a team that manages our cloud infrastructure and CI/CD. Is that what you're talking about?"

Usually, the answer is "that team does some of what a platform team does, but the way they work isn't the platform engineering model."

The difference is in the relationship with the internal customer. An infrastructure team says: "We manage the infrastructure. Other teams use it." A platform team says: "We build an internal product. Our product teams are our customers. We're responsible for their productivity, not just the availability of the resources they use."

This reframing has concrete implications for how the team works:

User research. A platform team runs user interviews with product engineers. They measure how long it takes a new engineer to deploy their first service, because that's a product metric. They run quarterly satisfaction surveys across the engineering organization. They have a feedback channel. This feels foreign to most infrastructure engineers, but it's exactly what a product team does for external customers.

Roadmaps. A platform team publishes a roadmap — what capabilities they're building, in what order, and why. Product teams can see what's coming and plan around it. They can influence priorities through a structured process. This creates the predictability and transparency that makes product teams trust the platform.

APIs and documentation. Internal tools need documentation just as much as external APIs. A deployment system that requires three days of shadowing a senior engineer before you can use it is not a well-designed internal product. The platform should be documented well enough that an engineer new to the organization can self-serve.

SLAs. If the CI/CD system is down, product teams can't deploy. That's as serious as external customer impact. Platform teams should have internal SLAs for their systems and communicate transparently when they're violated.
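One way to make an internal SLA concrete is an error budget. The availability target fixes how many minutes of downtime a window can absorb, and recorded outages either fit inside that budget or violate it. The minimal sketch below shows the arithmetic. The 99.5% target, the 30-day window, the outage figures, and the error_budget_report helper are all illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class BudgetReport:
    allowed_minutes: float    # downtime the target permits in the window
    used_minutes: float       # downtime actually recorded
    remaining_minutes: float  # negative once the SLA is violated
    violated: bool


def error_budget_report(target: float, window_days: int, outage_minutes: list[float]) -> BudgetReport:
    """Compare recorded outages against an availability target (0.995 means 99.5%)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be a fraction between 0 and 1")
    window_minutes = window_days * 24 * 60
    allowed = window_minutes * (1.0 - target)
    used = sum(outage_minutes)
    remaining = allowed - used
    return BudgetReport(allowed, used, remaining, violated=remaining < 0)


# Hypothetical CI/CD outages this month: 90 and 150 minutes.
report = error_budget_report(target=0.995, window_days=30, outage_minutes=[90, 150])
print(f"allowed={report.allowed_minutes:.0f} used={report.used_minutes:.0f} violated={report.violated}")
# prints: allowed=216 used=240 violated=True
```

With a 99.5% target, a 30-day month allows 216 minutes of downtime (43,200 minutes × 0.005). The two outages above therefore breach the SLA by 24 minutes, which is exactly the kind of event a platform team should communicate to product teams.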

The Golden Path

The concept of the "golden path" is central to how good platform engineering works, and I want to spend time here because I've seen it misunderstood in ways that lead to platforms that nobody uses.

A golden path is an opinionated, well-supported, well-documented way to accomplish a common engineering task. It's the way the platform team recommends deploying a new service, setting up observability, provisioning a database, running integration tests in a pre-production environment.

The critical word is "opinionated." A golden path is not a menu of options with instructions for each. It is one way, chosen deliberately because it's the best trade-off of simplicity, security, and operational soundness for the majority of use cases. Spotify's Golden Paths and Netflix's Paved Road are well-known examples of the same concept.

The golden path works because of the bargain it represents: the platform team commits to making this path so easy to walk that choosing a different path is only worth doing when you have genuinely unusual requirements. The path is paved because the platform team does the infrastructure work, writes the documentation, maintains the templates, and provides support. Product teams get productivity in exchange for following the conventions.

The failure mode is a platform team that builds a "flexible" system that supports every possible configuration. The documentation is a reference manual, not a quickstart guide. New engineers spend two days choosing between options instead of thirty minutes following a clear example. Nobody knows which configuration is actually recommended. The platform team interprets this as "we're meeting everyone's needs" while product engineers interpret it as "the platform doesn't help me."

An example of a concrete golden path, from a company I worked with:

Deploying a new backend service:

1. platform init-service --name my-service scaffolds the service with the standard structure, Dockerfile, Helm chart, CI/CD configuration, logging configuration, and observability setup, all pre-wired.
2. git push triggers the CI/CD pipeline, which runs tests, builds the container, pushes it to the registry, and deploys to the staging environment.
3. platform promote --env production promotes the staging image to production with rollout monitoring.

Three steps. No manual Dockerfile editing, no Kubernetes YAML writing, no configuring Prometheus scrape targets, no setting up log aggregation. All of that is pre-configured on the golden path.

Engineers who have unusual requirements — a service that needs custom network policies, a batch job instead of a long-running service, a different deployment cadence — can deviate. The platform doesn't prevent it. It just doesn't support the deviation as actively.
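The internals of a scaffolding command like platform init-service aren't described here, so what follows is only a sketch of the idea under a template-and-defaults assumption. Every new service starts from one opinionated configuration with secure, observable defaults, and deviations are allowed but recorded. The section names, keys, values, and the render_service_config helper are hypothetical.

```python
import copy
from typing import Optional

# Hypothetical golden-path defaults; keys and values are illustrative only.
GOLDEN_PATH_DEFAULTS = {
    "runtime": {"replicas": 2, "cpu_limit": "500m", "memory_limit": "512Mi"},
    "security": {"run_as_non_root": True, "secrets_source": "secret-manager"},
    "observability": {"log_format": "json", "metrics_path": "/metrics", "alerts": "standard"},
    "deploy": {"strategy": "rolling", "promote_requires_staging": True},
}


def render_service_config(name: str, overrides: Optional[dict] = None) -> dict:
    """Build a service config from golden-path defaults.

    Overrides are permitted (the path does not forbid deviation), but every
    key whose value differs from the default is recorded, so the platform
    team can see where the golden path does not fit.
    """
    config = copy.deepcopy(GOLDEN_PATH_DEFAULTS)  # never mutate the shared defaults
    deviations = []
    for section, values in (overrides or {}).items():
        if section not in config:
            raise KeyError(f"unknown config section: {section}")
        for key, value in values.items():
            if config[section].get(key) != value:
                deviations.append(f"{section}.{key}")
            config[section][key] = value
    config["service"] = {"name": name, "deviations": sorted(deviations)}
    return config


# The common case: no overrides, everything pre-wired.
svc = render_service_config("my-service")
# A batch job deviates on purpose; the deviation is visible rather than hidden.
batch = render_service_config(
    "nightly-report", {"runtime": {"replicas": 1}, "deploy": {"strategy": "recreate"}}
)
print(batch["service"]["deviations"])  # ['deploy.strategy', 'runtime.replicas']
```

Recording deviations instead of blocking them preserves the bargain of the golden path: the defaults stay opinionated, and the accumulated deviation list tells the platform team where the path may need extending.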

The Thinnest Viable Platform

One of the most common mistakes I see platform teams make is over-building. They see the opportunity to create a comprehensive internal developer platform — a complete developer portal with a service catalog, a secret management system, a feature flag system, a custom observability stack, an internal deployment DSL, an environment management system — and they try to build all of it.

The principle that guards against this is the Thinnest Viable Platform: build only what genuinely reduces friction for product teams, and build it well rather than building everything adequately.

Ask for each capability you're considering building: "What is the current pain, and how many teams feel it?" If the answer is "all product teams spend three hours per new service setting up observability by hand," that's a platform investment with clear, measurable ROI. If the answer is "the infrastructure team finds it annoying to manage SSL certificates manually," that might be the right automation for the infrastructure team, but it's not a platform investment.
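A minimal way to apply that question consistently is to estimate product-team hours lost per quarter for each candidate and rank by that number. In the sketch below, the candidates, team counts, and hour figures are hypothetical; only the observability and SSL examples echo the paragraph above. The cutoff of two product teams is an assumption standing in for "how many teams feel it."

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    product_teams_affected: int  # product teams feeling the pain (0 if only the infra team does)
    occurrences_per_team: float  # times per quarter each team hits it
    hours_per_occurrence: float  # engineer-hours lost each time

    @property
    def hours_per_quarter(self) -> float:
        return self.product_teams_affected * self.occurrences_per_team * self.hours_per_occurrence


def rank_candidates(candidates: list[Candidate], min_teams: int = 2) -> list[Candidate]:
    """Rank candidates by product-team hours lost per quarter.

    Pain felt by fewer than `min_teams` product teams is filtered out: it may
    still be worth automating, but it is not a platform investment.
    """
    eligible = [c for c in candidates if c.product_teams_affected >= min_teams]
    return sorted(eligible, key=lambda c: c.hours_per_quarter, reverse=True)


backlog = [
    Candidate("observability setup per new service", 8, 2, 3),  # 48 h/quarter
    Candidate("manual SSL certificate renewal", 0, 1, 2),       # infra-team pain only
    Candidate("GPU job scheduling", 1, 4, 6),                    # one team's unusual need
    Candidate("staging database provisioning", 5, 3, 4),         # 60 h/quarter
]
for c in rank_candidates(backlog):
    print(c.name, c.hours_per_quarter)
```

The estimates will be rough, but writing them down forces the "how many teams feel it?" question to be answered explicitly for every roadmap item.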

The thinnest viable platform also means recognizing what you shouldn't build. For most organizations, the right answer for secret management is Vault, AWS Secrets Manager, or a similar established tool, not a custom implementation. The right answer for feature flags is LaunchDarkly or Unleash, not a home-grown system. The platform team's job is to integrate these tools into the golden path and make them easy for product teams to use, not to replicate their functionality.

This matters more than it sounds. Every internal tool a platform team builds is a tool they commit to maintaining indefinitely. Internal tools accumulate technical debt, require documentation, break when their dependencies change, and create on-call burden. Each one has a carrying cost. The thinnest viable platform minimizes that carrying cost by building selectively.

When to Start a Platform Team

This is the question I'm most frequently asked, and my answer is always specific: you need a platform team when the absence of one is measurably slowing down multiple product teams.

The organizational signals that indicate it's time:

Duplicate infrastructure work across teams. When three product teams each have a team member who spends 20% of their time maintaining that team's deployment pipeline, and all three pipelines look roughly similar, you have 60% of one engineer's time scattered across product teams, with no economies of scale and no shared knowledge. A platform engineer could build one shared pipeline for all three teams and return that scattered time to product work.

Slow onboarding. When new engineers take more than two weeks to ship their first pull request to production, part of the problem is likely that the path to production is unclear or poorly documented. A platform team with a golden path and good documentation can cut onboarding time dramatically.

Inconsistent security and operational practices. When each team makes its own decisions about how to handle secrets, how to configure logging, how to set up alerting, you will eventually have a security incident or operational failure caused by a team that made a bad decision without realizing it. A platform team that establishes secure defaults makes the secure choice the easy choice.

Infrastructure operational overhead consuming product team time. When product engineers are spending more than 10-15% of their time on infrastructure work — provisioning, maintaining, debugging — they're not doing product engineering. That's the clearest signal that the engineering organization would benefit from someone focused on making infrastructure work self-service.

Most organizations don't need a dedicated platform team until they have 30-50 product engineers, though there are exceptions in either direction. At fewer than 30 engineers, a part-time investment in shared infrastructure and documentation often serves the same purpose. At more than 50, the absence of a platform team is usually showing up as real delivery friction.
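The signals in this section can be turned into a rough quarterly check. This is a sketch under stated assumptions. The onboarding, infrastructure-time, and headcount thresholds come from the figures above (two weeks, the low end of 10–15%, and the low end of 30–50). The duplicate-work and bespoke-practice thresholds, the input fields, and the rule that at least two signals must fire are hypothetical choices, not part of the guidance above.

```python
from dataclasses import dataclass

# Thresholds taken from the figures in this article; treat them as starting points.
ONBOARDING_LIMIT_DAYS = 14  # "more than two weeks" to first production PR
INFRA_TIME_LIMIT = 0.10     # low end of the 10-15% range
HEADCOUNT_THRESHOLD = 30    # low end of the 30-50 product-engineer band
# Hypothetical thresholds (not from the article):
DUPLICATE_FTE_LIMIT = 0.5   # scattered pipeline upkeep worth half an engineer
BESPOKE_TEAMS_LIMIT = 2     # teams running their own secrets/logging/alerting setup


@dataclass
class OrgSnapshot:
    product_engineers: int
    pipeline_upkeep: list[float]  # per team: fraction of one engineer spent on its own pipeline
    median_days_to_first_prod_pr: float
    teams_with_bespoke_practices: int
    infra_time_share: float       # fraction of product-engineer time spent on infrastructure


def platform_signals(org: OrgSnapshot) -> list[str]:
    """Return the organizational signals that currently fire."""
    signals = []
    scattered_fte = sum(org.pipeline_upkeep)  # e.g. three teams x 20% = 0.6 FTE
    if len(org.pipeline_upkeep) >= 2 and scattered_fte >= DUPLICATE_FTE_LIMIT:
        signals.append(f"duplicate pipeline work: {scattered_fte:.1f} FTE across teams")
    if org.median_days_to_first_prod_pr > ONBOARDING_LIMIT_DAYS:
        signals.append("slow onboarding")
    if org.teams_with_bespoke_practices >= BESPOKE_TEAMS_LIMIT:
        signals.append("inconsistent security and operational practices")
    if org.infra_time_share > INFRA_TIME_LIMIT:
        signals.append("infrastructure overhead on product teams")
    return signals


def needs_platform_team(org: OrgSnapshot, min_signals: int = 2) -> bool:
    """Hypothetical rule: enough product engineers AND several signals at once."""
    return org.product_engineers >= HEADCOUNT_THRESHOLD and len(platform_signals(org)) >= min_signals


org = OrgSnapshot(product_engineers=40, pipeline_upkeep=[0.2, 0.2, 0.2],
                  median_days_to_first_prod_pr=18, teams_with_bespoke_practices=3,
                  infra_time_share=0.12)
print(platform_signals(org))
print(needs_platform_team(org))  # True
```

Because there are exceptions in either direction, the headcount check is best read as a prompt for discussion rather than a hard gate.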

What a Platform Team Should Measure

If you run a platform team and you're not measuring adoption, you don't know if your platform is working.

The metrics I care about:

Golden path adoption rate. Of the services deployed in the last quarter, what percentage used the golden path? If the number is below 70% and falling, the golden path isn't good enough — either it's too rigid, too poorly documented, or not solving the right problem.

Time to first deployment for a new service. How long does it take a competent engineer to take a new service from initialized repository to first production deployment? This is the most direct measure of the platform's effect on productivity. Baseline it, then measure whether your platform investments move it.

Infrastructure-related incident rate. What fraction of production incidents are caused by infrastructure configuration issues — misconfigured logging, wrong environment variables, missing alerts, incorrect resource limits — that a well-designed platform would have prevented through safe defaults? Track this and watch whether it falls as platform maturity improves.

Engineer satisfaction score. A simple quarterly survey: "How satisfied are you with the internal tooling and platform capabilities available to you?" Measured on a 1-5 or 0-10 scale, tracked over time. When this score is low and flat, the platform isn't delivering. When it's rising, you're building the right things.
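All four metrics can be computed from data most organizations already collect. The sketch below assumes hypothetical record shapes: a Deployment with a used_golden_path flag, per-service timestamps for repository creation and first production deploy, incidents tagged with a cause string such as "infra-config", and survey scores on a 1–5 scale. Map these onto whatever your deploy pipeline, incident tracker, and survey tool actually export.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, median


@dataclass
class Deployment:
    service: str
    used_golden_path: bool  # hypothetical flag emitted by the deploy pipeline


@dataclass
class Incident:
    incident_id: str
    cause: str  # hypothetical tags, e.g. "infra-config", "code", "third-party"


def adoption_rate(deployments: list[Deployment]) -> float:
    """Share of this quarter's deployments that used the golden path."""
    if not deployments:
        return 0.0
    return sum(d.used_golden_path for d in deployments) / len(deployments)


def adoption_alert(rates_by_quarter: list[float], floor: float = 0.70) -> bool:
    """True when the latest quarter is below the floor and lower than the one before."""
    if len(rates_by_quarter) < 2:
        return False
    previous, latest = rates_by_quarter[-2], rates_by_quarter[-1]
    return latest < floor and latest < previous


def median_days_to_first_deploy(services: dict[str, tuple[datetime, datetime]]) -> float:
    """Median days from repository creation to first production deploy."""
    days = [(first_prod - created).total_seconds() / 86400 for created, first_prod in services.values()]
    return median(days)


def infra_incident_share(incidents: list[Incident]) -> float:
    """Fraction of incidents caused by infrastructure configuration."""
    if not incidents:
        return 0.0
    return sum(i.cause == "infra-config" for i in incidents) / len(incidents)


def satisfaction_trend(scores_by_quarter: list[list[int]]) -> list[float]:
    """Mean survey score per quarter, oldest first."""
    return [mean(scores) for scores in scores_by_quarter]


# Hypothetical sample data.
deploys = [Deployment("payments", True), Deployment("search", True),
           Deployment("batch-export", False), Deployment("notify", True)]
services = {
    "payments": (datetime(2024, 1, 1), datetime(2024, 1, 4)),
    "search": (datetime(2024, 1, 10), datetime(2024, 1, 20)),
    "notify": (datetime(2024, 2, 1), datetime(2024, 2, 6)),
}
incidents = [Incident("INC-1", "infra-config"), Incident("INC-2", "code"),
             Incident("INC-3", "infra-config"), Incident("INC-4", "third-party")]

print(adoption_rate(deploys))                 # 0.75
print(adoption_alert([0.82, 0.75, 0.66]))     # True: below 70% and falling
print(median_days_to_first_deploy(services))  # 5.0
print(infra_incident_share(incidents))        # 0.5
print(satisfaction_trend([[3, 4, 2, 3], [4, 4, 3, 5]]))
```

The median is used for time to first deployment because a single stalled service would otherwise dominate a mean.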

The Internal Product Mindset in Practice

The platform team I've seen work best operated like this: every quarter, they ran thirty-minute user interviews with engineers from across three or four product teams. Not to show off new features — to understand current friction. Where are you spending time on things that shouldn't require your time? What did you try to do last month that took longer than expected? What would make you more productive this quarter?

The insights from those interviews drove the roadmap. Not the platform team's intuitions about what product engineers should want — actual feedback from the engineers they were serving.

They also kept a public backlog and a public roadmap on their internal wiki. Product teams could see what was planned, add items, and vote on priorities. This created a sense of shared ownership that made product teams more willing to adopt platform capabilities and more forgiving of the occasional rough edge.

They ran a weekly office-hours session. Not for support tickets — for conversations. Engineers who were trying to do something non-standard with the platform would come and work through it with the platform engineers. These sessions produced the most valuable feedback, because they surfaced the places where the golden path didn't fit and forced the platform team to decide: should we extend the path, or document the deviation pattern?

Common Mistakes, Named

Building platforms that require mandatory adoption. The fastest way to destroy trust in a platform team is to mandate that all teams use the platform for something it isn't ready for. Engineers are smart. They will find the platform's limitations within two weeks, and if they're forced to use it anyway, their frustration will spread throughout the engineering organization. Platforms should earn adoption through quality, not mandate it through policy.

Over-engineering for flexibility at the cost of usability. A Kubernetes-based deployment platform that supports every possible configuration through a 200-line YAML file is not a usable platform. It's a trap for engineers who need to deploy a simple web service and now have to learn Kubernetes to do it. Platforms that optimize for the 95% case — even at the cost of some flexibility — are more valuable than platforms that try to support every case equally.

Not investing in documentation. Internal tools tend to be documented for the engineers who built them, not for the engineers who will use them. "Here's the repository, the README explains the basics" is not documentation. Documentation for a platform means: a quickstart that walks a new engineer through their first real use case from scratch, a reference for all available options, troubleshooting guides for common failures, and examples for the most common patterns. This is not optional overhead — it's a core feature of the product.

Staffing the platform team with junior engineers to free up seniors for product work. Platform engineering requires senior engineering judgment. The platform team makes decisions that affect every product team in the organization — deployment conventions, observability standards, infrastructure defaults. These decisions have long-lasting consequences. Staffing them with engineers who lack the experience to see those consequences is a mistake that costs you later, usually in the form of a platform that needs to be rebuilt as the organization grows.

Platform engineering done well is one of the highest-leverage investments an engineering organization can make as it scales. When product teams spend less time on infrastructure concerns, more time goes to product problems. When the right practices are the easy practices, security and operational quality improve without mandates. When onboarding is smooth, new engineers become productive faster. All of this compounds.

The key is to build it like a product: with users in mind, with clear measures of success, with the discipline to say no to scope that doesn't serve the customer. The customers are your own engineers. Serve them well.
