By Dan Teodosiu and Dave Kellogg

Introduction

As the software industry matures, everyone is focused on profit margins. The desire to increase EBITDA leads to more pressure being put on every function for measurable performance improvements. The Rule of 40% is now turning into the Rule of 60%.

While this puts all functions under pressure, Engineering, the largest cost center in most startups, gets its pro rata share, and often more. As CTO, you quickly discover that everyone else at C-level is keen on increasing engineering productivity and is piling pressure on you. As a result, you’re constantly being asked by your Board, CEO, and CFO to “measure development productivity” and show meaningful improvement.

In most companies, there’s a belief that the company could be doing much better if only development were moving “faster.” If only we could ship more features per unit time. But wasn’t Agile supposed to deliver that? Did it? Wasn’t the Spotify model supposed to do that? Did it? These days, your CEO believes GenAI is the answer to your company’s woes and that it will allow you to move faster. Will it? And how do you actually ensure that’s the case?

If speed is the end game, then you need to be very clear about what speed you ought to measure and improve. Instead of focusing solely on Engineering speed, companies should focus on how quickly they can deliver outcomes, such as increased customer adoption or engagement. Keep reading for insights on:

  • Why traditional engineering measurement approaches don’t (fully) work.
  • Why focusing on outcomes and customer value metrics is a better method.
  • How to respond when leadership inevitably asks for metrics.

In the end, Engineering productivity is not how fast you write code. It’s how fast your company increases customer value.

Is faster always better?

In general, the perception is that Engineering needs to ship more features, faster. Engineering is the bottleneck that’s holding back the company.

However, “more” and “faster” are not necessarily what they seem to be, and oftentimes risk becoming “Dickensian” metrics.

For instance, the Microsoft Windows division tried looking at Lines of Code (LoC) per engineer in the early 2000s as a productivity metric. They quickly found out that some very senior engineers seemed to be particularly “lazy,” only producing a few LoC per month! When they looked at what these engineers were doing, they realized they were researching and fixing some of the most difficult bugs in the operating system kernel. Usually, fixes were only a few lines long, but replicating and understanding each bug took many weeks.

In fact, less code is usually better. To paraphrase a line often attributed to Mark Twain (though it traces back to Blaise Pascal), “I apologize for writing such a long program, I didn’t have time to write a short one.” Or, closer to home, to quote the Claude Code developers, “every time there’s a new model release, we delete a bunch of code.”

Just as a line of code is not a line of code, a feature is not a feature — any given feature may or may not add value, may correspond to a harder vs. easier problem, may be shipped at higher vs. lower quality (introduce more bugs fast!), or may provide a specific vs. a generic solution. But, just as “some mistakes are too much fun to make only once,” (Margaret Mitchell) don’t expect companies to give up without a fight on measuring LoC (or its contemporary cousin, tokens).

In addition to writing code, engineers may work on things that are less visible but have high impact. To cite Gergely Orosz: “I remember this one week when one of my best engineers did no commits, disappeared from meetings, and did no code reviews. Later, I found out that he paired with two engineers on another team the whole week, helping them ship a critical feature that would [not have] been done in time, and the impact of getting it out was massive.”

So if focusing on features shipped or LoC written is not the right thing to do, what should the engineering team focus on? In his 2022 article, Gergely Orosz argues that companies should focus on effectiveness, which is absolutely right – but how do you quantify effectiveness?

Are DORA and SPACE all you need?

Various engineering productivity metrics have been proposed over time. Two of the most commonly used frameworks are DORA and SPACE.

The DORA (DevOps Research and Assessment) team was started by three engineers in 2016 and was later acquired by Google. They advocated the introduction of metrics such as deployment frequency, mean lead time for changes, change failure rate, and mean time to restore.

DORA defines important hygiene metrics, but these metrics only provide a narrow DevOps view of how well development is doing, as they tell you almost nothing about the critical reasons people buy, use, succeed, or fail with your software or service.
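To make the four metrics concrete, here is a minimal sketch of how they might be computed from a log of deployments. The `Deployment` record, its field names, and the sample data are hypothetical and are not taken from DORA's own tooling; real pipelines would pull these timestamps from a CI/CD system and an incident tracker.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record; field names are assumptions for illustration.
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def hours_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Compute the four DORA-style metrics over one reporting period."""
    if not deployments:
        raise ValueError("no deployments in this period")
    failures = [d for d in deployments if d.failed]
    restore_hours = [
        hours_between(d.deployed_at, d.restored_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deployments_per_week": len(deployments) / (period_days / 7),
        "mean_lead_time_hours": mean(
            hours_between(d.committed_at, d.deployed_at) for d in deployments
        ),
        "change_failure_rate": len(failures) / len(deployments),
        # Reported as 0.0 when nothing failed (a simplifying assumption).
        "mean_time_to_restore_hours": mean(restore_hours) if restore_hours else 0.0,
    }


# Made-up sample: four deployments over two weeks, one of which failed.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=24)),
    Deployment(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=12)),
    Deployment(
        t0 + timedelta(days=4),
        t0 + timedelta(days=4, hours=36),
        failed=True,
        restored_at=t0 + timedelta(days=4, hours=38),
    ),
    Deployment(t0 + timedelta(days=9), t0 + timedelta(days=9, hours=24)),
]
metrics = dora_metrics(sample, period_days=14)
print(metrics)
```

Note that every input here is an engineering event. Nothing in the calculation refers to customers, which is exactly the limitation described above.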

SPACE was started in 2021 as a holistic framework (in part based on DORA) for measuring developer productivity. It is similar to DORA but introduces self-reported metrics such as “satisfaction and well-being” that can help you tell whether your engineering team is in “hero mode” or has well-established practices.

But even with SPACE, the problem remains: how do these metrics map to your company’s success? They’re still hygiene metrics that Engineering should track rigorously, but they don’t answer the fundamental question about outcomes. They tell you when something in Engineering is wrong (e.g., the team is not shipping frequently enough or not at the right level of quality), but they don’t necessarily tell you when things are right, i.e., that you’re making progress towards delivering business outcomes.

But surely GenAI is all you need?

We’ve recently seen an inflection point in how GenAI can help developers produce code faster (see, for instance, here or here). While the jury’s still out on what the actual, measurable engineering productivity gains are (see here or here), there is mounting evidence that AI may bring about a significant improvement for engineers. For instance, many companies are now complaining that human review of PRs (Pull Requests) has become the main engineering bottleneck.

So is the problem solved? Not really. Imagine replacing your car’s 150 horsepower (HP) engine with a 2,000 HP one. Your car will certainly be much faster. But that could lead to you winning a race or simply driving off a cliff at high speed (or, for that matter, flipping over). ChatGPT can help you write text faster. Gamma can help you build slides faster. Claude can help you code faster. None of them do much to ensure you’re building the right thing – they also can make you dramatically more efficient at doing the wrong thing.

The question really lies with what you achieve when you wield the additional power brought by GenAI. We think it is a force multiplier for companies that understand what “faster” and “more” mean, but may become a distraction for companies that don’t. So keep reading!

OK, let’s calm down and take a step back

Assuming narrowly-defined “engineering productivity” is the only thing that matters distracts from the key questions your company must answer in order to be successful:

  • Are you providing great value for your customers?
  • Are you improving business outcomes fast enough?

Your customers don’t care about how many LoC, story points, or fancy features you ship – they care about the value your product or service delivers to them. Hence, you shouldn’t obsessively focus on doing what you have been doing, faster. Instead, you need to think creatively about novel ways in which you can add more customer value faster.

As an example, here’s the experience one of us had with supporting different advertising formats at Criteo, around 2014-2015. We needed to support tens of thousands of different banner ad formats and types. The development team in charge of this was moving at a good clip according to the DORA-like metrics we were tracking at the time (this was before DORA became a thing) – but they had to hand-code each new format individually. On average, they were able to ship only 3-4 formats per month.

However, the Sales team had a prioritized list of close to ten thousand formats that customers wanted – and no, Sales weren’t happy with the status quo! Even if we had had GenAI at the time (alas, transformers were invented two years later), a 10x coding speedup would have been insufficient to meet the need.

We solved the problem by bringing in a new team lead who changed the paradigm and developed a no-code solution based on a DSL (Domain Specific Language) and a constraint solver. The problem was pushed to the professional services team. Output jumped to 500-700 formats per month. Sales were happy. Customers were happy.

What changed? Instead of focusing on coding faster we focused on how we could add customer value faster by using a different approach.
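The arithmetic behind this story is worth spelling out. The sketch below is a back-of-the-envelope calculation, not a model of what actually happened: it rounds the backlog to 10,000 formats, uses the midpoints of the quoted ranges (3.5 and 600 formats per month), and assumes the backlog does not grow. All three are our simplifying assumptions.

```python
def months_to_clear(backlog: int, formats_per_month: float) -> float:
    """Months needed to ship `backlog` formats at a constant monthly rate."""
    return backlog / formats_per_month


BACKLOG = 10_000  # "close to ten thousand" formats, rounded (assumption)

# Monthly rates: midpoints of the ranges quoted in the text (assumption).
scenarios = {
    "hand-coded (3.5/month)": 3.5,
    "hypothetical 10x coding speedup (35/month)": 35.0,
    "DSL + constraint solver (600/month)": 600.0,
}

for name, rate in scenarios.items():
    months = months_to_clear(BACKLOG, rate)
    print(f"{name}: {months:,.0f} months (~{months / 12:.1f} years)")
```

Under these assumptions, a 10x coding speedup still leaves more than two decades of work, while the change of approach brings the backlog down to well under two years.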

Putting the right metrics in place

If you don’t have the right metrics in place, impact is difficult to measure, which leads many companies back to measuring things they can easily measure as opposed to things that are meaningful (for instance, see here). To quote Peter Drucker: “there is nothing quite so useless as doing with great efficiency, something that should not be done at all.”

In a previous article, we argued that putting in place leading customer value metrics is critical to align your entire business around the key problems you need to solve.

One billion-dollar example we gave in that piece was related to a tale of two search engines: the company that focused on the right customer value metrics (relevance and speed) now has 90% market share, while the company that focused on “differentiating features” has eked out a measly 5%, even though it had some of the best resources in the world at its disposal.

While your CEO cares about business outcomes (e.g., last quarter’s new bookings or deal win rate), these are typically measured using trailing metrics that are hard to optimize directly, as they take a long time to change and causality is often difficult to determine.

Under the assumption that increasing customer value eventually leads to improved business outcomes, adopting leading customer value metrics helps you put in place a framework in which speed becomes quantifiable in a way that’s relevant to your customers and your business.

If you define “speed” as how fast and “more” as by how much you can improve your customer value metrics, a few things become evident:

  • It’s not how many features you ship, but what impact these features have on the customer.
  • Your whole company needs to be committed to measuring customer outcomes, improving customer value, and implementing an effective exploratory process for doing so faster.
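As a rough illustration of these two definitions, the sketch below computes “more” and “speed” from dated readings of a single leading customer value metric. The function name, the 30-day normalization, and the sample relevance scores are hypothetical, and the sketch assumes a metric where higher is better.

```python
from datetime import date


def improvement(readings: list[tuple[date, float]]) -> tuple[float, float]:
    """Return (more, speed) for one leading customer value metric.

    more  = total change between the first and the last reading
    speed = that change expressed per 30 days
    """
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    ordered = sorted(readings)  # sort by date
    (start_day, start_value), (end_day, end_value) = ordered[0], ordered[-1]
    elapsed_days = (end_day - start_day).days
    if elapsed_days == 0:
        raise ValueError("readings must span at least one day")
    more = end_value - start_value
    speed = more / elapsed_days * 30
    return more, speed


# Made-up readings of a relevance score over one quarter.
relevance = [
    (date(2024, 1, 1), 0.62),
    (date(2024, 2, 1), 0.65),
    (date(2024, 3, 31), 0.71),
]
more, speed = improvement(relevance)
print(f"more: {more:+.2f}, speed: {speed:+.3f} per 30 days")
```

Nothing in this calculation counts features shipped. A quarter with one release that moves the metric scores better than a quarter with twenty releases that do not.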

This, of course, requires that the company develops clarity on what those outcomes are. Picking speed and relevance was empirically non-obvious to many search engine providers at the time.

How do you become the one company that gets it right? By thinking deeply about customers and the value they derive from the product — while resisting the many default answers offered by well-intentioned but non-strategic voices. In short: rigorously develop and test a clear customer value hypothesis, then stay focused on it.

Which means that everyone needs to accept, at some level, that if we have clarity on what matters and we focus on the things that matter, then, to quote a famous San Francisco sports coach, “the score takes care of itself.” Or, to quote Replit’s CEO Amjad Masad, “Internally, we don’t set ARR metrics.” That is a leap of faith that many organizations are unwilling to take, preferring the security of availability bias – tracking what’s measurable as opposed to what matters.

As a corollary, Engineering productivity is not just an Engineering problem, but encompasses Product as well, and often extends to other functions such as GTM. Engineering and Product are two sides of the same coin, inseparable when it comes to productivity.

Referring to our previous example, Engineering can supply the new 2,000 HP engine for the car, but Product is the driver that ensures the car stays on the racetrack instead of flipping over on Sand Hill Road.

One needs to keep in mind that the Product space is exponentially complex, while Engineering speedups – even with GenAI – are largely linear. Thus Product needs to get a handle on the complexity of the exploration space and focus on the right things before production code gets written. Failure to understand this can turn you into a living example of the saying (paraphrasing Frank Westheimer) that “weeks of coding can save you hours of planning.”

By increasing engineering speed, GenAI may put additional pressure on Product to experiment more and make product decisions faster, in order to not become the bottleneck in terms of improving customer value.

What “good” looks like

To start, you need to form the right hypothesis about how your company adds value to your customers (for multi-product companies, these hypotheses may be product-specific).

In our previous example with search engines, the hypothesis for building a leading search engine was: “Based on a search query, the search engine ought to provide an accurate and fast way for its users to find any site or page they’re interested in.”

Then, from the hypothesis, you need to derive the right leading metrics to measure the customer value and impact delivered. In our example, those leading customer value metrics are relevance and speed.

Your whole company, including Engineering, Product, Marketing, Sales, and of course your CEO, needs to be aligned with and focused on improving these metrics.

Urgency should not be about “more features, faster” but about improving the customer value metrics. “Faster” means how quickly you can move the needle. “More” means by how much.

All teams need to work together and think out of the box about how to make this happen. Gauging out-of-the-box thinking ability is not an exact science (nor do we know how to quantify it), but here are some qualitative factors that contribute to it:

  • Are your teams senior enough? Do you have top-notch product managers and engineers?
  • How well are the product and engineering teams working together? How about working with the other teams such as GTM and Marketing?
  • Does the product team make the right decisions fast? Are they using an experimental approach to validate their hypotheses?
  • Are people encouraged both to generalize and to specialize requests and ideas? Sometimes, a modest generalization can have a huge impact. Sometimes, a small specialization can radically reduce work.

What to do when the boss comes knocking

If you’re the CTO or Head of Engineering, there’s a good chance that – sooner rather than later – the powers that be will initiate a conversation on Engineering productivity. They will ask you how you measure Engineering productivity today, how your team’s productivity compares to your competition, and how to increase the productivity of your team – even if they are not entirely sure what that means or how to measure it.

If you’ve been focused solely on engineering, the honest answer is that you only track hygiene metrics (such as DORA or SPACE), and have no idea how to really measure productivity. Your boss won’t like that answer.

There’s a good chance they will ask you to measure the wrong things (e.g., number of features shipped, bugs fixed, or story points completed). This is a critical moment of truth for you and your company. You need to have the guts to push back on any simplistic approaches, and try to secure political support from other functions (Product, Sales, Marketing, Customer Success) when you do. Winning this debate is hard, hazardous, and not a solo mission.

Nobody wants to hear the fundamental truth that Engineering productivity is inherently impossible to measure in the abstract, so here’s what you will need to do:

  • Show that you’re not afraid to be measured and care deeply about operations: implement your hygiene metrics (e.g., DORA or SPACE), be an ace at measuring engineering operations, use the best tools, and have the best reporting and comparisons to industry benchmarks.
  • Demonstrate you care about business metrics and that you’re accountable for business outcomes like gaining market share, shortening sales cycles, improving win rates, and reducing churn. Be willing to sign up for OKRs in areas like these instead of ones on LoC produced, bugs fixed, commits made, or story points shipped.
  • Argue that the above business metrics are trailing metrics, and the company needs leading ones that you can iterate quickly on and optimize. 
  • Refuse to measure inane things (e.g., counting angels on pinheads) just because someone asks you to and you are able to measure them. You could count pizza slices per commit, too. Or pizza-adjusted development velocity (story points completed per pizza consumed). And those metrics might actually be more related to real engineering output than some of the things you’ll be asked to measure.

Getting the first point out of the way demonstrates you are not metrics-averse but stupid-metrics-averse. It’s your responsibility to fix the operational metrics if they need fixing. However, as long as these metrics are good, your CEO or CFO should be as concerned by them as they are about your variable naming convention. Also, you need to make sure your engineers understand that these operational metrics are not their ultimate goal, as customers don’t care about them.

You can show that you’re highly accountable for business outcomes by asking to be measured on business metrics such as ARR, churn, etc. This underlines the following:

  • Your company’s job is to build the product that lets you hit the plan. If you build some other product – to-spec and on-time – that doesn’t enable the company to hit its plan, then who cares?
  • If you’re in a startup, your total compensation is already heavily weighted toward equity. So, while the annual bonuses are nice, they are small relative to your goals.
  • You become totally aligned with your peers in Product, Sales, Success and even Services on the business outcomes. Few quarter-ends are more divisive than those where one team is celebrating OKR achievement while the rest of the company has missed the business plan.

You should initiate a discussion around customer value – what’s your hypothesis as a company and what are the correct leading metrics? Don’t expect this to be a short conversation; it will be a long and ongoing one, but it’s time far better spent than on tracking and reporting the wrong metrics. Everyone at C-level needs to buy into these leading metrics. If they’re not familiar with the concept, send them this article and its companion.

Ideally, these value metrics should be pushed by the founders (and/or CEO), who need to hold everyone accountable for advancing the chosen metrics. This should be the real measure of how fast and how much progress you’re making as a company.

This isn’t something you can do on your own – you need to form a great partnership with your Product counterpart and other C-level executives, and align with your peers on how much you need to move the needle on the customer value metrics to enable the company to hit this year’s and next year’s operating plan. You also need to ensure your engineering team understands both the customer value hypothesis and the metrics, and is actively involved in optimizing them. Challenging your team on outcomes – as opposed to features – will empower and encourage them to think out of the box.

Conclusion

If after all of this your CEO still insists on the number of features shipped, or how precisely you’re able to honor release dates, or engineering-only productivity metrics – you should not give up. Form alliances with your fellow execs to formulate a customer value hypothesis and corresponding metrics. Present this to your CEO and get their buy-in and support.

Without proper alignment, if your company is alone in its space you may still have a shot. But if you’re competing against several well-funded competitors in a hot, greenfield market, someone is going to figure out the best leading customer value metrics. And the company that does that, commits to optimizing them, and aligns its engineering output around those metrics, will likely be the winner in the space.