Thursday, February 7, 2013

Yes, We Can: Agile Measurement, Productivity, and Value


First of all, some background: writing the chapter on agile software development for the next edition of the Computer Science Handbook (the third edition will say a lot more about software engineering than the previous two editions) last summer gave me the opportunity to take another look at some current trends and initiatives in Agile. There are plenty of them out there, like Neil Maiden’s work on introducing creativity techniques in agile development. But one that caught my eye in particular was a recent uptick in activity around the benchmarking of agile projects – that is, their comparative productivity.

Agile software development has been presented by proponents as a more effective alternative to previous methods, but frankly, the evidence has often been anecdotal. Even the evidence for comparative effectiveness among the various agile methods has been mostly anecdotal. One early exception, of course, was a series of empirical investigations of pair programming. Test-Driven Development has also been the subject of a number of empirical studies. But these are all narrow studies of particular techniques, and as useful as they are, they don’t capture the big picture. And with the rise of mindsets like evidence-based software engineering, the anecdotal claims for the effectiveness of agile methods have started to wear a bit thin in the past few years.

So it was interesting to see a very large initiative underway last year at the Central Ohio Agile Association, where a number of companies got together with QSM Associates (QSMA), the software benchmarking company, to benchmark their projects. Particularly interesting is that, thanks to QSMA’s huge database of some ten thousand projects stretching back 30 years, the benchmarking is against all kinds of projects in industry: big, small, agile, non-agile – in other words, the big picture of how Agile measures up in the world of software development.

I found this quite intriguing and talked with the folks at QSMA about it, and they agreed to participate in an agile benchmarking initiative here in Italy. So when I gave my talk in November at the Italian Agile Day in Milan, I made an appeal to round up participation from Italian industry for an agile benchmarking initiative in 2013. It turned out, however, that there is still a lot of perplexity in the community here around the idea of benchmarking agile projects, and questions arose about points of view expressed at various times by software engineering luminaries such as Martin Fowler, Tom DeMarco, and Joshua Kerievsky.

I was a bit surprised by this perplexity, because the QSMA folks have been working with the agile community for several years on productivity measurement and benchmarking. In fact, Michael Mah of QSMA worked directly with Joshua Kerievsky and Jim Highsmith on what was probably the very first agile benchmarking project ever, at a Canadian medical systems company.

So I decided to take a look at where the source of all this perplexity might lie. Since it was explicitly mentioned in a query I received, let’s start with something Martin wrote. In a post written in 2003, Martin concerned himself with the problem of measuring productivity in a software development context. Although it’s camouflaged to some extent, there are actually two questions considered in that post:

  • Can you measure productivity?
  • What is the relationship between productivity and value creation?


The first question is explicit in the title. The second question comes out in the text.

But let’s start with the first question: can you measure productivity in software development? Right away we encounter the famous “Lines of Code” (LOC) problem, which has two aspects: first, comparing LOC written in two different programming languages; second, comparing LOC written by two different people. And indeed, these issues do exist “in the small”: the same program written in assembler and in APL will have very different sizes, and Tom and Jerry may well solve the same problem with two programs of different sizes. But all serious productivity measurement has long since stopped working “in the small,” at the level of point comparisons between individuals. In the large, in the real world, the picture is much different. The issues you see in individual cases don’t appear in the large. For one thing, programming teams today generally use modern high-level languages of similar complexity. And, as Fred Brooks pointed out in his classic essay No Silver Bullet while discussing the impact of programming languages on productivity, “to be sure, the level of our thinking about data structures, data types, and operations is steadily rising, but at an ever decreasing rate.”

On a personal note, I got my first inkling of that back when I was a student at Yale. When we were discussing learning a new programming language, my advisor Alan Perlis – winner of the first Turing Award, coiner of famous epigrams on programming, and, as Dave Thomas reminded me last year in Munich, the true ur-father of design patterns with his programming idioms – suddenly blurted out with a wave of his hand, “Oh, they’re all the same anyway.” And this was the person who had led the team that created Algol 60, the “mother’s milk of us all,” as he put it, and the most famous APL freak in history. I don’t want to downplay the effect of programming languages too much, and of course it’s always good to use a solid, modern programming language that is right for the job. I’m just saying that in the grand scheme of things, that’s not really where the issues lie in productivity measurement today. And thirty years of QSMA experience, covering every class of software application, confirms it.

On the other hand, the second aspect of the LOC question – different results from different people – might be cause for some concern. But once again, such concern comes mostly from thinking “in the small.” Sure, one programmer might write more or less code than another, but in the real world this is mostly meaningless; it’s a rhetorical argument. What we actually observe is what teams accomplish “in the large,” with stand-ups, iteration demos, and in many cases pair programming, which brings on-the-spot code reviews. Good agile teams build the right amount of code – not too little, not too much – to deliver the desired functionality described by stories and, say, story points. (Rarely are we talking about two programmers having a competition, like the good old days of APL “one-liners” and that sort of thing.)

Moreover, there is a corresponding time to build that system, by a team of a certain size, with a given amount of effort and cost, at a given level of quality. Suppose a team of 12 people takes 5 months and successfully delivers working software for 83 stories totaling 332 story points. After 10 sprints they finish the system. It comprises 32,468 lines of new and changed Java and XML code. 48 bugs are found and fixed during QA, and the system is put into service. One could then look at industry statistics and make a comparison: on average, the same amount of software requires 16 people and 6 months, and during QA there are generally 96 bugs to find and fix.

In short, this hypothetical team has delivered working software with half the defects, a full month faster than average. By Kent Beck’s criterion of working software, that sentence alone would declare Agile the more successful approach. Jim Highsmith would say that by focusing on quality (cleaner code), we got the benefit of delivering faster (fewer bugs, and thus less time needed to test them out at the end).
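The comparison above is simple arithmetic on the three core measurements. As an illustrative sketch only – the numbers are the hypothetical ones from the example, not real benchmark data, and the metrics (person-months of effort, defects per KLOC) are generic measures, not QSMA’s own model:

```python
# Illustrative sketch: comparing a hypothetical agile team against an
# industry-average baseline using size, time, and effort.
# All numbers are the made-up ones from the example, not real data.

def person_months(people, months):
    """Effort as a simple people x calendar-time product."""
    return people * months

def defects_per_kloc(defects, loc):
    """Defect density: QA defects per thousand lines of new/changed code."""
    return defects / (loc / 1000.0)

agile_team = {"people": 12, "months": 5, "loc": 32468, "qa_defects": 48}
industry_avg = {"people": 16, "months": 6, "loc": 32468, "qa_defects": 96}

for label, p in (("agile team", agile_team), ("industry average", industry_avg)):
    print(f"{label}: {person_months(p['people'], p['months'])} person-months, "
          f"{defects_per_kloc(p['qa_defects'], p['loc']):.2f} defects/KLOC")
```

Run as-is, this shows 60 versus 96 person-months of effort and half the defect density for the hypothetical agile team: the same observation stated in prose above, just expressed as numbers any project can collect.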

So in actual practice, these ways of looking at productivity are considered meaningful and important by Agile leaders, and they are absolutely measurable – every project has those three numbers: size, time, effort. Both Mike Cohn and Jim Highsmith reference the work of the QSMA folks in their latest books. Sure, you can debate the techniques used for measurement, such as story points. Joshua points out in his blog post that we can now do better than story points and velocity. No problem – we should always go with the best techniques available – but he isn’t taking issue with agile measurement in itself, and neither are the others.

So where is the problem people are seeing? This brings us to Martin’s second question: “What is the relationship between productivity and value creation?” Martin has been very good over the years at reminding us that simply delivering code isn’t the ultimate goal of software development. The ultimate goal is delivering business value, which is not always in a strict, lockstep relationship with productivity. As they say in the oldest discussion in the world, size doesn’t always matter: Martin immortalized that little gem JUnit with the Churchillian phrase “never in the field of software development have so many owed so much to so few lines of code,” and offered similar (deserved) praise to Brooks’s short essay No Silver Bullet. Time doesn’t always matter either, as Martin once pointed out: Windows 95 was finished way over schedule and budget, but ultimately generated enormous value for Microsoft.

The question of productivity and value pops up in many areas, of course. Eighty-six-year-old author Harper Lee has written only one book – and that was over 50 years ago. But To Kill a Mockingbird was voted in at least one poll as the greatest novel of all time. (And while we’re on the subject of business value, it’s also the ninth-best-selling book ever.) On the other hand, a couple of years ago the Washington Post ran a competition for its newest opinion writer, and one of the requirements was the ability to produce a full-length column, week after week, year after year. Famously, Ernest Hemingway would spend an entire day agonizing over a single sentence, while Isaac Asimov reeled off books effortlessly. Mozart spun out his masterpieces at a dizzying rate; Beethoven labored slowly to produce his.

So what’s going on here? Aside from the caveat that parallels between art and technology only go so far, the key to making sense of all this is to remember an important fact: value is created at the level of the business. Now, the agile community talks about this a lot, but I often get the feeling they’re talking around it rather than about it, so let’s dwell on it for a few minutes. The Lean folks in particular have long known that value creation happens at the level of the business – I remember Mary Poppendieck nodding vigorously when I said this in my keynote at the XP2005 conference in Sheffield – and they have continued to develop the idea. So this discussion isn’t that new in the agile community. Yet it doesn’t seem to have been fully digested yet.

Those of you who work in the safety-critical systems community will have heard people say that safety is an emergent property of a system. By this they mean that you can’t tell at the level of individual parts, no matter how well made, whether the system is safe; safety can only be evaluated within the overall context in which the system will be used. Analogously, you could say that value is emergent at the level of the business – it is determined by the overall business context in which the product is embedded. It is not determined at the level of operations, and software development is at the level of operations. This can be a hard pill to swallow, especially for us agilists who talk a lot about delivering business value – and now we hear that we don’t produce it directly. But it speaks directly to Martin’s point about why productivity won’t automatically be a determinant of business value: they’re at two different levels. Still, it’s a copout to stop there: just because productivity isn’t an automatic determinant doesn’t mean it’s irrelevant – far from it. So let’s continue the analysis.

If you can’t judge business value at the level of operations (e.g. software development), then how do you judge it at the level of the business? Just as safety assessors work with a framework for judging safety at the system level, you need a framework for understanding what creates value at the level of the business. The framework elaborated years ago by Michael Porter has withstood the test of time, and it’s simple and straightforward. Even in this post-Web 2.0, networked, social, hyper-ecosystem, time-to-market era, there are still just two determinants of business value creation: Market Economics and Competitive Position.

Let’s start with market economics: if you work in “Lucrative market X,” then you’re simply more likely to create value than if you work in “Struggling market Y.” This is related to what Tom DeMarco talks about in his 2009 IEEE Software article, which was mentioned in one of the discussions on productivity and measurement: choosing a valuable project that amply covers the cost of whatever resources you put into it, and thus needs less control than a project at risk of not covering its costs. (This is another example of an “in the small” point comparison; the more general, “in the large” version is working in a more attractive market.)

This brings us to the second determinant of business value creation: within a market, it’s your competitive position that determines whether you create business value. (And by the way, companies with strong competitive positions even in a weaker market are still likely to outperform those with weaker positions in more lucrative markets.) Here again, the framework is surprisingly simple. There are only two ways to improve your competitive position: successful differentiation and economic cost position. Differentiation is all about creating that special something the customer is willing to pay for. Apple is legendary for this type of business value creation, and much of the current discussion around innovation fits in here. Note, by the way, that differentiation isn’t as easy or as common as all the hype might suggest; there are even signs that Apple is faltering in that department, if you’ve been watching the markets recently. Economic cost position is essentially about lower operating and production costs. Much of software process and quality improvement (including Agile) is about this, of course.

Agile processes do a great job of supporting all of these strategies for business value creation through improved competitive position – I wrote more about it here – but they only support them. Consider a phrase like “ … satisfy the customer through early and continuous delivery of valuable software.” First of all, think about that expression “valuable software.” Given that at the operational level you simply can’t determine whether the software is valuable, the real meaning is more like “the software that the customer is asking for.”

Considering that the goal of waterfall processes is also to deliver what the customer is asking for, the most that agile processes can claim here is that they are more likely to deliver it. Fair enough, and I happen to agree, but that’s not exactly a strong argument for value creation. It must be the “early and continuous delivery” part, then. And indeed it is: early and continuous delivery supports both a differentiation strategy and an economic cost strategy for value creation through competitive position. If it’s differentiation the customer is after, then delivering features early and continuously gives the customer the chance to test them and see whether people really will pay for them. (And if it’s innovation you’re after, let me mention once again Neil Maiden’s work on injecting creativity techniques into agile processes.) And I think we’re all in agreement about the potential of agile processes to produce more with lower costs and fewer defects, supporting an economic cost position strategy for business value creation. That’s Productivity with a capital “P.”

The point is that although high productivity isn’t an automatic, a priori guarantee of value creation in every case – it doesn’t capture Market Economics, nor some aspects of differentiation and innovation, which is where many of those exceptional examples come from and which must be analyzed at the business level – it is absolutely an important factor in the operational support of the most frequent and important value creation strategies, those based on Competitive Position in one’s chosen market. Generally, and in the large, if your team’s productivity is higher, then in real-world practice you are almost certainly making a strong and direct contribution to the creation of business value through a strengthened competitive position.

In conclusion, I’d like to mention one last software engineering luminary. Philippe Kruchten – particularly well known in the architecture community, but also in the agile community (he recently co-edited a special issue of IEEE Software on agility and architecture) – attended the ten-year agile manifesto anniversary reunion and afterward listed some observations in his blog about “elephants in the agile room.” Notably, one of his biggest elephants (undiscussable topics) is resistance in the agile community to gathering objective evidence.

Philippe suggests that this resistance to gathering objective evidence is part of a latent fear that potential buyers would be deterred by any hint of negativity. But aside from the fact that buyers will also be deterred by a reluctance to back up claims with real evidence, there is a much more positive way to look at it all: buyers and managers who see real, objective evidence of superior productivity with Agile will be far more willing to invest their money in it. And they’ll even be happy to see objective evidence of where problems lie, because they’ll know they can spend their money exactly where it’s needed to fix those problems.

And finally … without metrics, you’re just someone with a different opinion, as Steven Leschka of HP once said. Are we really going to just give up on gathering this objective evidence and leave the field open to Agile’s detractors and their different opinions? So far, the results of the Ohio agile benchmarking study have been very positive – certainly no reason for latent fear and concern – and I hear that an agile benchmarking initiative is starting up in Germany with several companies already onboard. I hope the Italian Agile Benchmarking Initiative gets similar participation.