Thursday, February 7, 2013

Yes, We Can: Agile Measurement, Productivity, and Value

First of all, some background: writing the chapter on agile software development for the next edition of the Computer Science Handbook (the third edition will say a lot more about software engineering than the previous two editions) last summer gave me the opportunity to take another look at some current trends and initiatives in Agile. There are plenty of them out there, like Neil Maiden’s work on introducing creativity techniques in agile development. But one that caught my eye in particular was a recent uptick in activity around the benchmarking of agile projects – that is, their comparative productivity.

Agile software development has been presented by proponents as a more effective alternative to previous methods, but frankly, the evidence has often been anecdotal. Even the evidence for comparative effectiveness among the various agile methods has been mostly anecdotal. One early exception, of course, was a series of empirical investigations of pair programming. And Test Driven Development has also been the subject of a number of empirical studies. But these are all narrow studies of particular techniques, and as useful as they are, they don’t capture the big picture. And with the rise of mindsets like evidence-based software engineering, the anecdotal claims for the effectiveness of agile methods have started to wear a bit thin in the past few years.

So it was interesting to see a very large initiative underway last year at the Central Ohio Agile Association, where a lot of companies got together with QSM Associates (QSMA), the software benchmarking company, to benchmark their projects. Particularly interesting is that, because of the QSMA involvement with their huge database of something like ten thousand projects stretching back 30 years, the benchmarking is against all kinds of projects in industry: big, small, agile, non-agile – in other words, the big picture of how Agile measures up in the world of software development.

I found this quite intriguing and talked with the folks at QSMA about it, and they agreed to participate in an agile benchmarking initiative here in Italy. So when I gave my talk in November at the Italian Agile Day in Milan, I made an appeal to round up some participation from the Italian industry for an agile benchmarking initiative in 2013. It turned out, however, that there is still a lot of perplexity in the community here around the idea of benchmarking agile projects, and there were some questions involving points of view expressed at various times by software engineering luminaries such as Martin Fowler, Tom DeMarco, and Joshua Kerievsky.

I was a bit surprised by this perplexity, because the QSMA folks have been working with the agile community for several years on productivity measurement and benchmarking. In fact Michael Mah of QSMA worked directly with Joshua Kerievsky, along with Jim Highsmith, on what was probably the very first agile benchmarking project ever, at a Canadian medical systems company.

So I decided to take a look at where the source of all this perplexity might lie. Since it was explicitly mentioned in a query I received, let’s start with something Martin wrote. In a post written in 2003, Martin concerned himself with the problem of measuring productivity in a software development context. Although it’s camouflaged to some extent, there are actually two questions considered in that post:

  • Can you measure productivity?
  • What is the relationship between productivity and value creation?

The first question is explicit in the title. The second question comes out in the text.

But let’s start with the first question: can you measure productivity in software development? Right away we encounter the famous “Lines of Code” (LOC) problem, which has two aspects: first, comparing LOC written in two different programming languages; second, comparing LOC written by two different people. And indeed, these issues do exist “in the small”: the same program written in assembler and APL will have very different sizes; and Tom and Jerry may well solve the same problem with two programs of different sizes. But all serious productivity measurement has long since stopped working “in the small,” at the level of individual comparisons. In the large, in the real world, the picture is much different. The issues you see in individual cases don’t appear in the large. For one thing, programming teams today are generally using modern high-level languages of similar complexity. And, as Fred Brooks pointed out in his classic piece No Silver Bullet while discussing the impact of programming languages on productivity, “to be sure, the level of our thinking about data structures, data types, and operations is steadily rising, but at an ever decreasing rate.”

On a personal note, I got my first inkling of that back when I was a student at Yale. When discussing learning a new programming language, my advisor Alan Perlis – winner of the first Turing Award, coiner of famous epigrams on programming, and as Dave Thomas reminded me last year in Munich, the true ur-father of design patterns with his programming idioms – suddenly blurted out with a wave of his hand, “Oh, they’re all the same anyway.” And this was the person who had led the team that created Algol60, the “mother’s milk of us all,” as he put it, and the most famous APL freak in history. I don’t want to downplay the effect of programming languages too much, and of course it’s always good to try to use a solid, modern programming language and the right one for the job. I’m just saying that in the grand scheme of things, that’s not really where the issues lie in productivity measurement today. And the QSMA experience over thirty years confirms that, covering every class of software application.

On the other hand, the second aspect of the LOC question – different results from different people – might be cause for some concern. But once again, such concern comes mostly from thinking “in the small.” Sure, two individual programmers might produce more or less code, but this is mostly meaningless in the real world. It’s a rhetorical argument. What we are observing in actuality is what is accomplished by teams working “in the large”, complete with stand-ups, iteration demos, and in many cases, pair programming – complete with on-the-spot code reviews. Good agile teams build the right code – not too little, not too much – to deliver the desired functionality described by stories and, say, story points. (Rarely are we talking about two programmers having a competition, like the good old days of APL “one liners” and that sort of thing.)

Moreover, there is a corresponding time to build that system, for a team of a certain size, with a given amount of effort and cost, at a given level of quality. Suppose a team of 12 people takes 5 months and successfully delivers working software for 83 stories, totaling 332 story points. After 10 sprints they finish the system, comprising 32,468 lines of new and changed Java and XML code. 48 bugs are found and fixed during QA, and the system is put into service. One could look at research statistics and observe a comparison: the industry average for this same amount of software requires 16 people taking 6 months, and during QA there are generally 96 bugs in the code that are found and fixed.
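The arithmetic behind that comparison can be sketched in a few lines (the numbers are the hypothetical ones from the example above, not real QSM benchmark data):

```python
# Hypothetical benchmark comparison, using the illustrative figures
# from the text above rather than any real industry database.

def person_months(team_size, months):
    """Effort in person-months: team size times schedule length."""
    return team_size * months

# The hypothetical agile team
agile_effort = person_months(12, 5)   # 60 person-months
agile_defects = 48
agile_schedule = 5                    # months

# The industry-average profile for the same amount of software
avg_effort = person_months(16, 6)     # 96 person-months
avg_defects = 96
avg_schedule = 6                      # months

print(f"Effort:   {agile_effort} vs {avg_effort} person-months "
      f"({avg_effort - agile_effort} saved)")
print(f"Defects:  {agile_defects} vs {avg_defects} "
      f"({agile_defects / avg_defects:.0%} of average)")
print(f"Schedule: {agile_schedule} vs {avg_schedule} months "
      f"({avg_schedule - agile_schedule} month faster)")
```

The three inputs are exactly the “three numbers” every project has – size drives the comparison baseline, while time and effort are what you measure against it.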

In short, this hypothetical team has delivered working software with half the defects, a full month faster than average. Kent Beck would declare Agile more successful on the strength of exactly that sentence. Jim Highsmith would claim that by focusing on quality (cleaner code), we received the benefit of delivering faster (less effort, and thus less time, required to test out bugs at the end).

So in actual practice, these ways of looking at productivity are considered meaningful and important by Agile leaders, and are absolutely measurable – every project has those three numbers: size, time, effort. Both Mike Cohn and Jim Highsmith reference the work of the QSMA folks in their latest books. Sure, you can discuss the techniques used for measurement, such as story points. Joshua points out in his blog post that we can now do better than story points and velocity. No problem – we should always go with the best techniques available – but he isn’t taking issue with agile measurement in itself, and neither are the others.

So where is the problem people are seeing? This brings us to Martin’s second question: “What is the relationship between productivity and value creation?” Martin has been very good over the years at reminding us that simply delivering code isn’t the ultimate goal of software development. The ultimate goal is delivering business value, which does not necessarily stand in a strict, lockstep relationship with productivity. As they say in the oldest discussion in the world, size doesn’t always matter: Martin immortalized that little gem, JUnit, with the Churchillian phrase “never in the field of software development have so many owed so much to so few lines of code,” and offered similar (deserved) praise to Brooks’s short No Silver Bullet essay. Time doesn’t always matter either, as Martin once pointed out: Windows 95 was finished way over schedule and budget, but ultimately generated enormous value for Microsoft.

The question of productivity and value pops up in many areas, of course. Eighty-six-year-old author Harper Lee has written only one book – and that was over 50 years ago. But To Kill a Mockingbird was voted in at least one poll as the greatest novel of all time. (And while we’re on the subject of business value, it’s also the ninth-best selling book ever.) On the other hand, a couple of years ago there was a competition in the Washington Post to become their newest opinion writer, and one of the requirements was to demonstrate the ability to produce a full-length column, week after week, year after year. Famously, Ernest Hemingway would spend an entire day suffering over a single sentence; Isaac Asimov reeled off books effortlessly. Mozart spun out his masterpieces at a dizzying rate; Beethoven labored slowly to produce his.

So what’s going on here? Aside from the caveat that you can only take parallels between art and technology so far, the key to making sense of all this is to remember an important fact: Value is created at the level of the business. Now, the agile community talks about this a lot, but I often get the feeling they’re talking around it rather than about it, so let’s dwell on it for a few minutes. The Lean folks in particular have known for a while that value creation is at the level of the business – I remember Mary Poppendieck nodding vigorously when I said this at my keynote at the XP2005 conference in Sheffield, and they have continued to develop the idea. So this discussion isn’t that new in the agile community. Yet it doesn’t seem to have been fully digested yet.

Those of you who work in the safety critical systems community will have heard people say that safety is an emergent property of the system. By this they mean that you can’t tell at the level of individual parts, no matter how well made, whether a system is safe. Safety can only be evaluated within the overall context in which the system will be used. Analogously, you could say that value is emergent at the level of the business – it is determined by the overall business context in which the product is embedded. It is not determined at the level of operations; and software development is at the level of operations. This can be a hard pill to swallow, especially for us agilists who talk a lot about delivering business value – and now we hear that we don’t produce it directly. But it speaks directly to Martin’s point about how productivity won’t automatically be a determinant of business value, because they’re at two different levels. But it’s a copout to stop there: just because it isn’t an automatic determinant doesn’t mean it’s irrelevant – far from it. So let’s continue the analysis.

If you can’t judge business value at the level of operations (e.g. software development), then how do you do it at the level of the business? Just as safety assessors work with a framework for judging safety at system level, you need a framework for understanding what creates value at the level of the business. The framework elaborated several years ago by Michael Porter has withstood the test of time. It’s simple and straightforward. Even in this post-Web 2.0 networked social hyper ecosystem time to market era, there are still just two determinants of business value creation: Market Economics and Competitive Position.

Let’s start with market economics: If you work in “Lucrative market X,” then you’re simply going to be more likely to create value than if you work in “Struggling market Y.” This is related to what Tom DeMarco is talking about in his 2009 article in IEEE Software that was mentioned in one of the discussions on productivity and measurement. He talks about choosing a valuable project that amply covers the cost of whatever resources you put into it, thus needing less control than a project at risk of not covering its costs. (This is another example of “in the small” point-comparison. The more general, “in the large” version is working in a more attractive market.)

This brings us to the second determinant of business value creation. Within a market, it’s your competitive position that will determine whether you create business value. (And by the way, the companies with strong competitive positions even in a weaker market are still likely to outperform those with weaker positions in more lucrative markets.) Here again, the framework is surprisingly simple. There are only two ways to improve your competitive position: successful differentiation and economic cost position. Differentiation is all about creating that special something that the customer is willing to pay for. Apple is legendary for this type of business value creation, and much of the current discussion around innovation fits in here. Note, by the way, that differentiation isn’t as easy or as common as it might seem from all the hype. There are even signs that Apple is faltering in that department, if you’ve been watching the markets recently. Economic cost position is essentially about lower operating / production costs. Much of software process and quality improvement (including Agile) is about this, of course.

Agile processes do a great job of supporting all of these strategies for business value creation through improved competitive position – I wrote more about it here – but they only support them. Consider a phrase like “ … satisfy the customer through early and continuous delivery of valuable software.” First of all, think about that expression “valuable software.” Given that at the operational level you simply can’t determine whether the software is valuable, the real meaning is more like “the software that the customer is asking for.”

Considering that the goal of waterfall processes is also to deliver what the customer is asking for, the most agile processes can claim here is that they are more likely to deliver what the customer is asking for. Fair enough, and I happen to agree, but that’s not exactly a strong argument for value creation. It must be the “early and continuous delivery” part, then. And indeed it is: early and continuous delivery supports both a differentiation and an economic cost strategy for value creation through competitive position. If it’s differentiation the customer is after, then delivering features early and continuously gives the customer the chance to test them and see whether people really will pay for them. (And if it’s innovation you’re after, let me mention once again Neil Maiden’s work on injecting creativity techniques into agile processes.) And I think we’re all in agreement about the potential of agile processes to produce more with lower costs and defects to support an economic cost position strategy for business value creation. That’s Productivity with a capital “P”.

The point is that although high productivity isn’t an automatic, a priori guarantee of value creation in all cases (it doesn’t entirely capture the essence of Market Economics and some aspects of differentiation / innovation, which is where many of those exceptional examples come from and has to be analyzed at the business level), it is absolutely an important factor in the operational support of the most frequent and important business value creation strategies based on Competitive Position in one’s chosen market. Generally, and in the large, if your team productivity is higher, in real-world practice you are almost certainly making a strong and direct contribution to the creation of business value through a strengthened competitive position.

In conclusion I’d like to mention one last software engineering luminary. Philippe Kruchten, who is particularly well-known in the architecture community but also in the agile community (he recently co-edited a special issue of IEEE Software on Agility and Architecture), attended the ten-year agile manifesto anniversary reunion and listed some observations about “elephants in the agile room” in his blog afterward. Note that one of his biggest elephants (undiscussable topics) is resistance in the agile community to gathering objective evidence.

Philippe suggests that this resistance in the agile community to gathering objective evidence is part of a “latent fear that potential buyers would be detracted by any hint of negativity.” But aside from considering that buyers will also be deterred by a reluctance to back up claims with real evidence, there is a much more positive way to look at it all: buyers and managers who see real objective evidence of superior productivity with Agile will be a lot more willing to invest their money in it. And they’ll even be happy to see objective evidence of where problems lie, because they’ll know they can efficiently spend the money exactly where it’s needed to fix those problems.

And finally … without metrics, you’re just someone with a different opinion, as Steven Leschka of HP once said. Are we really going to just give up on gathering this objective evidence and leave the field open to Agile’s detractors and their different opinions? So far, the results of the Ohio agile benchmarking study have been very positive – certainly no reason for latent fear and concern – and I hear that an agile benchmarking initiative is starting up in Germany with several companies already onboard. I hope the Italian Agile Benchmarking Initiative gets similar participation.

Monday, January 21, 2013

Info on the Italian Agile Benchmarking 2013 Initiative

The Italian Agile Benchmarking 2013 initiative has begun. Here is some more information:

Monday, December 17, 2012

Italian Agile Benchmarking 2013 Initiative

At Italian Agile Day 2012 I announced an initiative to create an Italian Agile Benchmarking Project, inspired by a similar initiative carried out with great success in the United States. There, a large group of companies belonging to the XP User Group of a wide area joined forces with an authoritative software benchmarking firm, Quantitative Software Measurement (QSM), to measure the effectiveness of the agile approach concretely – that is, based on real project data.

This benchmarking initiative was a twofold success. First, it produced an interesting snapshot of the state of Agile in that area. But it also gave each company the opportunity to measure itself directly against the other companies, both within the specific context of the initiative and against the IT sector in general (QSM has a database with benchmarking results from more than 10,000 projects, agile and non-agile).

I think Italy could benefit from a similar initiative, for the same reasons: it is a way of photographing the real state of Agile adoption in Italian industry (and not just talk); for those trying to promote the adoption of Agile in Italy, it is a way of providing real proof of its effectiveness through concrete data; and it is a way for a company to better understand the strengths and weaknesses of its current development processes. This last point should not be underestimated: in these times of crisis, it is more important than ever to invest in improving one’s competitiveness wherever possible. This initiative is an important tool for doing so.

The more companies we manage to bring together for the “Italian Agile Benchmark 2013” initiative, the more the participants will get out of it, the more effective the commitment of resources will be, and the more useful the results will be for the agile community in Italy as well. I ask anyone who thinks they may have a potential interest to contact me for more precise information on how the initiative will be carried out. For now I am thinking of closing enrollment after 10 companies have signed up, to keep things manageable.

Sunday, April 15, 2012


Viareggio, 14 April 2012

Over the years, I have repeated ad nauseam that the way I choose a topic every year for the next lecture is simply to hold up a finger to the wind and try to sense what is all the rage at the moment. And, as you can see, this year it turned out to be “innovation.” Why innovation? A couple of reasons come to mind:

· 2011 was the year in which the death of a public figure brought about an outpouring of mass grief (some say hysteria) around the world at a level that had not been seen since the death of Princess Diana. When Steve Jobs died, people everywhere were talking about the way his innovations had touched their lives in the most personal way.

· The great economic crisis in the Eurozone started in 2011 (and continues today – the Italian Stock Exchange lost nearly 6 percent this week). The day after I communicated to Lucia the chosen topic last December, there was a headline in the local newspaper that “Only Innovation Can Save Italy.” The word “innovation” is on everybody’s lips right now as the chief means of creating the growth that is needed to exit from the crisis.

Whatever the reasons, there is no question that innovation is in the air nowadays – in the newspapers, on television talk shows, on the Internet. The Washington Post has created an entire section named Innovations. Companies are creating innovation departments. Cities are competing to attract vibrant new startups to their technological innovation campuses.

Innovative companies are in the spotlight – Apple being the most prominent example, of course. But also companies that have failed due to lack of innovation are in the spotlight. The current poster child for innovation failure is Kodak, which recently [19 January 2012] filed for bankruptcy protection. Article after article is analyzing why they didn’t catch the wave of digital photography, while mourning the passing of the company that nearly singlehandedly brought photography to the masses. (I was one of the millions of young people whose first camera was a Kodak Brownie. When the original Brownie camera was introduced in 1900 it cost one dollar, and a roll of film cost fifteen cents.)

So, given that we are all obsessed with innovation at the moment, it seemed appropriate to take a closer look. Now, innovation is a topic of enormous scope, and a lot is being invested in the pursuit of innovation all over the world right now. We can’t possibly expect to cover absolutely every aspect of innovation in this lecture, and frankly, you would probably be very, very bored by the kind of talk you hear in industrial or economic circles. So I’ve decided to narrow the focus today to some specific aspects of innovation that I think we’ll all find more interesting – the human aspects, or what we might think of as the “psychology of innovation.”

Are we all innovators?

The very first question I’d like to consider together with you is very simple: Who can innovate? Are innovators people with some kind of special, magical talent? That’s the traditional point of view, and one that is still held by many. Or is that point of view perhaps wrong? Maybe anybody can innovate. Maybe we are all potential innovators – it just has to be somehow “drawn out of us”. This is a point of view that many companies are in fact counting on right now. They have started “innovation initiatives,” whereby all employees are encouraged to come up with their own ideas for innovation and propose them to top management.

In my role as an associate editor of a computer software journal a couple of years ago, I personally handled the review of an article submitted by a company that proudly described its innovation initiative in great detail. It was open to all employees in the company, from the highest to the lowest. It provided for employees to be rewarded in various ways for their successful ideas, including the possibility to participate in the implementation of their idea as a new business venture undertaken by the company. It was all very exciting, and although I didn’t have the chance to track the company’s subsequent success in that initiative, it certainly seemed like a wonderful way to promote the idea that “we are all innovators.”

And yet: Nicholas Carr (whom I have already introduced in a previous talk as the gadfly and thorn in the side of information technologists everywhere, with his remorselessly iconoclastic pronouncements on some of our most cherished beliefs) has a different story to tell. In his book Building Bridges, he describes a company that introduced just such an “innovation initiative”, but whose subsequent experience was anything but exciting. It turned out that the most talented people in the company considered the initiative a silly publicity stunt, a distraction, and in general a waste of time, and essentially ignored it. Conversely, the least talented people in the company considered the initiative a great way to relieve the boredom of their normal jobs and embraced it with great enthusiasm, coming forward with idea after idea – all while ignoring their normal duties within the company, of course. The problem was that the ideas of these less talented employees (as might be expected) were terrible, and nothing good ever came out of the initiative.

So which will it be? Are we all innovators or not? Well, it just might be that we are not quite looking at it in the right way. Maybe the right question is a slightly different one. And the best explanation I have seen yet was in a movie about a dead cook and a rat.

The wrong question?

As far as I’m concerned, Pixar’s 2007 animated film Ratatouille is not just one of the best cartoons around, but one of the best films, period. I wasn’t surprised at all when a New York Times article reported that this was the first time that people were openly wondering whether it was allowed by the rules to select a cartoon for the Oscar at the Academy Awards in the category of “Best Picture.” (I was so enthusiastic about the film that I forced a food-loving Welsh couple to sit through it one evening at my house – only to discover that she had a rat phobia and was miserable the entire time. After that I let people decide for themselves whether they wanted to see it.)

Although many aspects of the plot were complicated, the main idea was simple enough: a recently demised Parisian cook named Chef Gusteau had written a famous book with the title Anybody Can Cook that inspired many, including a rat from the provinces. The rat turned out to have an enormous culinary talent and ended up cooking a meal for the famous and feared Parisian food critic named Ego.

Ego was absolutely floored by the meal, and even more so when he found out that it had been cooked by a lowly rat. The next day his review appeared in the newspaper, and among many other things, he had this to say:

In the past, I have made no secret of my disdain for Chef Gusteau’s famous motto “Anyone Can Cook.” But I realize, only now do I truly understand what he meant: not everyone can become a great artist – but a great artist can come from anywhere.

So, following Chef Gusteau and Ego, we are not all innovators. But at the same time, there is no telling who among us might turn out to be the great innovators.

The Most Beautiful Woman in the World

A startling illustration of this principle was provided in a book published just last year [2011]. One of the most famous stars of the golden age of film was Hedy Lamarr (who died only a few years ago, in 2000). She was known in her heyday as the Most Beautiful Woman in the World, and also had the distinction of being in the first nude scene in a studio motion picture (if you’re interested, the film was called Ecstasy and was made in Czechoslovakia).

Another figure in the golden age of film was George Antheil, who, from the late 1930s onwards, was a much sought-after composer of music for films, including those starring top-echelon actors such as Humphrey Bogart. (In addition, he himself vaguely resembled the actor James Cagney.) But he was much more than just a “film composer.” He was an avant-garde composer who was good enough to convince musicians of the caliber of Igor Stravinsky to befriend and support him. He moved easily in the milieu of the Lost Generation and even lived for a while in the apartment above Sylvia Beach’s legendary bookstore Shakespeare and Company in Paris, where he hobnobbed with writers such as Ezra Pound. His roguish behavior (like putting a pistol on top of the piano when he performed) got him the reputation of being a kind of “bad boy of music” (which he then chose as the title of his 1945 autobiography), and he was beside himself with delight when a riot broke out during the Paris debut of his music, evoking comparisons to the riot that had greeted the premiere of Stravinsky’s The Rite of Spring many years earlier in Paris. One of his most bizarre compositions was called Ballet Mécanique, whose list of instruments included sixteen synchronized player pianos, a siren, and an airplane propeller.

Now, after this description, I’m quite sure that the last thing that comes to mind when you think of either Hedy Lamarr or George Antheil is “innovation,” at least in the sense that we have been talking about. I suppose you could think of Hedy Lamarr’s pioneering nude scene in a studio film as an innovation of sorts; and George Antheil did write some avant-garde music that might deserve the label “innovative”. But surely, you say, we are far away from the kind of technical innovation that is the subject of this talk, aren’t we? The kind that Steve Jobs and Apple are famous for? Well, it turns out that you are dead wrong. In one of the most extraordinary partnerships in the history of science, Hedy Lamarr and George Antheil became the co-inventors of a technology that is the basis for much of modern communications.

In addition to being a composer, Antheil also published a crime novel (whose protagonist was based on Ezra Pound). He was a successful and insightful reporter and critic for a music journal. And he even wrote a nationally syndicated advice column in the style of “Ask Ann Landers” and “Dear Abby.” This is how he eventually met Hedy Lamarr, who originally sought him out in order to find out more about his pet theories on women’s endocrinology that he had been promoting.

This was early in World War II, and they shared an interest in helping to defeat the Nazis. Hedy Lamarr’s first husband (she ended up marrying six times altogether) had been an Austrian arms manufacturer, and she had learned a lot from him – and she also turned out to have significant mathematical talent. George Antheil brought in the experience he had gained from synchronizing player pianos in his Ballet Mécanique. They put their heads together and came up with a way to make it hard for the enemy to interfere with radio-guided torpedoes. The idea was as clever as it was simple: hop around from frequency to frequency so quickly that the enemy didn’t have enough time with each frequency to identify it and jam it. (It all had to be synchronized, of course, or the “good guys” wouldn’t find the frequency either – that’s where Antheil’s contribution came in.)

They were granted US Patent 2,292,387 for their invention, which eventually became known as frequency hopping. It laid the groundwork for what became spread spectrum technology, which underlies communications technologies we all use today, such as Bluetooth and WiFi.

As Ratatouille’s Ego said: you just never know where the new and innovative might come from.

Who are the innovators?

But this still doesn’t address an important question: what kind of person is most likely to be an innovator? This has been the subject of a lot of interest, of course – there is nothing companies would like more than a reliable way to “find the great innovators.” I’d like to return now to the person I started this talk with: Steve Jobs. He was a charismatic, extroverted person who could singlehandedly mesmerize his audiences, setting up his famous “reality distortion field” and convincing people of pretty much anything. His presentations at the annual MacWorld conference in San Francisco were legendary.

But as Susan Cain, the author of a recent [March 2012] book entitled Quiet: The Power of Introverts in a World That Can't Stop Talking points out, the innovation that got Apple started – the Apple II computer – was not invented by the extroverted Jobs, but rather by his very introverted friend and colleague Steve Wozniak.

Cain writes that introverts are getting a raw deal in society today, to everybody’s detriment. As much as half of the population is made up of introverts, and yet our society tends to pay a lot more attention to extroverts (a bit like the situation of women half a century ago, when their talents were being wasted by a society that ignored their potential).

One of the ways in which extroverts are favored by today’s society is the tendency to promote doing things in groups. For example, one of the most popular techniques for stimulating innovation is “brainstorming.” Get a bunch of people in the room and let them bounce ideas off each other. The more people involved, the better; the more interaction, the better. There are a lot of variations on this theme. In my work, there is an approach to computer software development that has gained much favor over the last decade, called “extreme programming.” One of the tenets of this approach is that everybody should be physically together in one big shared space, where everybody can interact all the time. Lots of meetings together, no “compartmentalized knowledge,” everybody knows everything.

In fact, some have gone as far as to say that group work is the wave of the future in all innovation. In an article in the Washington Post [3 February 2012], Neal Gabler wrote:

In our global, networked economy, the lone wolf is rapidly becoming an anachronism, one that threatens to impede innovation rather than fostering it … while the idea of individual agency may have great appeal, innovation is increasingly coming from groups, not solitary heroes. Capitalism as a communal enterprise — dare we call it collective capitalism? — is the new engine of innovation …

But although Susan Cain isn’t against collaboration as such, she is leading a campaign against “groupthink.” And the reason has to do with something we already talked about two years ago in my talk on the Crowd. Recall my point that a crowd is only truly effective when its members are independent of each other. Cain emphasizes this fact even more. It is well known that in group situations people unconsciously tend to take on the opinions of others in the group, and usually very quickly. It can take very little time before the entire group has adopted the position of the dominant or most charismatic person (who is often the most extroverted one), whether or not that person is right.

In contrast to the extroverted ambience of “groupthink,” Cain notes that the kind of ambience that promotes independent, creative, innovative thinking has a much more introverted flavor:

Studies suggest that many of the most creative people are introverts, and this is partly because of their capacity for quiet. Introverts are careful, reflective thinkers who can tolerate the solitude that idea-generation requires.

Steve Wozniak said that he never would have learned the skills that eventually allowed him to create the first Apple computer if he hadn’t been too introverted to leave the house when he was young. As Cain notes, Wozniak has written the following advice in his memoir:

Most inventors and engineers I’ve met are like me ... they live in their heads. They’re almost like artists. In fact, the very best of them are artists. And artists work best alone ... I’m going to give you some advice that might be hard to take. That advice is: Work alone ... Not on a committee. Not on a team.

Introverts aren’t just important in producing innovation; they are also important in promoting innovation in others. But there is a bias against introverts in leadership positions. Society generally assumes that extroverts are the ones best suited to leadership, and they are the ones most often groomed for it. While extroverts can certainly be great leaders, it does not follow that they make better leaders than introverts; in fact, the psychological characteristics of extroverts may actually stifle innovation in those they lead. When someone under his leadership proposes an innovative idea, an extrovert may get so excited that he takes the idea and puts his own stamp on it, effectively corrupting and possibly stifling it; an introvert would be more likely to let the innovator run with the idea and allow it to come to fruition.

Cain’s advice: get in touch with your inner introvert and take the time to carve out some solitude for yourself. You might just find that you will become more innovative.

Can we recognize innovation when we see it?

We’ve been spending time on the question “What kind of person can be an innovator?” and have suggested that introverts may be a neglected human resource in that respect. But now I’d like to spend a few minutes on a slightly different question: “What kind of person can recognize innovation when he sees it?”

Before you start thinking that recognizing innovation is easy, and anyone can do it – after all, think of the mad rush to buy the latest iPhone – let’s take a look at a few well-known counterexamples:

· “Radio has no future.” – Lord Kelvin, Irish-born British mathematician and physicist, former president of the Royal Society, 1897.

· “The Americans have need of the telephone, but we do not. We have plenty of messenger boys.” – Sir William Preece, chief engineer of the British Post Office, 1876.

· “The cinema is little more than a fad. It’s canned drama. What audiences really want to see is flesh and blood on the stage.” – Charlie Chaplin, actor, producer, director, and studio founder, 1916.

· “Television won’t last because people will soon get tired of staring at a plywood box every night.” – Darryl Zanuck, movie producer, 20th Century Fox, 1946.

· “The Internet? We are not interested in it.” – Bill Gates, 1993.

· “But what ... is it good for?” – Engineer at the Advanced Computing Systems Division of IBM, 1968, commenting on the microchip.

And just in case you think it applies only to technological innovation, one last example:

· “A cookie store is a bad idea. Besides, the market research reports say America likes crispy cookies, not soft and chewy cookies like you make” – Response to Debbi Fields’ idea of starting Mrs. Fields Cookies.

Now, these were not stupid people. We could even say that many of them were among our best and brightest minds. So it can’t be a problem of intelligence. But if it isn’t intelligence, then what is it that makes it so hard for us to judge innovation well? It turns out that a lot of it has to do (once again) with our own psychological makeup – and it doesn’t matter whether you are Albert Einstein or Alfred E. Neuman, it’s inside all of us.

Consider this quote from Daniel Kahneman’s 2011 book Thinking, Fast and Slow:

“She’s raving about an innovation that has large benefits and no costs. I suspect the affect heuristic.”

What is he talking about? What is the “affect heuristic”? Kahneman is referring to research carried out by Paul Slovic, who came to realize that the way people judge things (including innovations) is inextricably linked with their emotional reactions to them. It’s a case of what is known as “substitution”: instead of answering the hard question, which is “What do I think about this innovation?”, people end up substituting an easier one, which is “How do I feel about this innovation?” Furthermore, once they have answered the question of how they feel about it, they make their lives even easier by driving their reactions to the extreme: the innovation is either the most wonderful thing in the world or the worst thing in the world. Why? Because this reaction keeps us from having to make hard choices between benefits and drawbacks. None of us can escape this phenomenon. Even poor Mr. H.M. Warner (as in “Warner Brothers Films”), who in 1927 said “Who the hell wants to hear actors talk?”, wasn’t immune to the affect heuristic. It’s just our human nature.

The Innovator’s Dilemma

It’s bad enough when people fail to recognize the innovation made by others, but it gets worse: they’re also perfectly capable of failing to realize the value of their own innovation.

One of the most sensational business events of the last few months [19 January 2012] has been the bankruptcy of Kodak, which I mentioned at the beginning of this talk. I also mentioned that they failed to “catch the wave of digital photography”. But what I didn’t mention was that Kodak itself developed the digital camera! That’s right: in 1975, one of Kodak’s own engineers, Steven Sasson, built the first digital camera. So why didn’t Kodak see the great innovation it held right in its own hands, the innovation that would revolutionize the world of photography?

Today we know that they were victims of the so-called Innovator’s Dilemma, as described in the now-classic book of the same title by Professor Clayton Christensen of Harvard University. In that book, Christensen explained that successful companies (like Kodak) are mostly concerned with constant improvement of their successful products, because that’s what their customers are asking for and are willing to pay for. Their customers want sustaining innovation in the products they are buying, such as ever higher-quality film. Digital photography, by contrast, was what Christensen characterized with the now-famous name disruptive innovation: an innovation that offers an entirely new way to do things (like photography without film).

So where is the “dilemma” mentioned in the book’s title? The dilemma is that when it arrives, disruptive innovation almost never has a market. When Kodak invented the digital camera, its customers couldn’t have cared less. Their cameras used film. What were they going to do, throw them away? For what? A new camera that, by the way, produced pictures of much lower quality than film-based cameras? (That’s still true to a large extent today – so you can imagine how it was in 1975.) So a company generally has no motivation to pursue a disruptive innovation; on the contrary, it would be irresponsible toward its loyal customers to do so. Psychologically, they and their customers are blinded to the potential of the new technology. This phenomenon has occurred countless times, where prosperous, well-managed companies are brought to their knees by disruptive innovation that sometimes they themselves had invented, but did not pursue.

Another classic example of disruptive innovation is word processing. I remember the early days of word processing when my father’s law office had a huge, powerful, expensive Wang word processing system that all the secretaries used (my father used an old Underwood manual typewriter to write his own documents). That Wang system was the premier choice for word processing and it was hugely successful for years. Meanwhile, I had bought the very first portable computer, the Osborne 1. It had a two-inch square screen, and I wrote my documents using a rather primitive program called WordStar. The quality was far, far inferior to that of the Wang processing system. But it was disruptive. I was what later became known as an “early adopter” of this new way of doing word processing using general-purpose computers rather than specialized machines – and, as I write these words today using Microsoft Word on my laptop computer, I think I can safely say that it was the disruptive innovation that won out in the end.

The Role of Luck

It turns out that one of our biggest problems in dealing with innovation can be traced back to yet another of our great psychological failings, something else that is inside every one of us: our insistence on finding a causal explanation for everything, even when there is none. Daniel Kahneman recounts how everybody talks about the amazing success of Google, a company that was able to innovate continuously and never made a single wrong move on the way to greatness. He points out that people talk about the Google story as though it was inevitable. But there was no inevitability; at every turn there was also a healthy dose of good fortune (such as a missing counter-move by some competitor). Kahneman puts it this way: success = talent + luck; great success = a little more talent + a lot of luck. Rationally, we might “know” this. But intuitively we continue to try to find an explanation for everything. Kahneman illustrates the core problem with this statement:

Very intelligent women tend to marry less intelligent men.

Now, this is not a joke, it is a documented fact (indeed, my wife has observed this in our own marriage a number of times). Ask people why this might be so, and you will receive any number of explanations, such as “Intelligent women are the dominating type and like to choose a man they can control,” and so forth.

But in reality there is no causal explanation. It is a mere statistical fact, an expression of the phenomenon of regression to the mean. You might remember this concept from my lecture two years ago on the Crowd, when I mentioned that it was discovered by Francis Galton, one of the great minds of the 19th century. Regression to the mean tells us that, statistically speaking, an extraordinary event is more likely than not to be followed by a less extraordinary one. It works both ways, too, of course: very intelligent men tend to marry less intelligent women (although my wife never seems to buy that part of the argument, for some reason).
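Regression to the mean is easy to see in a small simulation (a toy model of my own, not from Galton or Kahneman): give each spouse a score made of a shared component plus independent noise, so the two scores are correlated but not identical, and then select the couples where one partner scored far above average.

```python
import random

random.seed(0)

def couple():
    # Spouses' scores share a common component but also have independent
    # noise, so they are positively correlated without being identical.
    shared = random.gauss(100, 10)
    wife = shared + random.gauss(0, 10)
    husband = shared + random.gauss(0, 10)
    return wife, husband

couples = [couple() for _ in range(100_000)]

# Select the couples in which the wife scored far above the average of 100...
husbands = [h for w, h in couples if w > 125]
avg_husband = sum(husbands) / len(husbands)

# ...and the husbands, while above average themselves, are distinctly
# less extreme: pure statistics, with no causal story anywhere in sight.
print(round(avg_husband, 1))  # above 100, but well below 125
```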

In other words: sometimes an innovation is successful not because its inventor was introverted or extroverted, thin or fat, tall or short … but simply because he was lucky. Or, as the 17th-century moralist François, duc de La Rochefoucauld, put it:

Although men flatter themselves with their great actions, they are not so often the result of a great design as of chance.


So what is our conclusion about what kind of person is best suited to be an innovator? This brings me right back, once again and for the last time, to the person with whom we started the talk: Steve Jobs. In an article written for the Harvard Business Review, Walter Isaacson, the author of the biography that you now see everywhere, from the local bookstore to the Autogrill rest stop on the highway from Florence to Rome, wrote the following about Jobs:

He connected the humanities to the sciences, creativity to technology, arts to engineering. There were greater technologists (Wozniak, Gates), and certainly better designers and artists. But no one else in our era could better [connect] poetry and processors in a way that jolted innovation … The creativity that can occur when a feel for both the humanities and the sciences exists in one strong personality was what most interested me in my biographies of [Benjamin] Franklin and [Albert] Einstein, and I believe that it will be a key to building innovative economies in the 21st century. It is the essence of applied imagination, and it’s why both the humanities and the sciences are critical for any society that is to have a creative edge in the future.

I don’t know how true that quote will turn out to be, but I like it a lot. And I’m not the only one who agrees with it. I am one of the editors of a magazine called Uncommon Culture, which was founded by a group of scholars in European library science who are heavily involved in the initiative to digitize the artifacts of European culture. The latest issue of the magazine contains a Foreword by the Vice President of the European Commission, who suggests that “cultural material can contribute to innovation… and become the driver of new development.”

And I think this attitude represents well what we’re trying to do right here at the Club. We have talks on everything from poetry, theater, and art to history, science, and technology. Who knows, maybe we’re incubating some of the next great innovators right here in this lecture hall.


Richard Rhodes, Hedy’s Folly: The Life and Breakthrough Inventions of Hedy Lamarr, the Most Beautiful Woman in the World, 2011.

Walter Isaacson, Steve Jobs, 2011.

Daniel Kahneman, Thinking, Fast and Slow, 2011.

Nicholas Carr, Building Bridges: Essays on Business, Technology and Innovation, 2011.

Susan Cain, Quiet: The Power of Introverts in a World That Can't Stop Talking, 2012.

Clayton Christensen, The Innovator’s Dilemma, 2003.

Sunday, April 3, 2011

The Vodka was Great but the Meat was Rotten

Viareggio, 2 April 2011


When I announced last year that I would give a lecture entitled “The Crowd: Wisdom or Madness?” the only thing on most people’s minds was “What is that title supposed to mean?” Although it was a pretty cryptic title, it wasn’t a totally hopeless task to figure out what the lecture would be about; after all, the book The Wisdom of Crowds was available for purchase, so there was a chance that some of you might have heard of it.

But I really do have to concede that this year’s title must seem truly cryptic to you. And yet, believe it or not, there was a small chance of figuring out what it was about. It is something that is buried in the script (online) of my very first lecture, way back in 1992, nearly twenty years ago:

In fact, the best things to come out of machine translation programs were the jokes. As you can imagine, the hardest things to translate are idiomatic phrases and slogans. They gave the program the following phrase to translate into Russian: “The spirit is willing, but the flesh is weak.” The program came up with: “The vodka was great, but the meat was rotten.”

Artificial Intelligence Revisited

For many years I had wanted to give a talk here on automatic computer translation of language. It seemed like the perfect topic, because after all that’s what this association is all about: language. So many of us are professionally involved in languages – several are language teachers, some are translators. But the timing never seemed to be quite right. The progress I saw in the automatic language translation programs was not very convincing. They were slow, clumsy, and expensive – and in the end, not very good. But over the last few years, something has changed. Suddenly automatic translation is not just better than before – it’s a lot better. So what happened to make it click? That’s what I wanted to talk about today.

But when I went back to my old lecture of nearly twenty years ago to find the quote for my title, I realized that the story of why language translation has gotten so much better has a lot to do with what has happened in many of the areas I discussed back then. So I decided it would be a good idea to take up the story where I had left off, way back in 1992. And, as always happens in these lectures when I start predicting, I got a few things right – and a lot of things wrong.

Computer Chess

One of the things I talked about in my lecture of twenty years ago was “computer chess.” Back then, there was a lot of interest in building a computer that could match the best human chess players. Even much earlier, I pointed out, people had been fascinated by chess, and I briefly recounted the story of the so-called Mechanical Turk, a contraption that was exhibited for the first time in 1770:

A couple of hundred years ago there was actually a man touring around Europe with a beautiful robot-like machine with arms that hung over a chess board, that would play an excellent game of chess against all comers. However, it was discovered after some time that there was a man hidden inside of the robot who just happened to play a very good game of chess.

(Interestingly, Amazon picked up the term “Mechanical Turk” again to describe a service it introduced in 2005 – because it was about people helping machines finish jobs they couldn’t yet do well, like recognizing images. Equally interesting is that the service is based on the idea of crowds – the topic of last year’s lecture.)

People thought that if a computer could beat a human in chess then, well, we were on our way to building intelligent machines. IBM had set itself a “Grand Challenge” to do just that, and had come up with a computer that it called Deep Thought (named after a computer in the series The Hitchhiker’s Guide to the Galaxy by Douglas Adams – and I think you can imagine where Adams might have gotten the name). A few years before my lecture in 1992, Deep Thought had managed to beat a grand master and generated a lot of excitement. But when matched against world champion Garry Kasparov, it was roughed up pretty badly. I reported in my talk that Kasparov had declared afterward:

I can't visualize living with the knowledge that a computer is stronger than the human mind. I had to challenge Deep Thought for this match, to protect the human race.

But I also reported that many felt that the days of the human race’s protection by Kasparov and others like him were numbered:

Now just about everybody accepts that within 5 or ten years a computer will be built that can beat any human being.

That was in 1992, and now we are in a position to know whether that prediction was accurate. Let’s take up that story and see how it turned out.

Deep Blue

After that humiliating defeat, IBM rolled up its sleeves and got back to work. Deep Thought was consigned to the dustbin and a larger, more powerful successor was designed, named Deep Blue. (They got more patriotic – IBM’s nickname is “Big Blue.”) It took a while to get everything worked out, but on May 11, 1997 – just under five years after I made that prediction in my talk – Deep Blue beat Garry Kasparov in a six-game match. A computer had finally beaten the world’s best human player.

Kasparov was furious. He said that he had “sometimes seen deep intelligence and creativity” in the computer’s moves. And what did he conclude from this observation? In a bizarre throwback to the Mechanical Turk chess machine of two centuries earlier, Kasparov alleged that Deep Blue had cheated, by getting help from … humans. Now, if you think you see a paradox in the world’s best human player thinking that the machine that beat him got help from other human beings, you’re not alone. After all, if he was the best human player on Earth, then what other human player could possibly have helped Deep Blue to beat him? The best player on Mars?

The whole episode ended with Kasparov challenging Deep Blue to a rematch. IBM refused and then dismantled Deep Blue, presumably just to make sure that such a rematch wouldn’t happen. (I didn’t read any reports about whether they found a little man inside when they dismantled it.) A pretty good movie was later made about the affair, called Game Over: Kasparov and the Machine. Like Kasparov, the film implied that there may have been a conspiracy behind it all. But the conspiracy implied by the film was a different one – a plot to boost the stock price of IBM. Actually, that was a much more reasonable conjecture. IBM never claimed it was doing this solely for the good of mankind. Why not advance the state of technology and get some publicity in the process with these Grand Challenges? In fact, IBM was so happy with all the publicity that it set about finding itself another Grand Challenge to work on.

In the first Grand Challenge, IBM had set out to build a computer that could outperform humans in a game. In the next Grand Challenge, IBM decided to set out to build a computer that could outperform humans in … another game. And this brings us back to the topic of my lecture by way of a television game show that I used to love to watch as a kid.

Television Game Shows

The 1950s and 1960s were a time of fabulous creativity and growth in TV game shows in America, and they were being churned out at an amazing rate during those years. Many of them were eventually syndicated around the world, including Italy. One show from 1956 called The Price is Right had a great run in Italy from 1983 to 2001 as Ok, il prezzo è giusto, hosted much of that time by Iva Zanicchi (who called the show “the triumph of consumerism”). Another show called What’s My Line was invented in 1950, at the very dawn of the television era – and is alive and well today here in Italy, over 60 years later, under the name of I Soliti Ignoti, hosted by Fabrizio Frizzi.

The king of daytime television during much of that golden era of game shows was the multi-talented Merv Griffin. Born in 1925 in San Mateo, California (not far from my own birthplace), he had been a child prodigy pianist and soon ended up in show business in Hollywood. He appeared in several films after he was discovered by Doris Day, but eventually got tired of movies and decided to move into the rapidly expanding medium of television. He turned out to have a knack for understanding the new formats that would work on television, and became a successful host of both talk shows and game shows. He once showed up in the audience at the musical at my high school (High Button Shoes, if I remember correctly) – apparently he had a niece in the cast.

Before long, he was not only a host of TV game shows; he was also a creator of them. One day, as he told the story, he was flying to New York from Duluth, Minnesota with his wife. He was trying to come up with ideas for a new game show, and his wife made a suggestion. In the 1950s there had been a series of scandals around the TV quiz shows. The most prominent such scandal involved the show Twenty-One, where it turned out that the outcomes were fixed beforehand. After those scandals, people had stopped creating quiz shows. Griffin’s wife suggested that this might be a good time to propose a new one, perhaps with some kind of new twist to distinguish it. She thought she had an idea.


Her idea was to turn things around: instead of providing a question and expecting an answer from the contestant, you provide the answer and the contestant has to provide the question. For example, if you provided the answer “1861”, the contestant would have to provide the question “In what year was Italy unified?” To an answer “10 Downing Street,” the correct question might be “Where does the Prime Minister of Great Britain live?” The contestant would always have to provide the response in the form of a question.

The idea was so good that it was accepted sight unseen by NBC, and the show was named Jeopardy! (the exclamation point is part of the name). It was launched in 1964 and became a hit immediately. The game was a straightforward quiz, with several categories of questions (or, I should say: answers) for the contestant to choose from, according to ascending monetary rewards. At the end of each show was a section called “Final Jeopardy,” where each contestant had one last chance to influence the outcome by wagering up to his entire winnings on a single question.

That final question was accompanied by a catchy little tune that was timed to last exactly 30 seconds, the time that the contestants had to contemplate their final response. The title of that tune is “Think!” and it has an interesting story all its own. It was composed by none other than the versatile musician Merv Griffin himself – and was in fact originally a lullaby for his son. It has so insinuated itself into American popular culture now that it’s used for just about any situation in which there is a countdown while waiting for something – for example, during discussions in baseball games or at the horse racing track, or even during Perry Mason or similar legal TV shows when awaiting a verdict. Merv Griffin liked to boast that for the 10 minutes he spent writing that little melody, he had received over 70 million dollars in royalties over his lifetime.

I once attended a Jeopardy show in the 1970s when I was a student at Yale. With a fellow student, I went down to New York and was part of the studio audience for a series of three shows and got to see the host Art Fleming in action firsthand. Back in those days, there weren’t the fancy electronic displays there are today. Instead, it was literally a board with cardboard signs in the categories. When a selection was made by a contestant, a burly stage hand standing behind the board would simply pull off the piece of cardboard covering the question. For me, it was a bit like exposing the Wizard of Oz.

Later, when I was a student at Berkeley, I participated in an audition when the show’s producers swung through San Francisco looking for new contestants. I didn’t do so badly in the audition, but in the end they chose a sharp, good-looking young woman with a vivacious personality, and I can’t really say I disagreed with their choice.

Jeopardy was an enormous international success, with adaptations in over 25 countries – including Italy in the early 1970s, with Rischiatutto hosted by Mike Bongiorno. Jeopardy wasn’t the only successful TV game show that Griffin invented, by the way. The hugely successful Wheel of Fortune show was also his invention, and as we all know, it lived on with great success in Italy as the Ruota della Fortuna. He was eventually named by Forbes as the richest Hollywood performer in history, in large part due to his ventures in game shows.

So what was it that made Jeopardy so successful? It turned out that the clever inverse “answer/question” format encouraged extremely sophisticated uses of the English language, with many puns, shades of meaning, ambiguities, and subtle references that challenged the very best minds. The categories ranged over the entire spectrum of human knowledge, and so contestants had to be very well-read indeed. This was not a game for dummies – and it soon attracted a huge cult following. If that seems strange to you, consider the game show L’Eredità (“The Inheritance”) that is running today on Italian television. Its final section, called “the guillotine” – in which the challenge is to guess a hidden word from among five words that are related in a way the contestant must deduce – now numbers among its many fans no less a personality than the great Italian intellectual, professor of semiotics, and author (The Name of the Rose) Umberto Eco, who says he never misses it on TV (the show’s producers once personally dedicated a “guillotine” puzzle to Eco).

In summary, a successful contestant on the Jeopardy show had to exhibit an extraordinary command of the English language in all its grammatical nuances as well as a mastery of a broad range of subjects ranging from history to geography to literature to current events and popular culture. It turned out that Jeopardy embodied exactly the type of challenges that were being confronted by IBM and others in the area known as natural language processing.

Natural Language Processing

At this point, I’d like to go back once again to my old lecture and take up another topic and see how it turned out. Back then when I gave my lecture, nearly twenty years ago, there were essentially two opposing approaches to making “intelligent machines”:

  • Teaching things to them;

  • Letting them learn for themselves.

The first approach involved the general idea of what were called expert systems. Suppose you wanted to make a computer that knew a lot about medical diagnosis. The strategy was to find an expert in making medical diagnoses, and get all the knowledge out of his brain and into the computer – perhaps by interviewing him to coax out his expertise. This approach made a lot of sense to everybody. After all, human beings are the source of the knowledge we’re after, so why not go straight to the source?

Later on, this approach acquired even more credibility in the popular mind as the idea of the so-called Semantic Web emerged. Have you ever searched for a word or phrase on the Web and received the wrong answer because the meaning was ambiguous? For example, take the “Paris Hilton” problem: are you looking for a hotel in France or browsing for movie star gossip? Those who promoted the Semantic Web idea proposed that experts create various kinds of “dictionaries” inside the Web that can help sort out the different meanings a word or phrase can have. The more accomplished and capable the experts, the better the dictionaries will be, and the “smarter” the Web will become. This approach was adopted enthusiastically because it also seemed to lead directly and eventually (if not immediately) to the Holy Grail of computers that understand human language. And that made perfect sense, too: who was better qualified to teach computers to understand human language than humans themselves?

The second approach involved the general idea of “learning by example.” Officially, this approach became known as Machine Learning, whereby an initially ignorant computer would essentially learn from experience. It was quite successful in some areas, especially controlling sophisticated machinery like robots or helicopters. Take the case of the helicopter: it’s not that easy to get the control and balancing of a helicopter exactly right. So instead of trying to calculate exactly the right program to do the job, you just let the helicopter fly and see what happens. When the helicopter crashes, the control program considers that a bad outcome (thank goodness for that), and learns from the experience by avoiding the next time whatever it did to make it crash. (By the way, if you’re getting worried: of course we’re talking about unmanned model helicopters, not the real thing.) In time, after a certain number of crashes, the control program will have adjusted itself to do a far better job of keeping the helicopter in the air than a man-made program might have done. And in the control of robotic machinery the Machine Learning approach became almost universally the method of choice: you just let the machinery adjust itself after each “bad” outcome and it would eventually converge to the desired “good” outcome.
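The learn-by-crashing loop just described can be sketched in a few lines of Python. Everything here is invented for illustration – the one-parameter “controller,” the toy flight_time simulator, and the ideal gain of 0.7 – but the shape of the loop (perturb, test, keep what works) is the essence of the technique.

```python
import random

def flight_time(gain, ideal=0.7):
    # Hypothetical simulator: the flight lasts longer the closer the
    # control gain is to an ideal value the program doesn't know.
    return 10.0 - abs(gain - ideal)

def learn_gain(trials=500, seed=7):
    rng = random.Random(seed)
    gain = rng.random()          # start with an arbitrary setting
    best = flight_time(gain)
    for _ in range(trials):
        # Perturb the setting; keep the change only if the flight improves.
        candidate = min(1.0, max(0.0, gain + rng.uniform(-0.1, 0.1)))
        score = flight_time(candidate)
        if score > best:
            gain, best = candidate, score
    return gain
```

After a few hundred simulated “flights” the program has converged on a good gain without anyone ever having written down the control law explicitly.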

But a bunch of robots and helicopters learning how to move around and fly is a lot different from a computer learning to understand language. One is merely “motor learning,” whereas the other is “cognitive learning.” Why, they’re not even on the same intellectual plane! Everybody knew that there was no way that Machine Learning, as effective as it might have been for controlling our manufacturing plants and vehicles, would ever lead to a way for computers to work with language, the highest expression of human intellect.

Indeed, in my lecture nearly twenty years ago, I dared to make the bold claim that in my opinion, finding a way to deal effectively with human language would be equivalent to solving the problem of Artificial Intelligence itself; because natural language, with all its complexities and subtleties, was the very embodiment of human intelligence. A computer could never “process” natural language if it did not truly understand it, I patiently explained to my audience that day in 1992.

And I was dead wrong.

The Rise of Machine Learning

As the years passed, expert systems and the Semantic Web began to run into problems. Progress seemed frustratingly slow. It seemed to take forever for the experts to create those dictionaries of meanings. One well-known dictionary of medical terminology (called SNOMED) ended up with over 370 thousand names at last count, and its curators are still not sure whether it contains separate terms that really mean the same thing. Over time, many began to voice the opinion that the dream of the Semantic Web had not been realized and might never be realizable. And with that looming failure, the dream of being able to process human language seemed to fade.
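The duplicate-terms problem is easy to see in miniature. In this sketch (invented terms, not real SNOMED data), naive normalization catches the surface variants of a name, but a true synonym – a different name for the same thing – slips straight through, and those are exactly the cases that demanded expert curation.

```python
from collections import defaultdict

def normalize(term):
    # Lowercase, drop hyphens and commas, ignore word order.
    words = term.lower().replace("-", " ").replace(",", " ").split()
    return " ".join(sorted(words))

terms = ["Heart attack", "heart-attack", "Attack, heart",
         "myocardial infarction"]

groups = defaultdict(list)
for term in terms:
    groups[normalize(term)].append(term)

# The three surface variants collapse into one group; the synonym
# "myocardial infarction" (the same condition) escapes unnoticed.
duplicates = [group for group in groups.values() if len(group) > 1]
```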

But then something strange happened. Machine Learning began to have a string of successes in a most unexpected area: processing human language.

How could that be? What could have possibly happened to turn Machine Learning – that simple, primitive technique used mostly for teaching motor control to mechanical contraptions that walked or flew about – into a successful approach to dealing with language, that most sophisticated and subtle of all human characteristics that defines our very humanity? What had changed to make it all possible when it hadn’t been possible before? What new and deep algorithms were discovered to capture the essence of language acquisition? Well, it turned out that it wasn’t really the algorithms that enabled the breakthrough.

It was the data.

The Explosion of Online Data

One reason that Machine Learning had been so successful with teaching motor control to robots, helicopters, and the like was that they could have as many examples as they needed to learn from. Did the helicopter crash? Send it up again. Did the robot make the wrong move? Have it try again … and again … and again … thousands of times if necessary, until it finally gets it right.

But human language is a different story. While it’s true that humans acquire much of their language skill from examples, it’s also true that they have seemingly infinite resources for obtaining their examples – reading, talking, listening, studying, every day of their lives. Even if we thought Machine Learning of language through examples might be a feasible approach, where would all those examples come from?

In 1992, when I gave that lecture, the question was reasonable. The total amount of written language in electronic form was infinitesimal. Hardly anybody except software people like me used computers to write things. The Internet was small. Few people had e-mail. The number of websites was, literally, almost zero. The biggest stores of information in electronic form were databases full of numbers, not words – and even they weren’t very large, all things considered.

But just a few years later, in the mid-nineties, things began to change quickly. With the arrival of the first web browsers like Mosaic, the number of websites began to shoot straight up. Today it is estimated that there may be somewhere around 150 million websites. And the size of the Internet itself? A recent study by IDC estimated that the Internet consists of around 1.2 zettabytes. That’s 1.2 sextillion bytes – a 12 followed by twenty zeroes. It would take the entire population of the world sending SMS messages continuously for the next century to produce that much data.

Today we can find nearly anything in electronic form. The Wikipedia is by far the largest encyclopedia ever written, with over 3.5 million articles in the English language edition alone – and it’s all right there online, available to anyone (and any computer) who wants it. There are books, journals, newspapers, dictionaries … the list is seemingly endless. Suddenly humans don’t have the monopoly on resources from which to draw examples for learning language. Computers have as many or even more – thousands, millions, even billions of examples drawn from the huge pool of information that is now in electronic form. But of course the real question is: does it make a difference?

It turns out that it does make a difference. Peter Norvig, the Director of Research at Google (and therefore someone who should know what he’s talking about) has put it this way:

In the modern day, [with Machine Learning, we] observe how words and combinations of words are used, and from that build computer models of what the phrases mean. This approach is hopeless with a small amount of data, but somewhere in the range of millions or billions of examples, we pass a threshold, and the hopeless suddenly becomes effective, and computer models sometimes meet or exceed human performance.

It turned out that, in a way, Machine Learning of human language through examples was a bit like Einstein’s Theory of Relativity. Relativity seemed impossible when it was introduced partly because nobody had ever had the experience of traveling at speeds near the velocity of light (except in the car of a certain Welsh friend). Similarly, the idea of a computer acquiring any kind of useful language proficiency through examples originally seemed impossible partly because nobody had ever had the experience of having millions or billions of examples to work with.

The irony is that in the end, Machine Learning turned out to work much more like the way in which we ourselves learn language than the expert-system approach does: in our everyday lives, we also observe how words and combinations of words are used, and from that deduce what the phrases mean. When I have just learned a new word in Italian, for instance, it doesn’t mean much to me – it seems bereft of “semantics”. It’s just another arrangement of letters. But every time I hear it used by somebody or read it in a newspaper or a book, its meaning deepens for me, until eventually I can nearly feel its meaning viscerally as I speak it. I am literally learning through examples.

With the explosion in the amount of information in electronic form in the 1990s, Machine Learning started to replace the expert system approach to processing natural language. The learning process, or “training,” was accomplished through large corpora. A corpus (that’s the singular of corpora) is a set of electronic documents full of examples with the correct answers marked so that the machine can learn from them. A huge number of such corpora have become available in recent years. An example is the Open American National Corpus, which contains 15 million words of American English. It is all nicely annotated for parts of speech, verb stems, and the like, and is available to any person (or computer) for study. Another example is the British National Corpus, which contains 100 million words with samples of both written and spoken English from many sources. Yet another example is the Corpus of Contemporary American English (COCA), which contains over 410 million words. Using this ever-increasing pool of corpora for training examples, Machine Learning programs just got better and better and better.
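A toy version of such training fits in a few lines. The “corpus” below is four hand-labeled sentences invented for the occasion – nothing like a real corpus in scale, which is precisely the point of Norvig’s threshold – used to sort out the “Paris Hilton” ambiguity mentioned earlier by plain word counting with add-one smoothing.

```python
from collections import Counter

# Tiny hand-made "corpus": example queries labeled with the intended sense.
corpus = [
    ("book a room at the paris hilton near the eiffel tower", "hotel"),
    ("cheap rates paris hilton hotel france booking", "hotel"),
    ("paris hilton movie premiere red carpet gossip", "celebrity"),
    ("paris hilton new reality show interview", "celebrity"),
]

counts = {"hotel": Counter(), "celebrity": Counter()}
for text, label in corpus:
    counts[label].update(text.split())

def classify(query):
    # Score each sense by how often the query's words co-occur with it,
    # with add-one smoothing so unseen words don't zero out a score.
    def score(label):
        c = counts[label]
        total = sum(c.values())
        s = 1.0
        for w in query.lower().split():
            s *= (c[w] + 1) / (total + len(c))
        return s
    return max(counts, key=score)

classify("hotel room booking in france")    # → "hotel"
classify("gossip about the new movie")      # → "celebrity"
```

With four sentences this barely works; with millions of examples, as Norvig observed, the same arithmetic crosses a threshold and becomes genuinely effective.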

As one success after another was obtained with this approach, companies began to realize the vast commercial potential behind having a computer that could process human language. One of those companies was IBM, which had created a so-called Semantic Analysis and Integration department to track developments in this area.

IBM Watson and the Jeopardy Challenge

The destinies of the Jeopardy game show and IBM came together one evening in 2004 at a restaurant, where an IBM research manager, seeing an especially exciting round of the show being played on television, decided that this should become the next Grand Challenge. The next several years were spent building a powerful computer for natural language processing based on Machine Learning, which they christened “Watson” – after IBM’s founder.

They fed Watson with every type of resource imaginable: literary works, dictionaries, news articles, databases, anything they could get their hands on – and they could get their hands on a lot, as I noted earlier. Naturally, they fed it the entire Wikipedia. By the time they were done, Watson had ingested 200 million pages of information.

It took a while to get off the ground. In its first tests in 2006, its performance was lousy. But just two years later it was holding its own against human Jeopardy contestants, and by 2010 Watson was ready for prime time. Arrangements had been made for Watson to compete on the Jeopardy show against the two most successful contestants of all time. Both IBM and Jeopardy executives realized the huge marketing value of this occasion and played it up with lots of publicity.

Then came the big day, less than two months ago [February 14, 2011]. As a computer filling an entire room, Watson couldn’t exactly stand up there at a podium next to the human contestants, so they gave him a snappy looking avatar that lit up as he “thought”. Since the game involved pressing a button when you thought you could respond correctly, he was also outfitted with a mechanical finger.

During the match, we were given an insight into how Watson works: during each answer/question session, the viewer would see the three topmost candidate answers that Watson was considering. For each answer, a meter showed the probability Watson was calculating that the answer might be the correct one. The answer with the highest probability was selected, provided that probability also passed a threshold. Otherwise, Watson just shut up and let the others have their chance.

The point here is that Watson wasn’t “thinking” in the same sense that we humans think: he was just calculating, like other computers do. That’s how this kind of Machine Learning works: it’s all based on estimating probabilities of things being right. In fact, it’s generally called statistical Machine Learning for that reason. I’ll get back to that later.
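The selection rule just described is simple to state in code. Here is a minimal sketch – the candidate answers and probabilities are invented, and IBM’s actual confidence estimation was of course vastly more elaborate:

```python
# Pick the top candidate only if its estimated probability of being
# correct clears a buzz-in threshold; otherwise abstain, as Watson did.
def choose_answer(candidates, threshold=0.5):
    """candidates: list of (answer, probability) pairs."""
    if not candidates:
        return None
    answer, confidence = max(candidates, key=lambda pair: pair[1])
    return answer if confidence >= threshold else None

choose_answer([("Chicago", 0.12), ("Toronto", 0.64), ("Boston", 0.08)])  # → "Toronto"
choose_answer([("Chicago", 0.31), ("Toronto", 0.29)])                    # → None: stay quiet
```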

I’m sure you’re all dying to find out what happened, so I won’t keep you waiting: Watson won handily, winning over three times as much as each of the others.

The Aftermath

So what happened after Watson’s successful debut on Jeopardy? Certainly nobody claimed he had cheated, as Kasparov had done 14 years earlier with Watson’s predecessor, Deep Blue. There were a few gripes here and there, like complaints about his speedy mechanical finger beating the others to the buzzer, but overall there was general acknowledgement that he had won, fair and square.

But what everybody did seem to wonder was, “What does this all mean?” There was article after article in nationally syndicated newspapers, and interviews with analysts on TV talk shows, discussing whether this all meant that intelligent machines were about to take over the world. A month ago [28 February 2011] the New York Times carried an invited article on this subject by none other than Hubert Dreyfus, the philosopher from the University of California at Berkeley whom I had discussed at length in my lecture nearly twenty years earlier on the same subject. Here is what he had to say back then:

Great artists have always sensed the truth, stubbornly denied by both philosophers and technologists, that the basis of human intelligence cannot be isolated and explicitly understood.

His essential argument involved the fact that humans have bodies and therefore a context in the world around us, whereas computers don’t. Here is an excerpt from what he had to say in the New York Times article, where he was analyzing the reason that Watson failed to understand the relevance of a subtle clue during one of the sessions:

… Watson doesn’t understand relevance at all. It only measures statistical frequencies. The fact is, things are relevant for human beings because at root we are beings for whom things matter. Relevance and mattering are two sides of the same coin. As [the philosopher] Haugeland [has] said, “The problem with computers is that they just don’t give a damn.”

That didn’t impress one of the readers of the article, who commented that

… many of our politicians, criminals and military people [also] lack understanding of human beings and seemingly have no compassion for humans.

In summary, nearly all of the philosophers, analysts, and commentators agreed that Watson didn’t represent a revolution in Artificial Intelligence. Yet they also universally agreed that it had accomplished something very impressive. I can think of no better way to put this into the proper perspective than to go back once again to something I discussed in my lecture of nearly twenty years ago.

Artificial intelligence has sometimes been defined as the use of computers to solve problems that previously could only be solved by applying human intelligence. Now, the problem is that this definition has a sliding meaning, as the computer scientist David Parnas has noted. In the Middle Ages, it was thought that arithmetic required intelligence. Now we realize that it is simply a mechanical process, and we've built pocket calculators to do it for us.

In the 1990s chess was only the most recent of those tasks we thought were in the exclusive domain of the human mind. Likewise, we had never thought that a computer could do anything useful with language by mere mechanical computation – surely that was an exclusively human activity. And yet, Watson (and many other systems like it) had shown that it was possible. Watson had literally and figuratively beat humans at their own game.

What’s Next?

So just what are the possibilities opened up by the conquest of natural language by computers? We can start the discussion with Watson himself, which is a so-called question-answering system: you ask it a question and it gives you an answer (except in Jeopardy, of course, where you ask it an answer and it gives you a question).

I presented one of the very first question-answering programs during my lecture back in 1992. That program was called Eliza (after the character in My Fair Lady), and imitated a psychoanalyst. Lucia Ghiselli and I together read the script of one of Eliza’s most famous conversations (Eliza eventually decided that Lucia had a problem with her boyfriend; she probably got into trouble at home that evening).

But there are many, many much more serious applications for question-answering systems, and that is one reason IBM created a Grand Challenge for itself in that area. After the Jeopardy match, IBM began a campaign to publicize the great future Watson was going to have. And the first application they put up in lights was the very application we talked about earlier: an expert medical diagnosis system (those poor expert systems people must be gnashing their teeth at seeing those primitive Machine Learning people succeeding in just the area they were working on much earlier).

Here’s how Jennifer Chu-Carroll, an IBM researcher on the Watson project, put it in an interview with Computerworld magazine:

Think of some version of Watson being a physician’s assistant. In its spare time, Watson can read all the latest medical journals and get updated. Then it can go with the doctor into exam rooms and listen in as patients tell doctors about their symptoms. It can start coming up with hypotheses about what ails the patient.

(I wonder what else Watson does “in its spare time” ... dream of electric sheep?)

A well-read Watson-like system could answer questions and make recommendations in a number of fields. You may not be surprised to hear that the financial industry is looking into the idea of using Watson-like programs to get good stock tips. And of course the military is always interested, for lots of applications they prefer not to talk about (and maybe we’d prefer not to know about). The company Comprendo, right here in Pisa, has created a question-answering system that is being used by the Italian telephone companies to answer customers’ questions about their services, saving a lot of time for harried call center operators.

Coping with the Vastness

Question-answering isn’t the only opportunity for computer language processing. The possibilities are as vast as the Internet itself. In fact, in large measure they are vast because of the vastness of the Internet.

I mentioned earlier that the size of the Internet today is estimated at about 1.2 zettabytes. But incredible as that number seems, it’s just the beginning: by 2020 the size of the Internet is predicted to be over 35 zettabytes. But most importantly (and most relevant to this talk), over 70% of that data is being generated by individuals like you and me, with our social networks, blogs, tweets, and e-mails. And most of that is plain old text written in English, Chinese, Italian, and all of the other human languages around the world.

To put it simply, the Internet has gotten so large that it is beginning to be beyond our capabilities to manage it. Much of this is our fault, of course: we regularly receive messages (especially jokes) that we forward to all our friends, duplicating them many times over; we chat endlessly on the social networks and forums; one of my brothers once proudly told me that he had never deleted a single e-mail message. The result of all this is that nobody can read – much less find – everything that he needs to any more. We need help. And computers that can process language can provide that help.

With the advent of natural language processing, a new discipline known as text analytics has emerged. Text analytics helps us sort through all the masses of documents, messages, words, and phrases – everything that’s written down out there on the Internet – in service of whatever we’re trying to do. I’d like to present a couple of examples now of what that might be.

What are they saying about me?

Those of you who have heard my last four or five lectures will remember that they have always somehow, some way, involved the social networks. That is the measure of how central they have become in our lives. I know people (and I’ll bet you do, too) who track their entire day on Facebook or Twitter or some forum, discussing everything you could possibly imagine: what they had for breakfast; the beautiful new Gucci shoes they just bought; how their sleeping pills made them groggy; what a great job the President is doing; what a terrible job the President is doing. You name it, and you can be sure that it’s being discussed out there on the social networks.

And that makes the social networks a powerful force in every aspect of life now – including business life. Consider the case of Canadian musician David Carroll and his band, the Sons of Maxwell. At a stopover in 2008 at the Chicago airport on a United Airlines flight from Canada down to Nebraska, he looked out the window to see the baggage handlers tossing guitars – one of which was his – onto the tarmac like they were sacks of potatoes. Sure enough, his 3500 dollar Taylor guitar turned out to have a broken neck when he arrived in Nebraska. (Now, there is at least one professional guitarist in our association, and several of us are amateur guitarists, so we can all appreciate this. In fact, many years ago on a flight between New York and San Francisco, my own guitar arrived with a hole in the back, punctured by something that went right through the case. I played it again just a few weeks ago [early March 2011] – that hole is still there.)

You can imagine the wall of indifference Carroll faced when he tried to complain to United about it. So he decided to take matters into his own hands. He made a music video about his experience, composing a clever little tune and lyrics just for the occasion, entitled “United Breaks Guitars.” He uploaded the video to YouTube and sat back to see what happened.

What happened was that it was viewed by 150 thousand people in one single day. Three days later that had risen to half a million. Just over a month later it had been seen by 5 million people. In short: the video had gone viral.

As you can imagine, it was a public relations disaster for United Airlines, and they rushed to make amends as quickly as possible. As for Carroll, he got a great boost from the incident, too, and became a sought-after speaker on customer service. (On one of those speaking trips in 2009, United managed to lose his luggage. One does wonder …)

The moral of the story is that, for businesses, much of what is known as “customer relations” has migrated onto the social networks and you had better be aware of what people are saying about you out there. But there’s just too much of it. Nobody can sit down and read everything they’re saying about you, much less even find it. That is where something called sentiment analysis comes in.

Nowadays it seems that everybody has an opinion and is happy to express it in a very public way. As soon as people do anything, they get out there on their favorite social network and write a review: reader reviews of books, traveler reviews of hotels and restaurants, reviews of products of all kinds on specialized forums all over the Internet. Sentiment analysis systems read texts like these reviews, or complaints, or whatever might be fed to them, and they try to determine whether something positive or negative is being said. As you can imagine, they won’t always get it right, because language can sometimes be convoluted and idiomatic (when Michael Jackson called something “bad” he was probably telling you that it was good). But these systems can still read a lot more than you can, and get a sense of whether things are going well or not. If you’re anybody whose reputation is a key part of his business (such as the fashion industry), then you have to get out there and protect your brand name in the blogosphere.
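A bare-bones, lexicon-based sentiment scorer shows the flavor of the idea – and also why idiom trips these systems up. The word lists and the one-word negation rule are an invented minimal sketch; production systems are statistical, trained on large labeled corpora.

```python
POSITIVE = {"great", "good", "excellent", "love", "wonderful"}
NEGATIVE = {"bad", "terrible", "broken", "awful", "hate"}

def sentiment(text):
    score, negate = 0, False
    for word in text.lower().split():
        w = word.strip(".,!?")
        if w in ("not", "never", "no"):
            negate = True        # flip the word that comes immediately next
            continue
        if w in POSITIVE:
            score += -1 if negate else 1
        elif w in NEGATIVE:
            score += 1 if negate else -1
        negate = False
    if score > 0:
        return "positive"
    return "negative" if score < 0 else "neutral"

sentiment("United broke my guitar, terrible service")  # → "negative"
sentiment("not bad at all, I love it")                 # → "positive"
```

Michael Jackson’s “bad” would, of course, fool this scorer completely, which is exactly the kind of idiom that keeps the field busy.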


Protecting one’s reputation for making nice clothes and serving good food isn’t the only motivation people might have for sifting through lots of electronic information; another pretty good motivation is staying out of jail. In lawsuits, there is a task known as “discovery,” where lawyers look for documents that are considered relevant to the case. Now, this was already a rather boring and frustrating job back in the old days when most documents were on paper. But now, when most documents are in electronic format, in every conceivable form from text files to spreadsheets to e-mails, it’s often downright impossible. There’s just too much out there. So-called “e-discovery” systems are out there now, reading the documents and deciding whether they’re relevant to a case. And those programs don’t get tired and bored like we do.

It’s not just lawyers who have to worry. Pharmaceutical companies have a legal obligation to be aware of complaints from people who have used their products and had problems (like, say, a sleeping pill making you unusually groggy). If somebody has made that complaint publicly on a social network, then it is definitely in the company’s best interests to be aware of it, or it will find itself with a lot of explaining to do after an incident occurs. But here, too, the job has become simply too big for people to do by themselves.

Text analytics systems are able not only to sift through masses of electronic documentation, but they’re also able now to do some pretty clever sleuthing with what they find. They can read through piles of seemingly unrelated newspaper articles and piece together indirect or even intentionally hidden relationships among persons of interest (for example, two people might be trying to hush up the fact that one of them has a financial interest in the other’s company). They can look for suspicious patterns that might indicate there is some funny business going on (for example, if somebody closes a bank account in one country and opens another one in a different country on the same day, it may indicate some financial hocus-pocus that deserves a closer look by the authorities). That’s another area in which Comprendo, the company from Pisa I mentioned earlier, has been involved.
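The same-day bank-account pattern from that example can be written as a simple rule over a stream of events. This is an illustrative toy – invented names and data, and a brute-force scan that no real system would use at scale:

```python
from datetime import date

def suspicious_pairs(events):
    """events: list of (person, action, country, day) tuples,
    where action is "open" or "close". Flags anyone who closes an
    account in one country and opens one in another the same day."""
    flagged = []
    for p1, a1, c1, d1 in events:
        for p2, a2, c2, d2 in events:
            if (p1 == p2 and a1 == "close" and a2 == "open"
                    and c1 != c2 and d1 == d2):
                flagged.append((p1, c1, c2, d1))
    return flagged

events = [
    ("rossi",   "close", "IT", date(2011, 3, 1)),
    ("rossi",   "open",  "CH", date(2011, 3, 1)),
    ("bianchi", "close", "IT", date(2011, 3, 1)),
]
suspicious_pairs(events)  # → [("rossi", "IT", "CH", date(2011, 3, 1))]
```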

This is all very impressive, but it’s also rather worrisome from one particular point of view: human employment. The tasks I just mentioned have been performed up to now by highly qualified – and well paid – humans. As much as we had gotten used to the idea of computers automating menial tasks like adding numbers and assembling automobiles, nobody ever expected them to encroach into this area. Here’s what Tom Mitchell, chairman of the Machine Learning department at Carnegie Mellon, had to say about it in the New York Times a few weeks ago [4 March 2011]:

The economic impact will be huge. We’re at the beginning of a 10-year period where we’re going to transition from computers that can’t understand language to a point where computers can understand quite a bit about language.

Who would have ever thought it …


I certainly wouldn’t have thought it back then in 1992 when I boldly stated that mastering computer understanding of human language would be equivalent to solving the entire problem of artificial intelligence. But my mistake wasn’t in making that bold statement – in fact, most researchers still think that statement is true. (The official way to express it is to say that computer language understanding is “AI-Complete”). No, my mistake was in thinking that the problem of computer language understanding had to be completely solved before anything useful could be done. It turns out that a lot of useful things can be done by computers with language without having to fully understand its meaning.

Sure, that means that the Holy Grail of perfect translation of idioms, puns, and poetry is still beyond the reach of computers – that task requires full understanding and may never be fully realized. But that’s not the right perspective on what is happening here. Rather, it reflects the longstanding tension between two different approaches to the relationship between human beings and computers: “AI versus IA.” The first, Artificial Intelligence, was championed by pioneers such as John McCarthy, and focused on building machines that think. The second, Intelligence Augmentation, was championed by other pioneers such as Douglas Engelbart (who, among much else, gave us the computer mouse), who focused on building machines that help people think. In the “IA” approach, it doesn’t matter whether Watson really understands what he’s doing; all that matters is whether he’s doing something that is useful to us. In this approach, computers will always be our assistants, not our masters. Of course, it’s unnerving to see how much our assistants can do now – but they remain our assistants nonetheless.


After writing those last words, I decided to reassure myself by going back to the phrase that provided the title for this talk, and seeing how well a computer translation program would do today. Nowadays that’s easy to arrange, because some of the very best programs are right there online and freely available for all to try out. Google Translate is an example of a modern translation system that uses the same “statistical machine learning” techniques that I have presented and discussed during this talk. So I fired it up, set the “input language” to English and the “output language” to Italian, and I fed it the phrase:

The spirit is willing but the flesh is weak.

In return I received:

Lo spirito è pronto ma la carne è debole.

Oh my … that’s not bad at all …


YouTube: you can find all the television sessions of the IBM Watson Jeopardy Challenge online at YouTube. You can also find an entire course on Machine Learning courtesy of Stanford University, although it is not for the faint of heart.

Stephen Baker, Final Jeopardy. The author followed the IBM Watson team around as it prepared for its Jeopardy Challenge, then wrote a book about it.

IDC, 2010 Digital Universe Study. Estimates and ruminations on the size of the Internet, now and future.

Comprendo is a local company just down the road in Pisa that does advanced applications in many areas of text analytics, including some of those mentioned in this talk. Their website has examples you can try out.

Stephen Marsland, Machine Learning: An Algorithmic Perspective. A textbook for those who want to know more about machine learning.