I can see it in my mind: a Hollywood movie where a junior employee (a junior actress in a cameo role) runs onto the shop floor to find her cigar-chomping boss (a well-known character actor late in his illustrious career) and shows him a chart that clearly demonstrates a fatal flaw in the factory’s operations. He takes decisive action to fix the situation, and together they take over the world of widget manufacturing. This, of course, never happens. Somehow Hollywood hasn’t captured the excitement and novelty of gathering useful data, massaging it into a format that clearly demonstrates some useful principle, and watching the data year after year, catching problems when they occur and highlighting when things go right. Well, at least they gave us The Big Bang Theory. I never dreamt that physics would be immortalized on the small screen, even if the show is mostly about Penny and Sheldon, and specifically about Sheldon’s many, many odd and idiosyncratic tics.
In my previous post I introduced GQM (Goal Question Metric), which takes us up to the point of deciding which metrics you will measure but doesn’t address what to do with those metrics once you have them. I’ll talk about some of those details here. Note that this post covers charts and reports as well as metrics. I lump reports, charts and metrics together in my mind not because they are in any way identical but because each sheds light on your operations in a way that looking at raw data cannot. I’m a big fan of raw data, but there are only so many hours in a day, and looking at raw data takes time. Sometimes you need to push aside the day-to-day management of your operations so you can concentrate on larger, strategic issues, like systems architecture, portfolio management, or aligning IT with business strategy. Frankly, those are much more fun anyway. If you can reduce the state of your operations to a few simple numbers, perhaps with drill-down capability for when you want to pursue data that seem odd, then you can focus on longer-term trends and make better decisions.
For convenience I will lump metrics, reports and charts together under a single term: data summaries. They are important in two operational phases: while work is in progress, and during retrospective reviews. While projects are in progress (or, more generally, on a daily basis), data summaries help signal problems as they emerge. For example, if you have twenty testers and normally one or two are behind schedule, you will want to be notified when four or more are behind schedule. Retrospective reviews, or post-mortems, help you interpret historical data to determine what went right and what went wrong, to see whether the data summaries themselves need improvement, and to see what you can do better in general. It is therefore very important to be able to regenerate a data summary’s trend from historical data. A retrospective review must result in specific recommendations based on observables you will be tracking in the future, not on observables you have since dropped because you decided the data weren’t useful. That means you need to know what the historical trend of a summary would have been had you been tracking it all along, even if you weren’t.
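As a minimal sketch of the tester example, assume each tester’s daily status is stored as a simple record (the `TesterStatus` layout, the function names and the `threshold` parameter below are hypothetical, not taken from any particular tool). The point is that one function serves both phases: run on today’s records it raises the alert, and run over the stored history it regenerates the trend for a retrospective.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TesterStatus:
    """One tester's status on one day (hypothetical record layout)."""
    tester: str
    day: date
    behind_schedule: bool


def behind_counts(records):
    """Count how many testers were behind schedule on each day, in date order."""
    counts = {}
    for r in records:
        counts.setdefault(r.day, 0)
        if r.behind_schedule:
            counts[r.day] += 1
    return dict(sorted(counts.items()))


def alert_days(records, threshold=4):
    """Return the days on which at least `threshold` testers were behind schedule."""
    return [day for day, n in behind_counts(records).items() if n >= threshold]


# Hypothetical history: 20 testers over three days, with 1, 2 and 5 behind schedule.
history = []
for day, n_behind in [(date(2024, 3, 4), 1), (date(2024, 3, 5), 2), (date(2024, 3, 6), 5)]:
    for i in range(20):
        history.append(TesterStatus(f"tester{i:02d}", day, i < n_behind))

print(behind_counts(history))
print(alert_days(history))
```

Because the summary is computed from the raw records rather than stored on its own, you can change the threshold or the summary later and rerun it over the archive, which is exactly what a retrospective needs.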
The ideal output of a set of data summaries is a signal, or event, indicating that a specific intervention is required. This never happens. Forget about the red flag. The more likely scenario is that several events occur over some time period that, combined with other observations, indicate that some sort of intervention is required. It’s often not clear what that intervention should be, especially in a knowledge-based industry, where you’re dealing with people. Turn the lights up, productivity goes up. Turn the lights down, productivity goes up. Go figure. But in a knowledge economy almost any interaction might do some good, especially if you aren’t showing your employees enough love.
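One hypothetical way to encode “a few events over some time period” instead of a single red flag is a k-of-n window rule: raise a signal only when at least `min_events` of the last `window` periods were flagged. The function name and default parameters are illustrative assumptions, and the rule still only tells you where to look, not which intervention to make.

```python
from collections import deque


def sustained_signal(flags, window=5, min_events=3):
    """Return the indices at which at least `min_events` of the last `window` periods were flagged."""
    recent = deque(maxlen=window)  # sliding window holding only the most recent flags
    hits = []
    for i, flagged in enumerate(flags):
        recent.append(bool(flagged))
        if sum(recent) >= min_events:
            hits.append(i)
    return hits


# Hypothetical weekly flags, e.g. "more testers than usual behind schedule this week".
weekly_flags = [False, True, False, True, True, False, False, False, False, False]
print(sustained_signal(weekly_flags))  # prints [4, 5]
```

An isolated flagged week never triggers the rule on its own; only a cluster of flags does.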
Speaking of economies, a typical organization is not large enough for macroeconomic principles to apply to its operations. Economists use leading indicators to point to where the economy will be in a few months, coincident (current) indicators to show where it is now, and lagging indicators to confirm the current indicators. They base these indicators on the behavior of millions of individual agents, that is, you and me. Working with such large numbers allows them to build models of the economy that can, in principle, predict recessions. Except for the current one: nobody predicted that. Nobody predicted the previous one either. Or the one before that. How odd. But hey, they get paid a lot more than I do, so they must be doing something right! It just isn’t predictions.
Macroeconomic predictions are difficult to make because the number of independent variables is very large. Here is one way to think about independent variables. Say I want to predict the value of some function f that depends on a single variable x, and that I can make 10 measurements of x and the corresponding values f(x). Here’s a picture of how it might look:
[Figure: ten measured values of f(x) plotted against x]
If somebody asks me what the value of the function would be if x were equal to 7.5, I’d have a pretty good idea of the answer, since (assuming f varies smoothly) it must lie close to the measured values on either side. Now imagine that f depends on two variables, x and y, and somebody asks me to interpolate the value of f when x is 7.5 and y is 3.08. This time, for every value of x you’d want to measure 10 values of y to have enough data to make the prediction reliably. Ten x’s times ten y’s yields 100 data points. Thus when you have two independent variables, the number of data points required for a proper prediction goes as the square of the number you would need for a single variable. In three dimensions you need one thousand measurements (10 cubed); in four dimensions, ten thousand. These numbers are of course illustrative; the real numbers depend on the exact nature of the observables.
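A small sketch of the arithmetic above, assuming a full grid with ten samples along each independent variable, plus the one-dimensional linear interpolation that answers the x = 7.5 question. The function names and the stand-in function f(x) = x² are hypothetical illustrations, not real measurements.

```python
from bisect import bisect_left


def grid_points_needed(dimensions, points_per_axis=10):
    """Measurements needed for a full grid with `points_per_axis` samples per independent variable."""
    return points_per_axis ** dimensions


def interpolate_1d(xs, ys, x):
    """Linearly interpolate between the two measurements that bracket x (xs must be sorted)."""
    i = bisect_left(xs, x)
    if i < len(xs) and xs[i] == x:
        return ys[i]  # x was measured directly
    if i == 0 or i == len(xs):
        raise ValueError("x lies outside the measured range")
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


print([grid_points_needed(d) for d in range(1, 5)])  # prints [10, 100, 1000, 10000]

# Ten hypothetical measurements of f(x) = x**2 at x = 1, 2, ..., 10.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]
print(interpolate_1d(xs, ys, 7.5))  # prints 56.5 (the true value is 56.25)
```

With one variable, two neighbouring measurements are enough for a decent estimate; each additional independent variable multiplies the number of measurements needed to keep the grid equally dense.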
Note that independence is important. If x and y are highly correlated, that is, if measuring x gives you a pretty good idea of what y will be, then you can probably drop y and measure only x. Mathematicians use a technique called principal component analysis (PCA) to reduce the dimensionality of a system, replacing correlated variables with the smallest set of uncorrelated ones that still captures most of the variation in the data.
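For two variables, PCA reduces to finding the eigenvalues of the 2×2 covariance matrix; the share of total variance carried by the larger eigenvalue tells you how little you lose by keeping only one direction. This is a minimal, dependency-free sketch with made-up data, using the closed-form eigenvalues of a symmetric 2×2 matrix; for more than two variables you would use a linear algebra library instead.

```python
import math


def covariance_eigenvalues_2d(xs, ys):
    """Eigenvalues (largest first) of the sample covariance matrix of two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Closed form for the symmetric matrix [[sxx, sxy], [sxy, syy]].
    center = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return center + radius, center - radius


def first_component_share(xs, ys):
    """Fraction of the total variance captured by the first principal component."""
    big, small = covariance_eigenvalues_2d(xs, ys)
    return big / (big + small)


# Hypothetical data where y tracks x closely: one component should suffice.
xs = list(range(1, 11))
ys_correlated = [2 * x + (0.1 if x % 2 else -0.1) for x in xs]
print(f"correlated:   {first_component_share(xs, ys_correlated):.4f}")

# Hypothetical data with zero covariance: both directions matter.
print(f"uncorrelated: {first_component_share([1, 2, 3, 4], [1, -1, -1, 1]):.4f}")
```

In the correlated case nearly all the variance lies along one direction, so measuring x alone loses almost nothing; in the uncorrelated case dropping either variable discards a large share of the information.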
As the number of dimensions increases, the amount of data you need to make accurate predictions grows extremely rapidly. Unfortunately, real systems, such as projects or the operations of a corporate division, have many independent dimensions. They are highly complex, and therefore you need a lot of data: data that, frankly, you don’t have, even after reducing the dimensions to the minimum via principal component analysis.
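Turning the arithmetic around makes the point concrete: with a fixed budget of N measurements spread over a full grid in d dimensions, each independent variable gets only N^(1/d) samples. The function name and the budget of 10,000 measurements are hypothetical; the full-grid assumption is the same one behind the 10, 100, 1,000 illustration earlier.

```python
def samples_per_variable(total_points, dimensions):
    """Samples per independent variable when total_points fill a full d-dimensional grid."""
    return total_points ** (1 / dimensions)


budget = 10_000  # hypothetical total number of measurements available
for d in (1, 2, 4, 10):
    print(f"{d:>2} dimensions: {samples_per_variable(budget, d):8.2f} samples per variable")
```

Ten thousand measurements is plenty for four variables but leaves only about two and a half samples per variable in ten dimensions, which is not enough to see even a simple curve.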
So, is there any hope for metrics? Yes, absolutely. You just have to scale down your expectations. In coming posts I will discuss the kinds of information you might glean from the data.
