This essay is adapted from “Chapter 2: Defining Security Metrics” of my forthcoming book, Security Metrics: Replacing Fear, Uncertainty and Doubt from Addison-Wesley and Symantec Press, expected in early 2007. Small portions of this appeared in “The Future Belongs to the Quants,” an IEEE article co-authored by me, Dan Geer and Kevin Soo Hoo.
Information security is one of the few management disciplines that has yet to submit itself to serious analytic scrutiny. In security, business leaders ask:
- How secure am I?
- Am I better off than I was this time last year?
- How do I compare with my peers?
- Am I spending the right amount of money?
- What are my risk transfer options?
Were we talking about some other field, we could look to prior art and industry-specific knowledge — for example, derivatives pricing in vertical industries like finance, health and safety in pharmaceutical manufacture, and reliability in power distribution. Likewise, most enterprises’ horizontal functions — human resources, finance, manufacturing, supply chain, call center, e-commerce and operations — measure their performance by tracking a handful of key performance indicators. These indicators include statistics such as call volumes per associate, inventory turns, customer conversion percentages, manufacturing defect rates and employee turnover.
[Table: key indicators by discipline or vertical market. Only fragments survive: freight cost per mile; cost per square foot; website conversion rate; cable and satellite.]
These indicators all share two characteristics. First, they are simple to explain and straightforward to calculate. Their transparency facilitates adoption by management. The second characteristic these indicators share is that they readily lend themselves to benchmarking.
On occasion, enterprises will share them as part of a management consulting survey, and will attempt to compare their own key indicators against those of other companies they know. In so doing, they gain insights about their own performance relative to peers and other industries. A quick glance at the Harvard Business Review or McKinsey Quarterly confirms that benchmarking in enterprises continues to be a healthy, vibrant, established pillar of modern management.
Information security has no equivalent of McKinsey Quarterly, nor of the time-honored tradition of benchmarking organizational performance. Analytical rigor receives little attention, while nebulous, non-quantitative mantras rule: “defense in depth”, “security is a process” and “there is no security by obscurity” to name a few. What numbers do exist, such as those provided in vulnerability and threat reports from Symantec, Webroot, Qualys and others, provide macro-level detail about the prevalence of malware but little else that enterprises can use to assess their effectiveness comparatively against others. Numbers provided by anti-malware, vulnerability management systems, and SIM/SEM systems certainly add value — but to date, no entity has yet attempted to aggregate and compare these data across enterprises.
So what makes a good metric, and what should we measure? Let’s address the first part of that question in this post — we will address the second in the subsequent one.
I was curious to see if I could find a consensus definition of what a “metric” is. According to Oxford’s American Dictionary, a metric is “a system or standard of measurement.” In mathematics and physics, it is “a binary function of a topological space that gives, for any two points of the space, a value equal to the distance between them, or to a value treated as analogous to distance for the purpose of analysis.”
Specific to IT metrics, Maizlish and Handler further distinguish between metrics used for quantifying value and those used to measure performance:
There are two fundamental types of metrics that must be considered before commencing with IT portfolio management: value delivery and process improvement. Value delivery consists of cost reduction, increase in revenue, increase in productivity, reduction of cycle time, and reduction in downside risk. Process improvement refers to improvements in the IT portfolio management process. While the metrics are similar and in many ways interrelated, process metrics focus on… effectiveness. Is the process improving? Is the process providing perceived value? Is the process expanding in scope? More and more, leaders are looking into the metrics microscope to eliminate non-value-added activity and focus on value-added activity.
—B. Maizlish and R. Handler, “IT Portfolio Management: Step By Step”, p. 53.
These two definitions certainly help, but like most definitions they grant us a rather wide scope for discussion. Just about anything that quantifies a problem space and results in a value could be considered a metric. Perhaps we ought to re-focus the discussion on what a metric should help an organization do. The primary goal of metrics is to quantify data, thus yielding insight. Metrics do this by:
- Helping an analyst diagnose a particular subject area, or understand its performance
- Quantifying particular characteristics of the chosen subject area
- Facilitating “before-and-after,” “what-if” and “why/why not” inquiries
- Focusing discussion about the metrics themselves on causes, means and outcomes rather than on methodologies used to derive them
As an analyst, I’m keenly interested in making sure that persons examining a “metric” for the first time see it for what it is (a standard of measurement) rather than as something confusing that prompts a dissection of the measurer’s methods.
Metrics suffer when readers perceive them to be vague. For example, I have seen a widely publicized paper that proposes to benchmark security effectiveness, but in its key graphical exhibit, the authors’ metric is described only as a “benchmark” with a scale from 1 to 5; it has no unit of measure or further explanation. The authors undoubtedly intended to spark discussion around the causes and drivers for the metric, but the exhibit instead makes readers scratch their heads about what the metric is and how it was defined.
To keep organizations from trapping themselves in tar-pits of hand-wavery and vagueness, metrics should be clear and unambiguous. Specifically, good metrics should be consistently measured, cheap to gather, expressed as a number or percentage, and expressed using at least one unit of measure. A “good metric” should also ideally be contextually specific.
Consistently measured
Metrics confer credibility when they can be measured in a consistent way. Different people should be able to apply the method to the same data set and come up with equivalent answers. “Metrics” that depend on the subjective judgments of those ever-so-reliable creatures, humans, aren’t metrics at all. They’re ratings. The litmus test is simple: if you asked two different persons the same measurement question, would they produce the same answer?
Metrics will either be computed by hand or by machine. In the former case, one can ensure consistency by documenting the measurement process in a transparent and clear way. When people understand how and why to do something, they tend to do it in a more consistent fashion. Keeping measurement questions short and factual (yes/no oriented) helps, too.
Even better than manual data sources, however, are automated ones. Once programmed, machines will faithfully execute their instructions as provided by their programmers. They will execute their tasks the same way each time, without mistakes.
Cheap to gather
Every metric takes time to compute. All metrics start their lives as raw source data, then, through the magic of computation, become something more insightful. That means that somebody or something needs to obtain the data from a source, massage and transform the data as needed, and compute and format the results. For some metrics, these steps collapse into a single, fast process; a simple SQL statement or API method call delivers the goods. But other metrics require screen-scraping, phone calls, and spreadsheet hackery. Inefficient methods of gathering data cost organizations valuable time that they could have put to better use on analysis.
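To make the “simple SQL statement” case concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The hosts table, its columns, the host names and the “percentage of patched hosts” metric are all hypothetical, invented purely for illustration; a real data source would look different.

```python
import sqlite3

# Hypothetical inventory table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (name TEXT, patched INTEGER)")
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?)",
    [("web01", 1), ("web02", 0), ("db01", 1), ("mail01", 1)],
)


def patched_host_percentage(connection):
    """Return the percentage of hosts whose 'patched' flag is 1."""
    (pct,) = connection.execute(
        "SELECT 100.0 * SUM(patched) / COUNT(*) FROM hosts"
    ).fetchone()
    return pct


print(patched_host_percentage(conn))  # 75.0
```

A metric that reduces to one query like this can be recomputed as often as anyone cares to ask for it.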
I firmly believe that metrics ought to be computed often. Metrics with short sampling intervals help companies analyze their security effectiveness on a day-to-day and week-to-week basis rather than through a yearly rear-view mirror. It stands to reason that if a metric needs to be frequently computed, the source data for the metric should be cheap to gather in terms of time or money.
Before-and-after comparisons aren’t something organizations should be forced to do once a year because of inefficient data gathering. For a given metric, ask yourself: could you compute it once a week? How about every day? If not, you might want to re-consider the metric — or consider methods of speeding up the measurement process. As with the point about consistency, the criterion that good metrics ought to be cheap to gather favors automation.
Expressed as a number or percentage
Good metrics should be expressed as a number or percentage. By “expressed as a number,” I mean a cardinal number — something that counts how many of something there are — rather than an ordinal number that denotes which position that something is in.
For example, “number of application security defects” evaluates to a cardinal number that can be counted. By contrast, high-medium-low ratings that evaluate to 1, 2 and 3 are ordinal numbers that grade relative performance scores but don’t count anything.
Metrics that aren’t expressed as numbers don’t qualify as good metrics. “Traffic lights” (red-yellow-green) are not metrics at all. They contain neither a unit of measure nor a numerical scale.
Expressed using at least one unit of measure
Good metrics should evaluate to a number. They should also contain at least one associated unit of measure that characterizes the things being counted. For example, the metric “number of application security defects” expresses one unit of measure, namely defects. By using a unit of measure, the analyst knows how to consistently express results of a measurement process that looks for defects.
My definition of a good metric holds that it’s often better to use more than a single unit of measure. The single unit of measure for the “number of application security defects” metric makes it hard to compare dissimilar applications on an apples-to-apples basis. But if one unit of measure is good, two are better. For example, a better metric might be “number of application security defects per 1000 lines of code,” which provides two units of measure. By incorporating a second dimension (dividing by KLOC), we have constructed a metric that can be used for benchmarking.
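The arithmetic behind the two-unit version is short enough to sketch in Python. The application names, defect counts and code sizes below are made up for illustration, and defects_per_kloc is a hypothetical helper name rather than anything from a standard tool.

```python
def defects_per_kloc(defect_count, lines_of_code):
    """Security defects per 1000 lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defect_count / (lines_of_code / 1000)


# Hypothetical applications: (defect count, lines of code).
apps = {
    "billing": (42, 120_000),
    "portal": (15, 20_000),
}

for name, (defects, loc) in apps.items():
    rate = defects_per_kloc(defects, loc)
    print(f"{name}: {defects} defects, {rate:.2f} per KLOC")
```

In this made-up data set the raw count makes the larger application look worse (42 defects against 15), while the normalized figure reverses the picture (0.35 against 0.75 defects per KLOC). That reversal is what makes the second unit of measure useful for comparison.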
Contextually specific
Good metrics mean something to the persons looking at them: they shed light on an underperforming part of the infrastructure under their control, chronicle continuous improvement or demonstrate the value their people and processes bring to the organization. Although specificity is not required for all good metrics, it helps to keep each of them scoped in such a way that a reader could receive enough insight to make decisions based on the results.
“Contextually specific” is a shorthand way of saying that a good metric ought to pass the “smell test.” You don’t want managers wrinkling their noses and asking belligerent questions like “and this helps me exactly… how?”
For example, defining an “average number of attacks” metric for an entire organization doesn’t help anybody do their jobs better, unless the indirect goal is an increased security budget. But scoping the same metric down to the level of a particular business unit’s e-commerce servers can help much more, because the people responsible for those servers can make specific decisions about security provisioning and staffing based on the data.
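As a rough sketch of what scoping looks like in practice, the Python below computes the same “average attacks per day” figure twice: once across the whole organization, and once for a single business unit’s e-commerce servers. The business units, server roles, dates and counts are all invented for illustration, and average_attacks_per_day is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (business_unit, server_role, day, attack_count).
observations = [
    ("retail", "e-commerce", "2006-06-01", 40),
    ("retail", "e-commerce", "2006-06-02", 60),
    ("retail", "intranet", "2006-06-01", 2),
    ("retail", "intranet", "2006-06-02", 4),
    ("wholesale", "e-commerce", "2006-06-01", 5),
    ("wholesale", "e-commerce", "2006-06-02", 7),
]


def average_attacks_per_day(rows, scope=None):
    """Average daily attack count, optionally limited to one (unit, role) scope.

    Raises statistics.StatisticsError if no rows match the scope.
    """
    daily = defaultdict(int)
    for unit, role, day, count in rows:
        if scope is None or (unit, role) == scope:
            daily[day] += count
    return mean(daily.values())


print(average_attacks_per_day(observations))                            # 59
print(average_attacks_per_day(observations, ("retail", "e-commerce")))  # 50
```

The organization-wide number blends servers with very different exposure, whereas the scoped number belongs to one team and one set of machines, which is the level at which provisioning and staffing decisions get made.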
Well, that’s it for this post. Next time, we will consider what makes a bad metric. I reserve special venom for ALE, so you won’t want to miss it.