In this article I present a simple formula for risk quantification and explain in detail why I think the formula is badly abused. The problem is not really with the formula but with the data that goes into it. Risk data is always backward looking, yet everyone talks about the risks of future events. This is a common problem, and it is made worse when poor data is fed into such a simple formula.
While I criticise the formula heavily, I also outline why this does not prevent business from progressing, why risk quantification is not necessarily a waste of time, and why jobs are not necessarily lost through poor risk quantification.
Defining Risk Quantification
Risk is typically quantified as:
Risk = Impact * Probability
Impact is actually the cost of an impact in dollars. If there is a breach, then what is the expected cost of the damage? This could be the replacement cost, or perhaps the increased cost in insurance premiums, or whatever else is involved in repairing the damage. It would also be fair to quantify the loss of earnings as a result of the impact. The point is that the aggregate cost of an event, or events, is grouped together in a logical way to estimate the total cost.
Probability can be defined in a number of ways, most of which are based upon a frequentist approach. Naturally, it only makes sense to talk about probability with respect to a particular timeframe: i.e. the probability of a data breach occurring within a given year.
The most worrying phrase I've heard with respect to this formula is: "if either the impact term or the probability term is zero then the risk is zero." I contested this point and was dismayed by the ease with which the speaker ignored my comment. I will point out in this article why that attitude is dangerous: in many cases it is impossible to determine whether either term is zero.
An alternate formulation is Risk = impact * likelihood. The meaning of the two equations is broadly the same, but since the first one is easier to talk about I shall focus upon it. Probability is defined between 0 and 1, while likelihood is uncapped and conceptually more difficult to understand. We have a vernacular understanding of likelihood, but mathematically it has a precise meaning, which is confusing.
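To make the arithmetic concrete, here is a minimal sketch of the formula as it is usually applied. Every figure is invented purely for illustration.

```python
# Minimal sketch of Risk = Impact * Probability.
# All figures below are invented for illustration only.

impact_cost = 250_000        # estimated cost in dollars if a breach occurs
annual_probability = 0.02    # estimated probability of a breach within one year

risk = impact_cost * annual_probability
print(f"Annualised risk: ${risk:,.0f}")  # -> Annualised risk: $5,000
```

Note that if either term is set to zero the product is zero, which is exactly the claim I take issue with below.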
Difficulty in assessing impact
The impact cost is always an estimation. It relies upon working with accurate historical data, assuming that this data will be correct or can be correctly adjusted going forward, and that the data provides a complete picture of the impact. An incomplete data set containing accurate data could lead to a seemingly perfect but incomplete calculation of the impact. Given that future cyber attacks can take advantage of unforeseen security vulnerabilities, it is likely that the damage can affect areas of the business or systems which seem unrelated. Building a complete picture is hard, if not impossible, since the attack surface of a modern organization is huge. Therefore some margin of error also needs to be added, as in the sketch below.
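As a rough illustration of the point about margins of error, the sketch below aggregates hypothetical cost components and reports a range rather than a single number. The components and the ±50% margin are assumptions for illustration, not real figures.

```python
# Aggregate hypothetical impact components and report a range rather than a
# point estimate.  Every figure here is an assumption.

cost_components = {
    "incident response": 80_000,
    "system replacement": 120_000,
    "insurance premium increase": 30_000,
    "lost earnings": 200_000,
}

point_estimate = sum(cost_components.values())
margin = 0.5  # assumed +/-50% margin of error

low, high = point_estimate * (1 - margin), point_estimate * (1 + margin)
print(f"Impact estimate: ${point_estimate:,.0f} (range ${low:,.0f} - ${high:,.0f})")
```

Even a range like this assumes the list of components is complete, which is precisely what cannot be guaranteed.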
It is not obvious that an appropriate margin of error will ever be sufficient, at least not for all circumstances. What happens to the calculations when an event which was deemed to be of small impact turns out to be of medium or large impact? I'm leaving these terms vague, but the principle should be clear. If there is an error in the calculation in one particular year then it can be updated in the risk report of the following year. Naively, this suggests that data will become more complete and more accurate year on year. Broadly true, but what happens if an event which is deemed to have a small impact actually turns out to have a catastrophic impact? An event which was deemed negligible because it would only affect one system actually affected all systems. As mentioned, the attack surface is large and enumerating all vulnerabilities is essentially impossible.
Clearly what I'm getting at here are tail events, aka black swan events. Those familiar with the literature will already know the difficulty in accounting for such events. Mitigating the risks is difficult, especially if the risk arises from an unforeseen vulnerability. Black swan events dominate the cost of damages, yet they are rarely costed for. I would say that black swan events don't necessarily invalidate the impact term in this equation, but rather they are difficult to fully account for. In an article I published yesterday I pointed out the problem these events pose for calculating the cost of insurance. The scale of the impact is actually incalculable: it is indeterminate. That is not to say that a cost cannot be estimated, but that estimate has to be accepted as inaccurate. All of this is without even discussing intangible costs such as reputation.
Difficulty in calculating probability
The key line of my argument against this formula so far has been the accuracy with which costs can be determined. It is a difficult task to get 100% correct, especially in the face of black swan events, but I accepted that such events don't necessarily invalidate the impact term of the equation. Those who have read the literature on black swans will already know the conclusion of this section of my article: tail events completely invalidate the probability term. Before I get to that statement I will illustrate why.
One way of calculating probability could be to look at all attacks upon a network within a given year and then determine the distribution (think histogram) of each type of attack. If there were 100 scans of a network within a given year but only 1 led to an intrusion, then you could calculate a naive value of 1% (1 in 100). An alternative would be to look at the number of days on which an intrusion occurred and then calculate a value as the number of days with an intrusion (1) out of the total number of days in a year (365). The latter leads to a smaller probability, which may lead to an inflated feeling of security.
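A small sketch of the two naive calculations above, using the counts from the example in the previous paragraph:

```python
# Two naive frequentist estimates from the same year of data.

scans, intrusions = 100, 1
days_in_year, intrusion_days = 365, 1

p_per_scan = intrusions / scans            # 1 in 100 scans -> 1.00%
p_per_day = intrusion_days / days_in_year  # 1 day in 365   -> ~0.27%

print(f"Per-scan estimate: {p_per_scan:.2%}")
print(f"Per-day estimate:  {p_per_day:.2%}")
```

Same data, two legitimate-looking denominators, two different probabilities.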
Defining probability depends on the perspective of the person doing the calculations. The data is also backward looking since it is describing historical events, yet what we really want is a number that predicts future events. Past data is not necessarily a reliable indicator of the future, but it is essentially the only data we have. This is THE problem with risk calculation in the investment industry.
A frequentist approach is the simplest, but it is also inappropriate. If an event is deemed to be improbable because the historical data doesn't show it ever happening, then the probability comes out as zero. This is strictly correct in the frequentist approach. The person mentioned above gave a reason why probability should be zero. He related it to the story of Ralph Nader opposing convertible cars because it was possible for bowling balls to fall from a bridge and kill someone, since a convertible lacks a roof to protect the people inside. Apparently the US government decided not to ban convertible cars because there were no occurrences of bowling balls ever falling from bridges and killing the occupants of a convertible. This conclusion was alarming to me, but the person was smug in believing it was a great example of why risk could be zero. This troubles me since it is theoretically possible for such an event to occur; it is not blocked by the laws of physics, and a lack of precedent is not a good reason to set the probability to zero. Have there been other objects dropped from bridges which have killed the occupants of a convertible? Were those events also accounted for?
Even if such events have occurred it does not necessarily follow that convertibles should be banned. That isn't really my contention. There are more fatalities from road accidents in the US than from aeroplane accidents, yet cars are not banned. Risk does not need to be zero for cars (any car) to be legal. I will also point out that this is an anecdote rather than hard evidence. That said, I have come across such attitudes in many people and I find them troubling.
Let's consider another example where the risk was deemed to be zero, the most obvious being that of the Ashley Madison website. I mention this as a timely example, and one which has received a huge amount of coverage in the media. Their website was supposedly secure. The company claimed the site was secure, but media reports suggest that the attacker found a flaw. It isn't entirely clear to me, as an outsider to the company, whether the flaw is real (I haven't tried to replicate it either). Let's naively assume it was a website vulnerability that led to the data breach. The company thought, or at least publicly suggested, that a data breach was impossible. This suggests that they thought the probability was zero. It wouldn't matter what the impact cost would be: if the probability is identically zero then the risk is exactly zero. All past data "proved" that it was impossible: "we've never had a data breach from a website vulnerability".
This is the bowling ball in the convertible for Ashley Madison. The frequentist approach utterly failed here. Apart from the various ways a frequentist approach can be applied, as mentioned, changing to a Bayesian approach is likely too complicated for many, even if it is in principle a more correct method to adopt. Moreover, a Bayesian approach does not automatically account for black swans: it is a framework that depends on the data used. This is really the point of the risk formula above: it is not necessarily wrong, but rather the data which is applied to the equation can make the outcome wrong. In short: garbage in, garbage out.
The black swan literature tells us that the probabilities of black swan events are incalculable. They are not zero, or small. They are indeterminate. The attack surface is never known completely and accurately, hence it is impossible to enumerate risks and calculate accordingly. It isn't clear that the data breach at Ashley Madison was due to a website vulnerability. John McAfee has suggested that it was an inside job. He might be right, but how do we assign a probability to such an event? Or even the impact cost? Apart from the loss of reputation, the exact cost of the damage is not fully known. Lawsuits have been filed but it is not clear if they will succeed. This is a potentially business-ending event for the company; a fatality in which the impact for the company is essentially maximal. This is a risk that all organizations take and it is not clear that any of them are appropriately protected. I say this with some naivety and as an outsider to most organizations.
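To show what I mean about a Bayesian framework still depending on the data (and the prior), here is a minimal Beta-Bernoulli sketch. It avoids assigning exactly zero to an event that has never been observed, but the number it produces is only as good as the assumed prior and the observed counts: it cannot conjure a black swan out of data that doesn't contain one. The counts and the uniform prior are assumptions for illustration.

```python
# Raw frequency vs a simple Beta-Bernoulli (Laplace-style) estimate for an
# event that has never been observed.  Counts and prior are assumptions.

observed_breaches = 0
observed_years = 10

# Frequentist: 0 / 10 -> the probability "is" zero.
p_frequentist = observed_breaches / observed_years

# Bayesian: Beta(1, 1) uniform prior, posterior mean = (k + a) / (n + a + b).
alpha_prior, beta_prior = 1, 1
p_bayes = (observed_breaches + alpha_prior) / (observed_years + alpha_prior + beta_prior)

print(f"Frequentist estimate: {p_frequentist:.1%}")  # 0.0%
print(f"Bayesian estimate:    {p_bayes:.1%}")        # ~8.3%
```

The Bayesian number is non-zero, which is an improvement, but it is still driven entirely by what has been observed and assumed, not by what has never been seen.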
Is risk quantification a waste of time?
If the probability can't be calculated then why bother? Because risk quantification is fine within the domain of mundane events. Some risks are well behaved and well understood. The impact from an attack is not always the end of a business. In this domain a frequentist approach is perhaps the most efficient use of time. Moreover, probabilities are essentially meaningless in isolation: if an event occurs but only had a 10% probability, does that validate the calculation? What if the probability had been 15% or even 90%? The values are degenerate, indistinguishable. Again, this is a problem in the investment industry. If a stock market crash has a supposed probability of 10% but the crash doesn't occur, then people can naively believe the probability was correct. The inaccuracy of applying Gaussian statistics to the real world is well documented elsewhere, so I won't cover that topic here.
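To illustrate the degeneracy, consider how well a single "no crash" year distinguishes between several claimed probabilities. A toy sketch, with made-up numbers:

```python
# One observed outcome tells you almost nothing about which claimed
# probability was "correct".

claimed_probabilities = [0.10, 0.15, 0.90]
crash_occurred = False  # the crash did not happen this year

for p in claimed_probabilities:
    likelihood = p if crash_occurred else 1 - p
    print(f"Claimed p = {p:.0%}: likelihood of observing no crash = {likelihood:.2f}")
```

A single quiet year barely separates 10% from 15% (0.90 vs 0.85), and even a claimed 90% is not conclusively refuted by one observation.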
If the frequentist approach is incorrect outside of the domain of mundane events, then why haven't people working in risk lost their jobs? Two potential reasons: those organizations have been lucky enough not to suffer a catastrophic event, or their defense strategies are sufficiently robust that they are reasonably well protected from catastrophic events.
The CEO of Ashley Madison has just stepped down, which is evidence that the miscalculation of a black swan event can lead to job losses. This has not been true in all data breaches though. Not all CEOs have stepped down following a data breach, and not all companies have had data breaches. The latter doesn't necessarily offer comfort, but it is certainly possible that policies and procedures have mitigated most of the damage from a high impact event. Hardening a website is the job of the tech team, but providing the correct policies and procedures is the task of management. Preventing insider threats and managing those risks is not necessarily a technological concern. It is true that technology can help to facilitate the management of the risks posed by humans, but it is the policies and procedures which are more important. Even if the risk calculation is completely incorrect, having the correct mitigation strategies in place can reduce the impact from unforeseen events. This is why I suspect that risk managers are still in their jobs: not because of the accuracy of their calculations, but naively because I think they have adopted the correct policies and procedures.
Quantification has its place in risk management, but its usefulness is limited. A fuller assessment of risks to account for black swans could be so expensive that it is prohibitive, and it is not always necessary if the risk management strategies are robust enough to protect against them.
Successful Cyber Defense
Is there any proof of what I'm suggesting? Well, I think there is. A recent article published by the Harvard Business Review is very timely with respect to the writing of my own article. It points out that the US military has been very successful in defending against cyber attacks: their success is underpinned by great technological capabilities but also by strong risk management on the human side. HBR's article: Cybersecurity’s Human Factor: Lessons from the Pentagon.
"From September 2014 to June 2015 alone, it repelled more than 30 million known malicious attacks at the boundaries of its networks. Of the small number that did get through, fewer than 0.1% compromised systems in any way. Given the sophistication of the military’s cyberadversaries, that record is a significant feat."
My thoughts on this are that it is impressive, but I am also cautious about ignoring that 0.1%. If the military network is incredibly hardened, then what does it say about the sophistication of the 0.1% that got through? Potentially it is the most sophisticated. That's just a guess. It could be down to human error too, in which case the sophistication could be low. The sense of being secure is somewhat false if the 0.1% that do get through also have the greatest impact. Are these bowling balls into the military's convertible?
Where I would believe, or like to believe, that the military is good is in the area of security policies and procedures. The HBR article at least suggests this and the data is corroborative:
"One key lesson of the military’s experience is that while technical upgrades are important, minimizing human error is even more crucial. Mistakes by network administrators and users—failures to patch vulnerabilities in legacy systems, misconfigured settings, violations of standard procedures—open the door to the overwhelming majority of successful attacks."
It is worth adding that a single layer of protection is not going to provide complete protection. Having good policies or a great anti-virus scanner isn't going to cut it. There needs to be multiple forms of protection. As Brian Krebs says, security is all about layers, but the human factor is the most important.
Last Updated (Wednesday, 02 September 2015 20:12)