  • 09 Jun 2013 2:53 AM | SIRA Communications

    The SIRA Board of Directors passed a resolution at their most recent meeting to start a new paid membership tier based upon the results of this year’s membership survey and the desire to create the foundation for a more formal professional organization. As a result, we are formally announcing the introduction of the SIRA Professional Membership (SPM).

    The cost for SPM will be $50.00 USD. SIRA will maintain current and historical records of your membership in the event you reference SPM on your CV/bio. Benefits of membership include:

    • Reduced registration to SIRAcon ($50.00 off total price)
    • Membership through 2014
    • Proceedings of 2013 SIRAcon
    • SIRAcon journal subscription carried through 2014
    • Free access to SIRA-sponsored webinars

    Future webinars will be restricted to SPM members or those paying to attend.

  • 14 Apr 2013 2:59 AM | SIRA Communications

    This blog entry was originally written by Mark Chaplin (@markachaplin)

    Note: This post is not complete yet, the actual resource list is too long to be posted in a single entry on this website.

    I recently posted a list of IRM resources on the SIRA mailing list, and Bob Rudis asked me to add it as a blog. So here it is (with Anglo-spelling and a couple shameless plugs with collaboration in mind). The list is based on material I have come across over the last couple of years as part of my own personal research activities and my work at the Information Security Forum. I tend to share most resources and links on Twitter as @markachaplin when I come across them and then consolidate later. I am always on the lookout for useful material and contacts (hint).

    The purpose of listing the resources, for me, is to act as a reference for helping in various aspects of information risk management and information security, including:

    • setting up an information risk management framework to align with operational risk management (eg as part of ERM), focusing at a business process / business environment level, and establishing supporting material to facilitate effective information risk analysis and shape the information risk analysis methodology (eg communication, decision making and reporting)
    • establishing an information risk analysis methodology, following a complete end-to-end information risk analysis process (including preparation, business impact assessment, threat assessment, vulnerability assessment, risk evaluation and risk treatment) and considering the complete lifecycle of information that supports critical business processes
    • treating information risks, particularly implementing security controls and arrangements for mitigating risks, such as those associated with policy, privacy, legal and regulatory compliance, application and infrastructure protection, business environments, mobile computing, supply chain, systems development, physical security, business continuity and security audit.
    Those of you who are Members of the ISF will recognise a number of things above.

    The resources listed below are structured around rudimentary categories because I haven’t had time to determine how best they should be grouped. I welcome any suggestions from SIRA members on extending and improving it (eg including more material for other disciplines and from geographical regions other than the usual culprits). Some resources are suited to more than one category and you may find duplicate entries.

    Finally, there are three important points I need to make before you read the list:

    • I do not endorse anything on the list - it is purely a collection of material I have come across.
    • I have not included anything from my employer, but if you are interested in what we do at the Information Security Forum you can get an idea (and some free sample material) at
    • I don’t just regurgitate other people’s work. I am also a research analyst and report author (amongst other things), so I understand the pain involved in producing quality reports (or equivalent) to help organisations manage information risk effectively.
    I hope you find it useful, and please share any other resources you are aware of. There’s plenty out there.

    Current categories used for the list

    Business Focused Resources (that may influence information risk)
    Risk Management
    Vulnerability / Exploit
    Incidents, Breaches, Compromises…
    Supply Chain Risk Management
    Systems / Software Development
    Security Testing
    Surveys, studies and reports from Vendors
    Surveys, studies and reports from non-Vendors
    Legislation and regulation
    Fraud and Identity Theft
    Vendor Resources
    Practices and controls
    Access Control
    Malware Protection
    CERTs, Bulletins and Mailing Lists

  • 06 Aug 2012 3:07 AM | SIRA Communications

    This blog entry was originally written by Alex Hutton

    So I’ve been working on something for a while, with the intent to have it be a SIRA work of art - available to the community via SIRA for IRAs to use and abuse.

    The idea is relatively simple - take a “Fish” or Ishikawa Diagram for root cause analysis - and apply it to information risk.

    So instead of production/manufacturing’s categories of People, Methods, Machines, Materials and so forth, all I did was apply VERIS categories of incident classification - and added a “Controls” tree.

    You can grab the PDF version, Visio version or OmniGraffle version. I’ve been using it personally for a while, and while it’s not really earth-shattering, perspective-changing, risk model-arama - I have found that it can be really useful, almost a risk analyst’s swiss army knife.

    Please let me know what you think. With this post I give it to you, the Society. If we find it useful - then I hope you’ll encourage others to come to the Society to learn more.

    With that - it’s very 1.0. The control branch especially, I’m not proud of. Other considerations (frequency, strength or amount) aren’t quite there for all the trees. But I’d like and appreciate your help if you want to give it.

    Google Docs version by Brian Livingston

  • 11 Jun 2012 11:32 PM | Marcin Antkiewicz (Administrator)

    This blog entry was originally written by John Hoffoss, I am just migrating the post to the new SIRA site.

    Mairtin contacted the board in November of 2011 with a question about sharing his thesis:

    A bit over a year ago I did my MSc thesis on optimizing Information Security Investment, which effectively turned into looking primarily at quantitative risk assessment using the usual FAIR/Monte Carlo type approach. 

    While the conclusions aren’t anything new to people involved in SIRA, I thought it might be a good introduction read for those who are interested in the area but haven’t a clue where to start. 

    I was wondering if you’d be interested in linking to it or including it on the SIRA site? It’s mainly just sitting in my folders doing nothing so if it was helpful to others, I’d be thrilled.

    This is the writeup he created for us that we’re finally getting published here.


    If anyone has been following the SIRA mailing list for the last few months, you have seen some fantastic debate over the approaches to dealing with information security risk. While there are obviously a lot of incredibly talented people on the list, the common information security guy in the trenches may often be scared off by a lack of understanding of what on earth everyone is talking about! So I thought I’d share my brief story of how an information security guy like myself, who was originally more at home with penetration testing and reviewing tcpdump packet captures, ended up in the world of Monte Carlo simulations, aggregated risk and statistics!

    Like most information security professionals, ever since I studied for my CISSP, I’ve read about Annual Loss Expectancy (ALE) and how it can be used to estimate the amount you should be spending on security controls. I subsequently saw references to ALE time and time again in other information security books and exams such as the CISM. As I came from a highly technical background, I just accepted this as management type material that was interesting but wasn’t all that relevant to what I did day-to-day.
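    (For readers meeting ALE for the first time: it is simply the single loss expectancy multiplied by the annualised rate of occurrence. A toy sketch with made-up numbers, purely to show the arithmetic:)

```python
# Classic ALE: Annual Loss Expectancy = SLE x ARO.
sle = 50_000   # single loss expectancy: estimated cost of one incident (hypothetical)
aro = 0.2      # annualised rate of occurrence: roughly one incident every five years
ale = sle * aro
print(f"ALE = ${ale:,.0f} per year")  # ALE = $10,000 per year
```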

    However, as I started moving more towards management positions and started to help client companies (I work in consultancy) with information security management challenges, I started to ask a very simple question. If ALE is being referenced in the majority of books and certificates tailored to information security professionals, why haven’t I ever seen anyone actually use it? That started me thinking, and around five years ago I started digging for books that could help me understand why this was.

    My first area of research was in metrics, particularly kickstarted by reading one book: Security Metrics by Andrew Jaquith. Straight away this book showed me what the key flaws were within ALE, although at the time I had no idea what on earth an “outlier” was! By the way, Andrew highlights the following problems with ALE: the inherent difficulty in modelling outliers, the lack of data for estimating probabilities of occurrence or loss expectancies, and sensitivity of the ALE model to small changes in assumptions. Read the book for more info!

    The next key book for me came the following year, and it was probably the most important book I’ve ever read in information security. That book was The New School of Information Security by Adam Shostack and Andrew Stewart. In this book, Adam and Andrew highlighted the need for evidence-based information security decisions and raised ideas around economics, psychology and even sociology and how they apply to information security. Now at this stage I knew I was starting to stray far from my usual comfort zone!

    Following on from the areas of the economics of information security highlighted in The New School of Information Security, I started doing more research in this area and came across the great research that Ross Anderson and his team were doing at the University of Cambridge. This started me thinking that perhaps the answer actually lay in more academic research that may not always make it into the mainstream of information security books and magazines. So I started reading, leading me to many very detailed papers that outlined models around security risk and optimising investment, written by mathematicians and economists at universities throughout the world.

    Trying to understand these quickly became almost impossible due to my lack of a statistics and deep mathematics background. A shame, but I simply didn’t understand what was going on, and didn’t have the time alongside work to dedicate to learning yet another new area!

    And that’s where I left it, until I started to think about topics that I would like to write my MSc thesis on when completing my part time MSc in Information Security at Royal Holloway, University of London. Straight away I thought that this was the perfect time for me to spend time exploring this area in more detail, with the support of people who understand the academic area and would be capable of providing me further assistance in how to interpret it all! And that’s exactly what I did.

    My basic objective for the thesis was not necessarily to find any ground-breaking new discoveries, but very simply to compile all the different types of research I could find in the area of optimising information security spending and try to make it understandable to someone with a background like myself; not an economist, not a mathematician, not a professor but a simple information security professional.

    During this journey I came across many hugely interesting books and research that changed my outlook on information risk by people such as Doug Hubbard, Dylan Evans, Sam Savage, Jack Jones and a plethora of academic research by too many people to mention!

    I looked at work done in the areas of risk management, corporate finance, economics and reliability to try and identify how other disciplines are dealing with similar challenges and found that a number of problems existed in the area of optimising information security investment, namely education, concept of return, lack of information, rating systems and ordinal scales, uncertainty and risk appetite.

    Using these as a starter, I then went on to review each of these in order to further explain the problems I saw, and attempt to identify some high level solutions to these problems.

    Now I don’t for a second claim that I’ve identified every possible bit of literature, nor that my analysis is flawless, but what I do hope is that if you’re working in information security and you’re interested in SIRA but don’t have a clue what people are talking about, then my thesis might help out in giving a bit of background to what these guys are talking about.

    What’s most interesting for me is that when I wrote this thesis around two years ago, SIRA didn’t exist and finding information on information risk management was difficult, to say the least! Now we have SIRA, with daily running discussions on almost everything I’ve covered in my thesis! It’s great to see how far things have advanced in terms of discussion and availability of information in such a short time.

    Now, if only SIRA had existed three years ago before I started so I could have gone a lot further in my thesis!


    Thesis: Optimising Information Security Investment

  • 22 Mar 2012 11:37 PM | Marcin Antkiewicz (Administrator)

    This blog entry was originally written by Jeff Lowder (@agilesecurity), I am just migrating the post to the new SIRA site. 

    This is not breaking news, but I’m posting this announcement here just in case interested parties had not already heard the news. As explained on the Department of Energy's website:

    The Department of Energy, in partnership with the Department of Homeland Security, is leading a new White House initiative to create a more comprehensive and consistent approach to protecting the nation’s electric grid against cyber attacks. The Electric Sector Cybersecurity Risk Management Maturity initiative will combine elements from existing cybersecurity efforts to develop a maturity model that allows electric utilities and grid operators to assess their own cyber strengths and weaknesses and prioritize their investments. This initiative is the next logical step in a continued effort by public and private stakeholders to identify steps to improve the cybersecurity of the electric grid and will leverage years of work and lessons learned from both the private and public sector.

    Officials from the Energy Department, the White House and DHS met with leaders in the electric sector, research organizations, industry associations, academia and other government agencies from across the electric sector on January 5, 2012 to launch the initiative and request their expertise and participation in the public-private partnership. Since then, there has been a huge response from industry, with numerous utilities indicating they are interested in offering their expertise in developing and/or piloting the model. For the pilot, we want a group that is representative of the industry so we expect participants to include utilities such as public power companies, ISOs/RTOs, IOUs, and coops. The pilot will be conducted in April, and the model should be available to the electric sector this summer.

    Maturity models begin as works in progress and mature as lessons learned and best practices evolve and the model is refined. We expect to see this model refined over time as the model is used and more lessons learned and best practices are incorporated

    As we saw at the launch of this initiative and have seen in the days since, there is a sense of urgency and willingness in the industry and among our public partners to move forward quickly. We are now capitalizing on that momentum to develop a useful tool that can be used effectively across the entire electric sector.

    As we move forward with the initiative, we will post periodic updates on the Office of Electricity Delivery and Energy Reliability website. If your organization is interested in receiving updates via email, please contact us at

  • 06 Mar 2012 3:20 AM | Marcin Antkiewicz (Administrator)

    This blog entry was originally written by Jeff Lowder (@agilesecurity), I am just migrating the post to the new SIRA site. 

    If you’d like to influence the direction of the Information Risk Management (IRM) profession, please consider joining our IRM Body of Knowledge (IRMBOK) working group, which aims to develop an IRMBOK.

    To participate, please join the IRMBOK mailing list, which requires a separate subscription from the main mailing list. To subscribe, please go to the following webpage and follow the instructions there.

  • 04 Mar 2012 3:22 AM | Marcin Antkiewicz (Administrator)

    This blog entry was originally written by Jay Jacobs (@jayjacobs), I am just migrating the post to the new SIRA site. 

    With this year’s RSA conference still close in the rear view mirror, I felt I had to write about something that stuck in my mind as I went through the week.  I found repeated confirmation of something Doug Hubbard wrote: “I have never seen a single objection to using risk analysis in any profession that wasn’t based on ignorance of what risk analysis is and what it can do.”  Keep in mind, ignorance isn't meant to be derogatory or insulting.  It simply means a lack of knowledge, or being uninformed.  Many of the people I heard object to risk analysis were incredibly smart in many ways, yet they were presenting objections from a position uninformed about risk analysis.   It makes me wonder if we have to reduce ignorance about this field before we are able to successfully reduce uncertainty around our exposure to loss.

    One such objection I encountered was during a conversation I had on the first night I was in San Francisco.  After some lively back-and-forth with a colleague on the efficacy of risk management and some healthy skepticism from this person, I received this challenge: “suppose I go into a casino with $100,000, what am I going to come out with?”  The assumption on his part was that analysis would provide a single number (perhaps an average loss) and whatever the answer was, it’d be wrong.  But my response was simply, “what if I produce a distribution?”  And so, for this challenge I have whipped out the first graph.  It is based on some very specific assumptions though.  I made the assumption that the gambler would choose a (single) game for their visit and that they were consistent in their gambling.  Not that it mirrors reality, it just makes this example much easier (and this is just an example).

    Here are the other assumptions:

    • American Roulette (with the “0” and “00”)
    • Bets were one of the red/black, even/odd options (each of which wins about 47.3% of the time and pays 1 to 1)
    • The gambler played for about 4 hours at a lively pace of 60+ games per hour (for a total of 250 rounds)
    • The gambler consistently wagered $5 per round

    I know these would probably not match reality, but that’s okay.  The assumptions are clear, and the analysis could be updated as the assumptions are updated.   Point being, it only matters that the analysis matches the assumptions and that we can update the assumptions (and the model).  The analysis could be redone with the odds from any game, or the analysis could combine multiple games played during a casino visit.  I’m just trying to keep this simple.

    My answer to the original question would start out with “given the assumptions…” and say something like this:

    • The gambler would leave with less money than they started with around 78% of the time
    • About half the time, the gambler would lose more than $70
    • 10% of the time, the gambler would lose more than $170
    • 10% of the time, the gambler would win more than $40
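    A distribution like this takes only a few lines to generate. Here is a minimal sketch that matches the stated assumptions (my own illustrative code, not the model behind the original graph):

```python
import numpy as np

rng = np.random.default_rng(42)

ROUNDS = 250      # ~4 hours at a lively 60+ spins per hour
BET = 5           # dollars wagered each round
P_WIN = 18 / 38   # even-money bet on American Roulette (0 and 00 lose)
VISITS = 20_000   # number of simulated casino visits

# Each visit: count winning rounds, then convert to a net dollar result
# (wins pay 1 to 1, losing rounds forfeit the bet).
wins = rng.binomial(ROUNDS, P_WIN, size=VISITS)
net = BET * (2 * wins - ROUNDS)

print(f"P(leaving with a loss): {np.mean(net < 0):.0%}")
print(f"Median outcome:         ${np.median(net):,.0f}")
print(f"10th percentile:        ${np.percentile(net, 10):,.0f}")
print(f"90th percentile:        ${np.percentile(net, 90):,.0f}")
```

    Run with these assumptions, the simulation lands close to the figures above: roughly 78% of visits end in a loss, the median result is a loss of about $70, and the 10th/90th percentiles sit near a $170 loss and a $40 gain.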

    Since we have to answer similar questions regardless of the methods used (everyone makes risk-based statements regardless of what they call it), I challenged my skeptical colleague to answer the question in his way, and his answer was simple.  He would advise the gambler to look at the casino itself, because its existence logically means it takes in more money than it gives out.   While true, that statement carries a large amount of uncertainty and lacks any feedback or ability to learn over time.  We can do better.

    A Better Gambling Story

    Pure games of chance (like roulette) have loads of variability and almost zero uncertainty, since it’s in the house’s interest to make the games as unbiased as possible.  This makes them ripe for some simple models and allows us to create a better story.   Plus, telling a gambler that they’ll lose more than they win isn’t very helpful, and may erode (or fail to build) trust, especially when the gambler walks away with money.

    My solution is to model visits to the casino repeatedly and see how the gambler does (a method known as Monte Carlo simulation).  I set up the model to play roulette 250 times per visit, betting on the 1 to 1 payout options, and record the running offset in cash for the gambler over the visit; then I set the model to run 10,000 times.  Finally, I made a pretty picture (with red, yellow and green of course) that showed the trends over the 250 iterations (left to right).  By looking at this we can get a sense of the individual stories here.  For example, there’s a red line that hovers around $50 early in the visit (around rounds 40-70) and then ends up dipping down for a loss around $200 (sound familiar?).  Overall though, this should help inform the simple roulette gambler.

    So, can I tell my colleague exactly how much he’ll walk out of the casino with?  Absolutely not; nobody can.  But given the correct assumptions we can make some statements of probability that reduce the gambler’s uncertainty better than other methods (including the de facto method of unaided intuition).  This is an important point: it’s not that statistics and math are going to convert lead into gold, but they will be better and more consistent than the alternatives.  We cannot lose sight of that.  It will always be possible to poke at the models (and some really deserve to be poked), but we should not tear down a solution just to replace it with something worse.

  • 13 Dec 2011 3:29 AM | Marcin Antkiewicz (Administrator)

    This blog entry was originally written by Patrick Florer, I am just migrating the post to the new SIRA site. 

    (This is the first of three posts)

    Most of the people in SIRA have heard of the PERT and BetaPERT distributions.  Many of us use them on a daily basis in our modeling and risk analysis.  For this reason, I think it is important that we understand as much as we can about where these distributions came from, what some of their limitations are, and how they match up to actual data.

    The PERT Distribution:

    The PERT approach was developed more than forty years ago to address project scheduling needs that arose during the development of the Polaris missile system.  With regard to its use in scheduling, we can agree that the passage of time has a linear, understandable nature (leaving quantum mechanics out of the discussion, please) that might be appropriate for estimates of task completion times.  The Polaris missile program probably wasn’t the first big project these people had done (the DoD and Booz Allen Hamilton), so we can also assume that the originators of PERT had both experience and data to guide them when they constructed the function and created the math that they did.

    The BetaPERT distribution was developed by David Vose in order to provide more flexibility in allowing for certainty or lack of certainty in PERT estimates.  Vose added a fourth parameter, called gamma, that impacts the sampling around the most likely value and consequently controls the height of the probability density curve.  As gamma increases, the height of the curve increases, and uncertainty decreases.  As gamma decreases, the opposite happens.  Some people use gamma as a proxy for confidence in the most likely estimate.

    For additional information about the PERT and BetaPERT distributions and how to use them, please see the excellent SIRA blog post that Kevin Thompson wrote a few weeks ago.

    (In order to keep things simple, from this point forward, unless there is a reason to make the distinction, I will use PERT to mean both PERT and modified/BetaPERT.)
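    For those who want to experiment, the modified/BetaPERT distribution is commonly implemented as a rescaled Beta distribution whose shape parameters are driven by gamma. A minimal sketch (my own reconstruction of the usual parameterisation, not code from Vose or any particular tool):

```python
import numpy as np

def betapert(rng, minimum, most_likely, maximum, gamma=4.0, size=1):
    """Sample a modified-PERT distribution as a rescaled Beta.

    gamma is Vose's fourth parameter: it concentrates sampling around
    the most likely value. gamma=4 recovers the classic PERT shape.
    """
    span = maximum - minimum
    alpha = 1 + gamma * (most_likely - minimum) / span
    beta = 1 + gamma * (maximum - most_likely) / span
    return minimum + span * rng.beta(alpha, beta, size=size)

rng = np.random.default_rng(1)
# Higher gamma means less spread (more certainty) around the most likely value.
for g in (2, 4, 16):
    samples = betapert(rng, 10, 50, 100, gamma=g, size=100_000)
    print(f"gamma={g:>2}: standard deviation ~ {samples.std():.1f}")
```

    With these example inputs the standard deviation falls from about 20 at gamma=2 to about 10 at gamma=16: exactly the certainty knob described above.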

    What’s the problem?

    Most of us have been taught that PERT distributions are appropriate tools for taking estimates from subject matter experts (SME) and turning them into probability distributions using Monte Carlo simulation.  As many of you know, this is very easy to do.  The graphics and tables look very nice, informative, and even a bit intimidating.

    But how do we really know that the distributions we create have any validity?

    Just because they may have worked in project scheduling, why should we believe that the distribution of loss magnitude, frequency, or anything else actually corresponds to the probability distribution that a PERT function can create?  Even if these distributions are useful and informative, might there be circumstances where we would be better served by not using them?  Or by using other distributions instead?

    I will address these issues below.

    In case you don’t feel like reading the whole post, I will tell you right now that:

    1. Yes, there are circumstances where PERT distributions do not yield good information.
    2. In a series of tests with a small data set, PERT distributions DID seem to correspond to reality – closely enough, in my opinion, to be useful, informative, and even predictive.
    3. Depending upon what we are trying to model, there are other distributions that might be even more useful than PERT.
    4. It’s a continual learning process – I want to encourage everyone to keep studying, experimenting, and sharing when possible.

    When the PERT distribution doesn’t work

    One of the assumptions of the PERT approach is that the standard deviation should represent approximately 1/6th of the spread between the minimum and maximum estimates.  When I look at the PERT and BetaPERT math, I can see this at work.  I have also read, and have demonstrated in my own experience, that PERT doesn’t return useful results when the minimum or maximum are very large multiples of the most likely value.

    For example, try this with OpenPERT or any other tool you like:

    Min = 1

    Most Likely = 100

    Max = 100,000,000

    gamma/lambda = 4

    run for 1,000,000 iterations of Monte Carlo simulation, just to be fanatical about sampling the tail values.

    (BTW, this is not a theoretical example – these data were supplied by a SME as part of a very large risk analysis project I was involved with two years ago – some list members who were involved in that project may remember this scenario.)

    I think that you will find that the mode (if your program calculates one), the median, and the mean are all so much greater than the initial most likely estimate as to be useless.  In addition, I think that you will find that the maximum from the MC simulation is quite a bit lower than the initial maximum estimate.
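    If you don't have OpenPERT or ModelRISK handy, you can sketch the same experiment with the usual Beta reparameterisation of BetaPERT (my own stand-in code, not either tool):

```python
import numpy as np

minimum, most_likely, maximum, gamma = 1, 100, 100_000_000, 4

span = maximum - minimum
alpha = 1 + gamma * (most_likely - minimum) / span  # barely above 1
beta = 1 + gamma * (maximum - most_likely) / span   # essentially 5

rng = np.random.default_rng(0)
samples = minimum + span * rng.beta(alpha, beta, size=1_000_000)

print(f"Median = {np.median(samples):,.0f}")  # ~13 million
print(f"Mean   = {samples.mean():,.0f}")      # ~16.7 million
print(f"Max    = {samples.max():,.0f}")       # well short of 100,000,000
```

    The median and mean land five orders of magnitude above the most likely estimate of 100, which is exactly the uselessness described above.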

    Here are my results:

    (Please note – if you do this yourself, you won’t get exactly the same results, but, if you run 1,000,000 iterations, your results should be close)


    Min = 10

    Mode = 40,058,620

    (This is very interesting – in a distribution this skewed, you would expect the mode < median < mean: maybe this is an example of why Vose considers the mode to be uninformative?)

    Median = 12,960,195

    Mean = 16,675,377

    Max = 94,255,562



    Min = 6

    Mode = not calculated (ModelRISK doesn’t calculate a mode – see Vose’s book for the explanation of why not)

    Median = 12,923,895

    Mean = 16,654,354

    Max = 93,479,781


    What’s the takeaway here? 

    That there are some sets of estimates that PERT distributions don’t handle very well.  When we encounter large ranges that are highly skewed, we may need to re-think our approach, or ask the SME additional questions.


    To be continued …

  • 13 Dec 2011 3:27 AM | Marcin Antkiewicz (Administrator)

    This blog entry was originally written by Patrick Florer, I am just migrating the post to the new SIRA site. 

    (this is the second of three posts)

    Does the shape of PERT distribution match up to an actual data set?

    At the beginning of this post (Part 1), I raised the question about the validity of the PERT function and whether the distributions it creates correspond to anything.

    What follows are the results of an attempt to answer this question using a small data set extracted from a Ponemon Institute report called “Compliance Cost Associated with the Storage of Unstructured Information”, sponsored by Novell and published in May, 2011.  I selected this report because, starting on page 14, all of the raw data are presented in tabular format.  As an aside, this is the first report I have come across that publishes the raw data - please take note, Verizon, if you are reading this!

    Here is a histogram of the 94 actual observations, created using the standard functionality in Excel (Data\Data Analysis\Histogram) and tweaked a bit to show probability instead of frequency.

    As you can see, the histogram is suggestive of a positively-skewed distribution, with some exceptions: there are several peaks and valleys.  What these peaks and valleys mean is unclear – it could simply be that observations are missing, since the study size was small (N = 94 organizations).  Or they could be real – only more observations would tell us.

    At this point, I asked myself – what if the Ponemon study had captured and had published minimum, maximum, and most likely values instead of single point estimates?  If it had, then we could have constructed a more informative histogram.

    In an attempt to simulate what things might have looked like, I took the Ponemon study raw data, computed minimum and maximum values for each of the 94 data points, and then ran a Monte Carlo simulation, using the following parameters:

    Most Likely = the actual reported cost estimate provided by the report.

    Min = Most Likely  x  a random number between 0 and 1

    Max = Most Likely  x  (1 + a random number between 0 and 1)

    gamma/lambda was set to 4 for all.

    Since true minimum and maximum values were not reported by the study, I decided that using a random number as a multiplier to calculate both the minimum and the maximum values seemed as defensible as anything else for the purpose of my simulation.

    I then ran 10,000 iterations of Monte Carlo simulation for each of the 94 BetaPERT functions, which resulted in 940,000 total estimates.  Using 940,000 data points, standard functionality in Excel (Data\Data Analysis\Histogram), and a tweak to show probability instead of frequency, I created the following histogram:
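    The whole procedure fits in a short script. Since the raw Ponemon figures aren't reproduced in this post, the sketch below substitutes hypothetical most-likely costs; the min/max construction, gamma, and iteration counts follow the parameters above:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical stand-ins for the 94 reported cost estimates (the real
# values are in the Ponemon report, starting on page 14).
most_likely = rng.uniform(400_000, 7_500_000, size=94)

gamma = 4
draws = []
for ml in most_likely:
    lo = ml * rng.uniform(0, 1)        # Min = Most Likely x U(0,1)
    hi = ml * (1 + rng.uniform(0, 1))  # Max = Most Likely x (1 + U(0,1))
    span = hi - lo
    a = 1 + gamma * (ml - lo) / span   # BetaPERT as a rescaled Beta
    b = 1 + gamma * (hi - ml) / span
    draws.append(lo + span * rng.beta(a, b, size=10_000))

pooled = np.concatenate(draws)  # 94 x 10,000 = 940,000 simulated estimates
print(f"{pooled.size:,} values, from {pooled.min():,.0f} to {pooled.max():,.0f}")
```

    From here the 940,000 values can be binned (with Excel's Data\Data Analysis\Histogram or numpy.histogram) just as described above.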

    This histogram is even more suggestive of a positively-skewed distribution.

    But the same questions remain:  Are the dips and valleys representative of missing observations, or are they real?  And, how well would a BetaPERT function predict the shape of this histogram?  How well would any other probability function perform, for that matter?  And, perhaps most importantly, what, if anything, can we extrapolate about other compliance cost data sets from this one?

    So, it was time for another experiment or two!

    To be continued …

  • 13 Dec 2011 3:25 AM | Marcin Antkiewicz (Administrator)

    This blog entry was originally written by Patrick Florer, I am just migrating the post to the new SIRA site. 

    (this is the third post of three)

    Experiment #1 – how well does BetaPERT predict the actual data?

    Using the overall minimum and maximum for the 94 observations, I ran another Monte Carlo simulation using these parameters:

    Min = 0

    Most Likely = 2,000,000 (derived as described below)

    Max = 7,500,000

    gamma/lambda = 4

    Monte Carlo iterations = 100,000

    Excel could not calculate a mode because all 94 values in the Ponemon study were unique.  Using 76 bins (0 through 7,500,000, binned by 100,000), I obtained a value of 2,000,000 from the histogram, which I used as the mode/most likely estimate.
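The binning trick for recovering a mode from all-unique values can be sketched in Python.  The costs below are synthetic (drawn from the Gamma fit reported later in this post), since the real 94 values are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic stand-in for the 94 unique cost values (USD)
costs = rng.gamma(shape=3.67, scale=588_094, size=94)

# 0 through 7,500,000 binned by 100,000
edges = np.arange(0, 7_500_001, 100_000)
counts, _ = np.histogram(np.clip(costs, 0, 7_500_000), bins=edges)

# Take the midpoint of the fullest bin as the mode/most-likely estimate
fullest = np.argmax(counts)
mode_estimate = (edges[fullest] + edges[fullest + 1]) / 2
```

With the actual study data, the fullest bin sits around 2,000,000, which is the value used as the mode above.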

    Here is the histogram that was created by ModelRISK, using the VoseModPERT function:

    As you can see, the shape of the histogram created by the BetaPERT function is similar to the histogram for the actual data.  But is it similar enough to be believable?  A comparison of values at various percentiles tells a better story:

    With the exception of the minimum estimate, where the variance is due to using 0 as a minimum estimate instead of 378,000, and the estimates at the 5th and 10th percentiles, the remaining variances are within +/-20%.   In fact, all of the variances between the 1st and 99th percentiles are positive, which means that, up to the 99th percentile, the BetaPERT function has over-estimated the values.
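A percentile-by-percentile comparison like the one described above is easy to reproduce.  The sketch below uses hypothetical stand-ins for both the actual study values and the simulated draws (here both drawn from the same Gamma, purely for illustration), and defines variance the way this post does, as the percentage by which the simulation over- or under-shoots the actual value:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stand-ins: 94 "actual" study values and 100,000
# "simulated" draws (neither reproduces the real data)
actual = rng.gamma(3.67, 588_094, size=94)
simulated = rng.gamma(3.67, 588_094, size=100_000)

percentiles = [1, 5, 10, 25, 50, 75, 90, 95, 99]
variances = []
for p in percentiles:
    a = np.percentile(actual, p)
    s = np.percentile(simulated, p)
    variances.append(100 * (s - a) / a)  # positive -> simulation over-estimates
```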

    Is this close enough, especially the 8% underestimate at the 99th percentile?  For me, probably so, because we already know that we are going to have trouble in the tails with any kind of estimate.  But for you, maybe not: you be the judge.

    Experiment #2 – is there another distribution that predicts the actual data more closely than BetaPERT?

    The ModelRISK software has a “distribution fit” function that allows you to select a data set for input and then fit various distributions to the data.  Using the 94 compliance cost values from the Ponemon study as input, I let ModelRISK attempt to fit a variety of distributions to the data.

    The best overall fit was a Gamma distribution, using alpha = 3.668591 and beta = 588093.8.  The software calculated these values – I didn’t do it and would not have known where to start.
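ModelRISK's fit can be approximated without specialised software.  The sketch below uses a method-of-moments Gamma fit in plain Python, rather than the maximum-likelihood fit ModelRISK performs, applied to synthetic costs drawn from the reported Gamma (alpha = 3.668591, beta = 588,093.8):

```python
import numpy as np

rng = np.random.default_rng(11)

# Synthetic stand-in for the 94 compliance-cost values, drawn from
# the Gamma the software reported
costs = rng.gamma(shape=3.668591, scale=588_093.8, size=94)

# Method-of-moments fit:  mean = alpha * beta,  variance = alpha * beta^2
mean, var = costs.mean(), costs.var(ddof=1)
alpha_hat = mean ** 2 / var
beta_hat = var / mean
```

With only 94 observations, the recovered parameters will wander noticeably around the true values, which is itself a useful reminder of how much sampling error a data set this small carries.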

    Here is the histogram:

    And here is a comparison based upon percentiles:

    This Gamma distribution is an even better fit than the BetaPERT distribution.  Except at the extreme tails, the variances fall within +/-11%, closer than the +/-20% of the BetaPERT.

    From a theoretical point of view, this is interesting.  But from a practical point of view, it is problematic.  It’s one thing to fit a distribution to a data set and derive the parameters for a probability distribution.  But it’s quite another matter to know, in advance, which distribution might best predict a data set and what the parameters should be for that distribution.  In addition, the Gamma distribution is typically used to model waiting times until a given number of random events occur.  I am not sure how appropriate its use might be for describing a distribution of loss magnitudes or costs, but I plan to find out!


    Concluding remarks:

    While tests on a single, small dataset do not provide conclusive proof for the ability of the PERT and other distributions to match up to actual data, they do provide encouragement and motivation for further testing.

    It would be useful to perform tests like these on larger datasets.  Perhaps one of you has access to such data?  If so, how about doing some tests and writing a blog post?

    It might also be possible to use the “Total Affected”/Records exposed data from datalossdb to test the ability of BetaPERT to model reality.  I would invite anyone interested to give it a try.

    As we build our experience fitting parametric distributions to different data sets, our knowledge of which distributions to try in which circumstances will surely grow, and lead, hopefully, to more useful and believable risk analyses.
