Apparently, this page is becoming a source in some amateur circles for the probability of RAID failure. I stand by my analysis, but most of the criticism seems to come from readers who don’t understand my assumptions, so they make their own and then leap to the wrong conclusions. This page is a follow-up to an initial simulation I did, which is available here.
I underestimated how long it would take to run 37.7 trillion hard drive simulations. It’s done now. This project has been a lot of fun, but I am done with it. In case you are wondering where that number comes from, it’s the average age of the array before failure (where the loop breaks), multiplied by the number of drives in the array (the nested loop), multiplied by the 1,000,000 simulations of each configuration, summed over array sizes of 2-64 drives for RAID 0 and RAID 1, 3-64 for RAID 5, and 4-64 for RAID 6. The simulation was rather eye-opening and has improved my understanding of RAID as a tool for data redundancy.
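The counting described above can be sketched roughly as follows. This is a minimal sketch rather than the program behind the numbers on this page: the one-year time step, the `annual_failure_rate` curve, the immediate replacement of failed drives, and every function name are assumptions made for illustration.

```python
import random


def tolerated_failures(level, n_drives):
    """Drives that may fail in the same step without losing the array."""
    if level == 0:
        return 0             # striping only: any failure is fatal
    if level == 1:
        return n_drives - 1  # n-way mirror: one surviving copy is enough
    if level == 5:
        return 1             # single parity
    if level == 6:
        return 2             # double parity
    raise ValueError(f"unsupported RAID level: {level}")


def annual_failure_rate(age_years):
    """ASSUMPTION: a made-up curve that rises with drive age."""
    return min(1.0, 0.02 + 0.01 * age_years)


def simulate_array(level, n_drives, rng, max_years=100):
    """Return (array age at failure in years, drive steps simulated)."""
    limit = tolerated_failures(level, n_drives)
    drive_ages = [0] * n_drives
    drive_steps = 0
    for year in range(1, max_years + 1):
        failed = 0
        for i in range(n_drives):          # nested loop over the drives
            drive_steps += 1
            if rng.random() < annual_failure_rate(drive_ages[i]):
                failed += 1
                drive_ages[i] = 0          # ASSUMPTION: swapped for a new drive
            else:
                drive_ages[i] += 1
        if failed > limit:
            return year, drive_steps       # the loop breaks at array failure
    return max_years, drive_steps


def total_drive_steps(level, sizes, simulations, seed=0):
    """Sum drive steps over every array size and every simulation run."""
    rng = random.Random(seed)
    total = 0
    for n_drives in sizes:
        for _ in range(simulations):
            _, steps = simulate_array(level, n_drives, rng)
            total += steps
    return total


if __name__ == "__main__":
    # 100 runs per size instead of 1,000,000 to keep the demo quick.
    print(total_drive_steps(5, range(3, 65), simulations=100))
```

In each run the drive-step count equals the array's age at failure times the number of drives, so summing it over all runs and sizes gives the kind of total quoted above.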
For what it’s worth, for all but the most super critical of enterprises the following conclusions can be made from this data:
- If you absolutely, positively can never, ever, ever lose your data or have it go offline for a second, you need a 4 disk RAID 1. It never failed in over 1 million simulations. One thing I didn’t model is the failure rate of something like RAID 0+1, which could have a high failure rate. If your data is your life, RAID 1 x4 is the way to go.
- Limit your RAID 5 array to 10 drives. 50% of RAID 5 arrays with 11 or more drives don’t survive 10 years. Throw in a 3 year failure rate of about 20% and you are at about the odds of playing Russian Roulette with a six shooter with your data.
- If your card can handle it, RAID 6 is the option to have for any large array. Most systems need to keep their array sizes to 24 drives or fewer, and at that size RAID 6 shines. It outperforms equally sized RAID 5 for reliability by orders of magnitude.
- If you are building a NORCO 4220 based RAID solution with 20 drives, don’t bother with multiple arrays and multiple hot spare configurations. A hot spare only kicks in when a drive has failed. If you are able to get to your server within 12 hours of notice, a 19 drive RAID 6 array (17 drives of data, 2 of redundancy) and 1 hot spare will keep you running until you can buy a single drive that will contain all the data in your array.
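The drive arithmetic in the last bullet (19 drives in RAID 6 giving 17 drives of data) follows from how many drives' worth of space each level spends on redundancy. A small sketch, with `data_drives` and `MIN_DRIVES` as illustrative names and RAID 1 treated as a single n-way mirror:

```python
# Minimum array sizes, matching the ranges simulated on this page.
MIN_DRIVES = {0: 2, 1: 2, 5: 3, 6: 4}


def data_drives(level, n_drives):
    """Number of drives' worth of usable space in an array."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if n_drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == 0:
        return n_drives      # no redundancy at all
    if level == 1:
        return 1             # every drive holds the same copy
    if level == 5:
        return n_drives - 1  # one drive's worth of parity
    return n_drives - 2      # RAID 6: two drives' worth of parity


print(data_drives(6, 19))  # 17, leaving 1 of the 20 bays for the hot spare
```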
And now for the art show.

The average age of drives over 1m simulations. This graph shows the lack of stability in RAID 5 arrays larger than 10 drives. Assuming most people will never have an array with more than 32 drives, RAID 6 is an amazingly robust solution.

The max age graph shows a best-case theoretical scenario after 1m simulations. Yes, it is possible that a RAID 0 array can last 15 years, but it's more important to examine the tail of this graph, where even after 1m simulations RAID 5 can't last a full 25 years.

This graph shows two quick things. First, after 4 drives RAID 1 is basically bulletproof. Second, everything else can fail at any time. MAKE BACKUPS!

This graph shows drives failing before the first year. Notice that RAID 5 has a fairly steep rate, yet RAID 6 stays well below half of that slope.

This graph shows arrays that didn't survive 3 years. Note that even at a minimum of 2 drives, RAID 0 is still over 50%.

This is the minimum length of time that a home user would expect their array to last. Five years is a reasonable point at which to treat drives as obsolete and replace them. It is also the point at which single hard drive failure rates start to skyrocket.

Ten years is the minimum reliability I would consider for any sort of small business operations. Of course backups are always important, but 10 years is a good benchmark. Note the sharp difference in the RAID 6 curve.

This graph indicates the drives that survive over 25 years. This is the ultimate indication of how likely you are to be able to 'set it and forget it.'
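The percentage graphs above all come down to the same tally: what fraction of simulated arrays failed before a given age. A minimal sketch, assuming the simulation yields a list of array ages at failure; the function names and the sample ages are made up for illustration and are not results from the simulation.

```python
def fraction_failed_before(failure_ages, years):
    """Fraction of arrays whose age at failure is below `years`."""
    if not failure_ages:
        raise ValueError("need at least one simulated array")
    return sum(1 for age in failure_ages if age < years) / len(failure_ages)


def fraction_survived_over(failure_ages, years):
    """Fraction of arrays that lasted longer than `years`."""
    if not failure_ages:
        raise ValueError("need at least one simulated array")
    return sum(1 for age in failure_ages if age > years) / len(failure_ages)


# HYPOTHETICAL ages at failure, in years, for six simulated arrays.
sample_ages = [0.5, 2.0, 4.0, 7.0, 12.0, 30.0]

for years in (1, 3, 5, 10):
    share = fraction_failed_before(sample_ages, years)
    print(f"failed before {years:>2} years: {share:.1%}")
print(f"survived over 25 years: {fraction_survived_over(sample_ages, 25):.1%}")
```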

This isn't that useful a metric because RAID 0 and RAID 5 are thrown off by high failure rates. It does, however, show that the program models a drive's increasing age as leading to an increasing failure rate, because the result is not just a direct mapping to (n*r).
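To see why a rate that rises with age cannot be a direct mapping to (n*r), it helps to compare the two for a RAID 0 array, where any single drive failure ends the array. The sketch below takes n as the number of drives and r as a per-drive annual failure rate, which is an assumed reading of that shorthand; the rates are invented for illustration and are not the ones used in the simulation.

```python
def raid0_loss_probability(n_drives, yearly_rates):
    """Chance that at least one of n independent drives fails over the
    given years, where yearly_rates[i] is the per-drive rate in year i."""
    survival = 1.0
    for rate in yearly_rates:
        survival *= (1.0 - rate) ** n_drives  # all n drives survive this year
    return 1.0 - survival


n = 4
flat_rates = [0.05] * 5                       # HYPOTHETICAL: rate never changes
aging_rates = [0.05, 0.06, 0.08, 0.11, 0.15]  # HYPOTHETICAL: rate rises with age

print("naive n*r for one year:  ", n * 0.05)
print("exact loss, one year:    ", raid0_loss_probability(n, [0.05]))
print("loss over 5 years, flat: ", raid0_loss_probability(n, flat_rates))
print("loss over 5 years, aging:", raid0_loss_probability(n, aging_rates))
```

With the aging rates the five-year loss probability comes out higher than with the flat rate, which is the kind of gap a plain n*r estimate would miss.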