Five 9's because of the 7 Ps
A few years ago, I was sailing with an Air Force Colonel and a Navy Captain who introduced me to the old military adage of the 7 Ps: Prior Proper Planning Prevents Piss Poor Performance. It has since become a mantra that my team uses to help engineer uptime.
I believe one of the most undersold benefits of the cloud, and in particular Amazon Web Services, is its ability to enable you to do your prior proper planning.
It is pretty common in a modern software development shop to have multiple environments for production, staging, development, etc. Great care is usually taken to make staging and production environments identical, although sometimes the staging and development environments run on less powerful hardware.
However, the data sets that your staging and production environments connect to are often different. Your staging data may differ substantially from production in its size or in the domain of values it contains. So testing a large data migration in your staging environment might not be enough to prevent piss poor performance. A data migration task may run brilliantly in staging, but then take much longer to execute in production or encounter values that nobody anticipated.
Preventing piss poor performance is quite easy in concept. Simply simulate the exact environment you will be running against. This means an identical copy of your data and the infrastructure that powers it. Not rocket science, right? Then why doesn't it happen every time? It is usually because of a few costs that creep in.
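To make the "unanticipated values" failure concrete, here is a minimal sketch, assuming a hypothetical migration that rewrites a legacy status column using a lookup table built only from the values seen in staging. The names `STATUS_MAP` and `migrate_rows`, and the sample rows, are illustrative and not taken from any real system. Against staging the migration is clean; against a production-like copy it reports a row it cannot handle.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table, written after inspecting only the staging data.
STATUS_MAP: Dict[str, str] = {"A": "active", "I": "inactive"}


def migrate_rows(rows: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Rewrite each row's legacy 'status' code; collect rows that can't be mapped."""
    migrated: List[dict] = []
    unmapped: List[dict] = []
    for row in rows:
        code = row["status"]
        if code in STATUS_MAP:
            migrated.append({**row, "status": STATUS_MAP[code]})
        else:
            # A value outside the domain we tested against.
            unmapped.append(row)
    return migrated, unmapped


staging_rows = [{"id": 1, "status": "A"}, {"id": 2, "status": "I"}]
# Hypothetical production data containing a legacy code that never made it into staging.
production_rows = staging_rows + [{"id": 3, "status": "S"}]

_, staging_failures = migrate_rows(staging_rows)
_, production_failures = migrate_rows(production_rows)
print(len(staging_failures), len(production_failures))  # prints "0 1"
```

Only a rehearsal against an identical copy of the production data surfaces that third row before the real migration does.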
1. Resource Costs
Production I.T. infrastructure is fairly costly. Costly enough that you usually don't have the spare infrastructure just sitting around to practice on. You might have something you can use, but it doesn't have the same power, memory or capacity. So you find yourself practicing on lower-powered hardware and making an educated guess about how it scales up to production. You might even get pretty good at this if you do it enough. But there is enough margin of error for piss poor performance to show up and bite you. This was the way many of my projects went before moving to the cloud, and it made big projects risky and nerve-racking.
2. Time Costs
Let's say you are lucky enough to have an exact copy of your production infrastructure sitting around to practice with. That gear still has to be configured. This can be time-consuming, especially if your infrastructure isn't automated. The time spent on configuration becomes a cost of practicing, and you talk yourself out of practicing as often as you should. It's like never practicing your golf swing and expecting to step up to the tee and hit a hole-in-one...it might happen if luck is on your side.
Removing the Costs
These costs are practically eradicated by the cloud. You can start up an exact duplicate of your production infrastructure, practice, and shut it back down. You pay only for the time it was online, and that bill will look very small in comparison to the cost of downtime.
To eliminate the time costs, you need to rely on infrastructure automation. In all fairness this isn't unique to the cloud, but many organizations with their own data centers don't have the same level of automation that is available in the cloud. Amazon Web Services has some great infrastructure automation technology, and the time you invest in it will be paid back. By automating your infrastructure you can eliminate the time costs associated with configuring it.
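A back-of-the-envelope comparison shows why, and it also shows how little room five nines leaves. The sketch below uses purely hypothetical figures: a rehearsal environment billed at $40 per hour, a four-hour rehearsal, and downtime that costs the business $10,000 per hour during a two-hour outage. The function names are illustrative; substitute your own rates, since the point is the ratio rather than the numbers.

```python
# Average year length in minutes (365.25 days).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_budget_minutes(availability: float) -> float:
    """Minutes of downtime per year allowed at a given availability (0.99999 = five nines)."""
    return MINUTES_PER_YEAR * (1 - availability)


def rehearsal_cost(hourly_rate: float, hours: float) -> float:
    """Pay-as-you-go cost: you are billed only while the duplicate environment runs."""
    return hourly_rate * hours


def downtime_cost(cost_per_hour: float, outage_hours: float) -> float:
    """Business cost of an outage caused by an unrehearsed change."""
    return cost_per_hour * outage_hours


# Hypothetical figures for illustration only; substitute your own.
practice = rehearsal_cost(hourly_rate=40.0, hours=4.0)
outage = downtime_cost(cost_per_hour=10_000.0, outage_hours=2.0)
print(round(downtime_budget_minutes(0.99999), 2))  # prints "5.26"
print(practice, outage, outage / practice)  # prints "160.0 20000.0 125.0"
```

At five nines, a single two-hour outage is more than twenty times the entire annual downtime budget, which is the real argument for rehearsing.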
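The shape of an automated rehearsal matters more than the specific tooling. Here is a minimal sketch, assuming hypothetical `provision_environment` and `teardown_environment` functions that stand in for whatever automation you use. On AWS that might mean creating and deleting a CloudFormation stack from a template, but these stubs only record what happened so the example runs without an account. The context manager guarantees teardown even when the rehearsal fails, so a failed practice run doesn't leave billable infrastructure running.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List

# Records actions so the sketch runs without any cloud account.
log: List[str] = []


def provision_environment(name: str) -> str:
    """Hypothetical stand-in for automated provisioning (e.g. from a template)."""
    log.append(f"provision {name}")
    return name


def teardown_environment(name: str) -> None:
    """Hypothetical stand-in for deleting every resource in the environment."""
    log.append(f"teardown {name}")


@contextmanager
def rehearsal_environment(name: str) -> Iterator[str]:
    env = provision_environment(name)
    try:
        yield env
    finally:
        # Always shut down, so you pay only for the time the rehearsal ran.
        teardown_environment(env)


def rehearse(name: str, step: Callable[[str], None]) -> bool:
    """Run one practice deployment; return True if it succeeded."""
    try:
        with rehearsal_environment(name) as env:
            step(env)
        return True
    except Exception as exc:
        log.append(f"failed: {exc}")
        return False


def failing_step(env: str) -> None:
    # Hypothetical migration step that hits bad data in the production copy.
    raise RuntimeError("unexpected value in production copy")


ok = rehearse("rehearsal-1", lambda env: log.append(f"migrate {env}"))
failed = rehearse("rehearsal-2", failing_step)
print(ok, failed)  # prints "True False"
```

The design choice worth copying is the `finally` block: teardown is unconditional, which is what keeps the resource cost of practicing close to zero no matter how the rehearsal goes.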
With these two costs eliminated, you have a low cost framework for experimentation and planning. You can practice until you have the confidence that your deployment is predictably repeatable. And that's how Prior Proper Planning Prevents Piss Poor Performance.