Travis just posted, "Unlike the Weather, You Can Choose Your Cloud", discussing the power outages we experienced here in the Northern VA area last week. As a native of this area, I know that thunderstorms and hurricanes do pass our way and affect us at this time of year. In late May of 2008, we had a storm very similar to this one. Nor is this the first Amazon outage (see "Amazon EC2 Outage Downs Reddit, Quora" from April 2011), and it will not be the last. Intuit's SaaS QuickBooks was down for 36 hours in June 2010 ("Update: Intuit Sites Outage Strands Thousands of SMBs"), and just this past Tuesday Salesforce.com was down, triggered by a power outage at an Equinix data center in Silicon Valley. Each time one of these outages occurs, it generates big headlines claiming that the cloud has failed.
What I found interesting during the previous Amazon outage was the number of social media sites hosted in one data center, in my backyard. As a marketer, I use many of these sites in my business life, as do many organizations today, and we have become reliant upon them. Outages are going to occur. It is incumbent upon those contracting the services to instrument the workloads and applications they host for management, and to have DR plans suitable for the service hosted. I discussed this in "Six Tips for Cloud Service Contracts" and "Finding & Categorizing Services - The Golden Key to Quality & Right Sized Cloud & Hybrid Deployments", where I raised awareness of service levels with service providers and of the service categorization that is essential to guide those service levels.
The Intuit outage was more than 2 years ago. It, similar outages, and the previous Amazon outage all generated headlines about angry customers, customers leaving for another provider, and questions as to the viability of cloud services. Leaving for another provider will not solve the problem. Outages are going to occur, and what matters is planning and testing the appropriate DR strategy. Cloud services are here to stay; the questions are the reputation of the provider and how well the contracting organization instruments its services for management and plans for disaster. Let's not forget to test the plan on a regular basis for multiple points of failure.
To add to my previous post on service categorization, this latest outage reminds me to think about the competition. As a solution marketer, I watch the competition on a regular basis, and I also posted on how "IT Departments Need to Run Like IT Vendors". When moving services to the cloud, you must consider the priority each service has to your organization and let that drive your redundancy and DR planning. High availability comes at a cost, so be sure to plan for it with your mission-critical services. That said, two things have occurred to me as a result of these continued outages:
- Redundancy - High availability services should be spread across multiple zones, data centers and geographies. Redundancy in the same physical location brings risk. Failover to another center in a different geography, potentially with a different provider, is also worth considering.
- Competition - If I were to place my services with a cloud provider, I would want to know what my competitors are doing and whether they are in the same location. Knowing this, which the press has exposed quite well, I would make a diligent effort to host my services in a different location, or at least have redundancy that is disconnected from the primary site of my competition.
Again, these levels of redundancy and DR come at a cost, which must be weighed carefully against the value that high availability of these services delivers in your organization's market.
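To put rough numbers on that trade-off, a back-of-the-envelope calculation can help. The sketch below is only an illustration: it assumes that site failures are independent (which the point about shared physical locations above warns is often not the case), and the function names and availability figures are hypothetical, not taken from any provider's SLA.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap years; fine for a rough estimate


def combined_availability(site_availabilities):
    """Availability of a service that stays up while at least one site is up.

    Assumption: sites fail independently of each other. Sites that share a
    building, power feed or storm path will do worse than this estimate.
    """
    probability_all_down = 1.0
    for availability in site_availabilities:
        probability_all_down *= (1.0 - availability)
    return 1.0 - probability_all_down


def expected_downtime_hours(availability):
    """Expected hours of downtime per year at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    single_site = 0.99  # hypothetical figure for one data center
    two_sites = combined_availability([single_site, single_site])
    print("one site :", round(expected_downtime_hours(single_site), 2), "hours/year")
    print("two sites:", round(expected_downtime_hours(two_sites), 2), "hours/year")
```

Comparing the downtime avoided against the price of the second site is one simple way to decide which services justify the extra spend.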
Instrumentation to manage the services and the service provider is also key to proactively intercepting events that might indicate increased risk, so that a failover can be initiated prior to an outage. We once had an insurance provider who not only used Operations Center to monitor the performance of their services based upon technology events, but also incorporated weather maps. The mapped weather events gave advance warning of conditions that might affect their data centers, and also served as an early indicator of increased traffic to the systems and services requiring high availability during times of disaster. An investment banking firm monitored transaction volumes: as trading transactions increased, risk increased, and load balancing to other data centers was initiated to mitigate the risk of an outage during key trading hours.
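The idea behind both examples, combining business and environmental signals with technology events to act before an outage, can be sketched in a few lines. This is a minimal illustration, not how Operations Center or either customer implemented it: the `SiteSignals` fields, the 80% load threshold and the action names are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class SiteSignals:
    """Hypothetical snapshot of the signals monitored for one data center."""
    severe_weather_alert: bool   # e.g. derived from a weather map feed
    transactions_per_sec: float  # current business transaction volume
    capacity_per_sec: float      # volume the site is sized to handle


def risk_score(signals, load_threshold=0.8):
    """Count how many risk indicators are active (0, 1 or 2).

    The 0.8 load threshold is an illustrative assumption, not a recommendation.
    """
    score = 0
    if signals.severe_weather_alert:
        score += 1
    if signals.capacity_per_sec > 0:
        load = signals.transactions_per_sec / signals.capacity_per_sec
        if load >= load_threshold:
            score += 1
    return score


def recommended_action(signals):
    """Map the risk score to a proactive action (names are hypothetical)."""
    score = risk_score(signals)
    if score >= 2:
        return "fail_over"   # storm approaching while the site is heavily loaded
    if score == 1:
        return "shift_load"  # start balancing traffic to other data centers
    return "none"


if __name__ == "__main__":
    storm_and_busy = SiteSignals(True, 900.0, 1000.0)
    print(recommended_action(storm_and_busy))
```

In practice the thresholds and actions would come out of the service categorization exercise, since a mission-critical service warrants acting on weaker signals than a low-priority one.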
DR strategy, planning and management of risk cannot wait until the disaster occurs. Technology enables great innovation for our organizations in these times, and it calls for a new IT organization with new management techniques.
How are you service-enabling your services for the future?
Jul 12 2012, 08:46 AM
Filed under: Systems Management, SLA, service level agreements, IT management, Cloud computing, Availability, cloud, Infrastructure, Operations, Social Media, adoption of cloud, BSM, Business Service Management, Service Providers, End-to-End Management, Data Center, Application Performance, Operations Center, IT Operations, SaaS, Software as a Service, Systems Monitoring, Disaster Recovery, Cloud Management, Infrastructure-as-a-Service, Quality of Service, as-a-service, High Availability, Amazon, Michele Hudnall, Data Center Solutions