The cost of High Availability (HA) with Oracle
What's the cost of downtime to your business? $100,000 per hour, $1,000,000 or more? The recent volcanic ash that has grounded European flights is estimated to be costing the airlines $200M a day. In the IT world, High Availability (HA) architectures allow for disaster recovery as well as uninterrupted business continuity during system failure. This post focuses on a customer’s backend, which consists of a business application stack supported by a dozen Oracle databases. The customer wishes to equip this infrastructure with HA features and ensure that outages do not cost them business. How do we address the challenge of pricing the complete solution, with hardware, software, services and annual support?
The options
Active Data Guard can be used if the locations are far apart, while Oracle RAC promises transparent application failover if they are in close proximity. For enterprise-class users with heterogeneous databases, GoldenGate, the 2009 addition to Oracle’s portfolio, is an attractive solution.
And for those with no stringent real-time I/O requirements, backup might be enough, so it’s worth considering Oracle Secure Backup or the external cloud variants that allow us to save on CapEx. With this option, there’s no need for extra hardware if the data is in the cloud. Home-made solutions based on Streams are also not unheard of.
Others will prefer hardware replication with intelligent disk arrays from EMC, Fujitsu, Sun (Oracle) or another vendor, in conjunction with clustering software such as Red Hat Cluster Suite. Those with cost as the number one priority might consider open-source software replication methods such as DRBD.
Two architectures, Three competing solutions
Using our Momentum(tm) methodology (technical and economic analysis), we narrow the field down to two alternative HA offerings. Both introduce an HA management layer, through either database clustering or disk array replication.
1) Database clustering ("DB v1"): Oracle RAC allows the databases to run across a server farm for failover and efficiency. When one server instance fails, another transparently takes over. For storage redundancy, ASM manages data replication between the storage units. HA is achieved through RAC managing redundancy at all levels. (Note: this design assumes geographical proximity of the redundant nodes because of synchronisation constraints.)
2) Disk array replication ("Disk array"): Oracle Database is still used, but in this scenario the database is unaware that the underlying architecture provides business continuity. Instead, intelligent disk arrays transparently replicate data to the remote location. High availability is invisible to the upper layers of the software stack. (Note: here too, latency and bandwidth limit the physical distance between the connected sites.)
3) And the third solution? "DB v2": this is based on exactly the same architecture as "DB v1". Knowing that some of the associated licensing restrictions would not affect the customer’s operations (in this case, a limit on the number of system users), we can recalculate the costs using a different licensing model.
In short, the finance director will see three solutions to choose from, while the IT architect will only see two.
More on licenses
Oracle currently has a sophisticated licensing scheme, and working out the optimum involves exercising the patience of our Oracle sales team. Beyond the alternative licensing models, the usual headache is calculating the license migration from legacy architectures. Most vendors defend their business interests: if you haven’t purchased support in the past, they will make you purchase it retroactively, otherwise migration won’t be possible. Such aggressive loyalty schemes can annoy and put off customers who have little choice, so negotiations follow. This is where consultants are used to find a compromise agreeable to both parties.
Pricing
To forecast the project's total direct capital expenditure, we break it down into four major CapEx components: (1) software licenses, (2) hardware, (3) services and (4) the year 1 support package, which is normally paid up front.
[Figure: HA cost pricing]
The original “DB v1” option was priced at $611K. After the license tuning exercise, the total for the “DB v2” option came in at $518K, a saving of $93K.
For this type of project, the major cost considerations are hardware and software, while services and support are comparatively minor. For "DB v1", the cost breakdown is: 36% hardware, 40% software, 13% services, 11% support.
So, if the hardware-only solution removes the additional investment in software, can we see significant savings? No. Surprisingly, "Disk array" comes in at $561K. Enterprise-grade storage arrays with replication features are not cheap.
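As a rough illustration, the percentage split quoted above can be turned into approximate dollar amounts. This is only a sketch: the percentages in the post are rounded, so the per-category figures are estimates, and the names `DB_V1_TOTAL_USD`, `DB_V1_SHARES` and `breakdown` are introduced here purely for illustration.

```python
# Figures quoted in the post for the "DB v1" option.
DB_V1_TOTAL_USD = 611_000
DB_V1_SHARES = {
    "hardware": 0.36,
    "software": 0.40,
    "services": 0.13,
    "support": 0.11,
}


def breakdown(total, shares):
    """Return the approximate dollar amount per cost category."""
    return {name: round(total * share) for name, share in shares.items()}


db_v1_breakdown = breakdown(DB_V1_TOTAL_USD, DB_V1_SHARES)

for name, amount in db_v1_breakdown.items():
    print(f"{name:<9} ${amount:>9,}")
```

On these rounded shares, hardware and software together account for roughly three quarters of the "DB v1" total.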
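To see the three quoted totals side by side, a minimal sketch such as the following can rank them and compute the absolute gaps. The totals come from the post; the helper names `rank_by_cost` and `saving` are assumptions made for this example.

```python
# Totals quoted in the post, in US dollars.
TOTALS_USD = {"DB v1": 611_000, "DB v2": 518_000, "Disk array": 561_000}


def rank_by_cost(totals):
    """Return (option, total) pairs sorted from cheapest to most expensive."""
    return sorted(totals.items(), key=lambda item: item[1])


def saving(totals, baseline, alternative):
    """Absolute saving of `alternative` over `baseline` (positive means cheaper)."""
    return totals[baseline] - totals[alternative]


ranking = rank_by_cost(TOTALS_USD)
license_saving = saving(TOTALS_USD, "DB v1", "DB v2")      # licence tuning
disk_vs_db_v2 = saving(TOTALS_USD, "Disk array", "DB v2")  # software is cheaper
disk_vs_db_v1 = saving(TOTALS_USD, "DB v1", "Disk array")  # hardware is cheaper

for option, total in ranking:
    print(f"{option:<10} ${total:,}")
```

The disk array option sits between the two database clustering prices: $50K below "DB v1" and $43K above "DB v2".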
Still open to interpretation
The cost of achieving business continuity using software is slightly lower than, although still comparable to, that of the hardware-only solution. So hardware or software?
Frustrated strategist: Hardware is not the only answer
This is just another example of how wrong it is to “kill the problem with hardware”. How often do we see people instinctively decide to purchase extra hardware to overcome scalability challenges? To me, this knee-jerk reaction is based on short-term thinking and is extremely frustrating. Just look. Here, the distributed middleware provides you with the same results, but slightly cheaper. This cost saving will grow dramatically the more you scale out. Why?
Simple: software vendors have greater freedom for volume discounts than hardware vendors. Once you’ve purchased one license, the vendor’s cost of granting another is zero. It’s worth remembering that the larger your installation, the greater the cost gap between hardware-centered and software-centered solutions becomes. For scalability there is no choice: software-centered solutions win every time.
Conservative Tactician: Step back and figure out the context
At the moment this is a small project and we’re not scaling out just yet. It’s surprising how similar the costs of the hardware and software approaches are. A 7% cost difference isn’t considerable, especially as the hardware variant was actually cheaper before the license restrictions workaround! The price difference itself is not significant enough to help us decide which approach is best. The factors affecting our decision will not be financial, but contextual:
What’s the envisioned technical context of this environment? Is this a dedicated system, or will it be sharing resources with other applications? It’s worth noting that hardware-based replication will equally protect all the data of the software running on top of it. Here you can fix all your HA needs at once, at the risk of volumes of unnecessarily replicated data slowing the network. By contrast, RAC will only provide business continuity to the data inside Oracle DB; the lesser the risk, the lesser the reward.
What’s the Recovery Time Objective and Recovery Point Objective in your Disaster Recovery Plan? At failure, Oracle RAC will “guarantee” to seamlessly and transparently switch over the cluster control with no data loss, while other solutions require a few seconds or minutes to switch over (and so are not completely transparent). Can you really afford this delay without harming your business?
How does this fit the generic site policies, such as data center virtualization, disaster recovery plan or power consumption targets? Now the picture can become complicated and various arguments lean towards certain solutions. Imagine your environment is virtualized. In case of an incident, the “Disk array” hardware replication will allow all VMs to be automatically restarted on a secondary site. The downside? Service interruption is around 1 minute, while the “DB v1” software solution is seamless. On the other hand, with virtualization covering two replicated sites you are approaching licensing hell; you’ll pay up to twice the price for all your programs unless you are able to consolidate. The costing analysis here is quite complex but definitely worth investigating. In the long run you’ll easily save six or seven digits by picking an HA strategy that matches the site policy.
Of course these are just the fundamentals and there are many more factors to consider. Any HA project that’s undertaken must take into account the indirect, hidden, and long-term cost implications.
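To put the roughly one-minute interruption of the “Disk array” option into monetary terms, here is a minimal sketch using the hourly downtime figures floated at the start of this post. Those figures are illustrative rather than the customer's actual numbers, the linear cost model is an assumption, and `interruption_cost` is a name invented for this example.

```python
def interruption_cost(hourly_downtime_cost_usd, outage_minutes):
    """Linear estimate: downtime cost accrues evenly over the outage."""
    return hourly_downtime_cost_usd * outage_minutes / 60


# Illustrative hourly downtime costs from the opening of the post.
for hourly in (100_000, 1_000_000):
    cost = interruption_cost(hourly, 1)
    print(f"${hourly:,}/hour -> ${cost:,.2f} per one-minute interruption")
```

Under this simple model, a one-minute switchover costs about $1,667 at $100,000 per hour and about $16,667 at $1,000,000 per hour. Real outages rarely cost linearly, so treat this as a starting point for the Recovery Time Objective discussion rather than a result.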
Summary
With small installations, HA and business continuity of your database-oriented software stack can be achieved with various strategies, including hardware or software solutions, at an almost identical cost.
Don’t immediately resort to a hardware-only solution, at least not before proper consultation. Think strategically and ask yourself whether your architecture will need to scale. Cost implications here will be significant.
Don’t rely solely on software until your HA solution covers the complete context. You chose software for agility. You don’t want to close your budget with a business continuity architecture that covers only a fraction of your critical systems.
Apply methodical analysis during the decision process.