NoSQL – the new wave against RDBMS
Over the past month, the blogosphere has devoted a lot of coverage to the NoSQL movement. I first came across it in this article on the Computerworld web portal and have been following the heavy traffic on the subject since. The NoSQL community held its inaugural get-together in San Francisco last month to discuss a future where traditional RDBMSs from the likes of Oracle, Microsoft and IBM are consigned to history in favor of open source data stores. Their ethos is that traditional RDBMSs are not scalable and force data to be twisted to fit into the relational world. What is the likelihood of a world where legacy systems are driven by the new breed of data stores?
The NoSQL movement began with the in-house development of data stores that emulate those built by Google and Amazon. These are now handling hundreds of terabytes or even petabytes of data for thriving Web 2.0 companies. Some of the notable open-source projects include Hadoop, Voldemort, Cassandra, CouchDB and Dynomite, among others. Proprietary cloud-based data stores include Google App Engine's Datastore, Amazon SimpleDB, Windows Azure Storage Services and Force.com Database Services.
The main principles behind these projects are well summarized by Martin Kleppmann and Tony Bain. In essence, these data stores are distributed key/value stores that provide unlimited scalability and store data modeled closely on objects, removing the need for ORM plumbing code. The downsides are the loss of data integrity, which has to be managed by application code, and the difficulty of performing business intelligence on the data. A work-in-progress comparison of alternative data stores can be found here.
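As a rough illustration of that trade-off, here is a minimal sketch in Python. It assumes a toy, single-process `KeyValueStore` class standing in for a real distributed store; the class, the `save_order` helper and the key naming scheme are all hypothetical and not taken from any of the projects above. It shows an object stored in its natural shape without any ORM mapping, and an integrity check that a relational database would enforce with a foreign key but that here has to live in application code.

```python
import json


class KeyValueStore:
    """Toy in-process stand-in for a distributed key/value store (hypothetical)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The value is serialized as one document, close to the object's own shape.
        self._data[key] = json.dumps(value)

    def get(self, key):
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)


def save_order(store, order_id, customer_id, items):
    # The store has no foreign keys, so the application checks the reference itself.
    if store.get(f"customer:{customer_id}") is None:
        raise ValueError(f"unknown customer {customer_id}")
    store.put(f"order:{order_id}", {"customer_id": customer_id, "items": items})


store = KeyValueStore()
store.put("customer:42", {"name": "Ada"})
save_order(store, 1, 42, ["book"])
```

Note that nothing stops another piece of code from calling `put` directly and bypassing the check, which is exactly the integrity risk described above.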
The key thing that strikes me about data stores is that they cannot be viewed as databases. Nati Shalom confirms this with his excellent explanation using Amazon's SimpleDB as the example. In his post he highlights the limitations of data stores: their lack of referential integrity, transaction support and data consistency (ACID). He also highlights the areas where their use can bring benefits, notably Web 2.0 applications where the requirement is mostly for read-only data with loosely defined schemas.
Many companies are seeing the performance of their existing RDBMSs drop off as they have to process ever larger sets of data. Data stores offer an attractive prospect with their unlimited scalability and simplicity of data management. However, experience shows that their use is only realistic when developing applications from scratch. Simply removing an existing RDBMS and plugging in a data store is not an option for legacy systems, because responsibility for the ACID properties would have to move from the RDBMS to the application layer.
Additional points of consideration include the lack of vendor support for data stores and their relative immaturity. I will watch their evolution with interest, but it is clear that data stores are currently a no-go area for companies that want to solve the scalability and performance limitations of their existing RDBMSs.
An alternative approach is to pair the RDBMS with one of the available in-memory data grids, such as the open source Memcached and JBoss Cache or the proprietary GemFire, Oracle Coherence and GigaSpaces XAP. In-memory data grids provide scalability and performance increases with no changes to the underlying RDBMS and minimal changes to the application layer. The in-memory data grid fulfills the performance and scalability requirements, leaving the RDBMS to do what it does best, i.e. maintain the ACID properties.
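One common way to combine the two is the cache-aside pattern, sketched below under some loud assumptions: a plain Python dict stands in for the in-memory grid, an in-memory SQLite database stands in for the production RDBMS, and the `users` table and helper names are hypothetical. Reads go to the cache first and fall back to the database on a miss; writes go through a database transaction and then invalidate the cached entry.

```python
import sqlite3

# Stand-in for the production RDBMS (assumption: any SQL database would do).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")
db.commit()

cache = {}                # stand-in for the in-memory data grid
stats = {"db_reads": 0}   # counts how often a read reaches the database


def get_user_name(user_id):
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]          # cache hit: the database is not touched
    stats["db_reads"] += 1         # cache miss: fall back to the RDBMS
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    name = row[0] if row else None
    if name is not None:
        cache[key] = name          # populate the cache for later reads
    return name


def rename_user(user_id, new_name):
    # The write stays inside a database transaction, so the RDBMS keeps ACID.
    with db:  # commits on success, rolls back on an exception
        db.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
    cache.pop(f"user:{user_id}", None)  # invalidate the stale cache entry
```

Repeated calls to `get_user_name(1)` reach the database only once until `rename_user` invalidates the entry. A real grid would also need expiry and cross-node invalidation, which this sketch leaves out.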
To summarize:
Data stores are commonly used by Web 2.0 start-ups that develop the application layer together with the data layer
Data stores do not offer the ACID functionality and vendor support that RDBMSs provide
To address the inherent performance and scalability limitations of RDBMSs, in-memory data grids can be used effectively