Oracle and IBM databases: Disk-based vs In-memory - BIG DATA

Breaking

Sunday, 5 May 2019

Oracle and IBM databases: Disk-based vs In-memory

Oracle and IBM databases: Disk-based vs In-memory

The case for in-memory databases (IMDB) can be made in three simple points (1) performance - data is kept in RAM so no disk I/O limitations (2) HA with built in fail-over (3) support for relational schema and SQL. Current disk based RDBMS can run out of steam when processing large data. Can these problems be solved by migrating from a disk based RDBMS to an IMDB? Any limitations? To find out, I tested one of each from the two leading vendors who together hold 70% of the market share - Oracle's 11g and TimesTen 11g, and IBM's DB2 v9.5 and solidDB 6.3.

Feature overview

The key functionality of IMDB is their support for SQL and stored procedures. This allows developers entrenched in the database SQL world to realize the benefits of in-memory data access. Other technological solutions, such as in-memory caches, require the adoption of a whole new programming paradigm.

Another thing to highlight is the use of IMDB as a secondary cache to RDBMS. This allows the database administrator to configure commonly used tables and views from the underlying RDBMS to be held in-memory.

How I tested?

The test environment will not set any performance records, but was sufficient for comparison purposes: Two physical machines with Dual core Pentium 4 CPU 2.6GHz, 1 GB of RAM and Linux CentOS 5.3 x64.

A single instance of each database was installed on each machine (due to scalability limitations - more later) in its default configuration. The same database schema was used, adjusted for small differences in each databases implementation. All tests were run in isolation and averages taken.

The focus was on read operations as the power of IMDB lies in fast data access. ACID operations, such as writes, can involve additional logging overhead leading to reduced performance. The tested dataset was 300,000 records, enough so that all data could be kept in the available RAM.

Results
I am only able to share the trends that I observed, due to license restrictions. Understandable due to the massive differences that can be affected during configuration and tuning. More about this here.

Selects were on average 5x faster for each IMDB compared to its RDBMS brother
Inserts and deletes were found to be (2x and 4x) faster for each IMDB compared to its RDBMS brother
What about scalability?
Horizontal scaling should allow increased data loads to be simply met by adding nodes to the resource pool. This requires a mechanism that transparently routes queries without the application specifying where the data is stored

solidDB only provides transparent queries when architected using two database instances connected in HA HotStandby mode. It doesn't support further scaling or any kind of data partitioning.

TimesTen is more advanced and has Cache Grid technology, which is designed to provide horizontal scalability. A global cache group can be created from a cluster and data partitioned on each node  to enable transparent querying.

Food for thought
IMDB are faster when it comes to selecting, inserting and deleting. However, when RDBMS caching and optimisation algorithms came into play on frequently accessed data, the difference in select speeds are minimal.

To realize the benefits of in-memory databases, they are best deployed as secondary caches or for the processing of datasets that can be accommodated in the available RAM. If selects are ad-hoc and need to run over large datasets then the required data cannot be feasibly kept in RAM without incurring considerable costs.

Scalability limitations of IMDB mean that it can be advisable to use in-memory data-grids, such as Oracle Coherence, GigaSpaces XAP or GemFire. These provide transparent partitioning and data querying out-of-the box. Risks? Yes, all that beautiful SQL logic will have to be refactored into an object oriented language (Java or C).

No comments:

Post a Comment