Book review—In memory data management

SAP’s SansSouciDB promises a ‘true’ relational database for both transactions and analytics.

‘In Memory Data Management: An inflection point for the enterprise’¹ (IMDM) by SAP’s Hasso Plattner and Alexander Zeier is a curious read. Its starting point is the dichotomy between transactional databases and reporting/analytical systems. Historically, these have been kept separate for performance reasons: you don’t want your transactional system brought to its knees by a query from an analyst. Another problem in database analytics is that the ‘pure’ relational model of linked tables and indexes can perform poorly on joins across many tables. But systems are speeding up. Do we really need all this complexity of non-relational systems, duplicated for transactions and analytics? The thesis of IMDM is that modern systems are so powerful that you can realize database nirvana by running the database in memory. Such a database runs so fast that complex queries proceed with little or no impact on the transactions. All you need is a fast machine with boatloads of fast RAM. One database, one data model. Well, that is the theory.

It is not quite as simple as that. IMDM spends a lot of time (in more detail than most readers will probably want) explaining how to adapt data structures, squeeze the database into memory and retool queries as stored procedures. The book revolves around a system called SansSouciDB (SSDB), as used in SAP’s ‘Hana’ appliance (Oil ITJ December 2010). SSDB stores active data in ‘compressed columns’ in memory. External storage (disk) is reserved for logging and recovery and for querying historical data. SSDB makes use of parallel processing, across blades and across cores. The test target architecture included high-end blades, each with 64 cores and 2 terabytes of memory. Use cases are briefly presented—one for smart grid meter data, another for real-time sensor net/RFID data streams.
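The book’s column-store internals are considerably more involved, but the following toy Python sketch (ours, not the authors’, with invented names and sample data) illustrates the basic idea behind dictionary-compressed columns and why analytical scans over them are cheap.

# Illustrative sketch only -- not SansSouciDB's actual implementation.
# A dictionary-compressed column stores each distinct value once and keeps
# one small integer code per row, so the column sits compactly in memory
# and an analytical scan touches far less data than a row-at-a-time layout.

from collections import defaultdict

class DictColumn:
    """A column held as integer codes into a dictionary of distinct values."""
    def __init__(self):
        self.dictionary = []   # distinct values, in order of first appearance
        self.codes = []        # one small integer code per row
        self._lookup = {}      # value -> code, for fast encoding on insert

    def append(self, value):
        code = self._lookup.get(value)
        if code is None:
            code = len(self.dictionary)
            self.dictionary.append(value)
            self._lookup[value] = code
        self.codes.append(code)

    def decode(self, row):
        return self.dictionary[self.codes[row]]

# Toy data (invented): a 'meter readings' table held as two columns.
region = DictColumn()       # low-cardinality attribute, compresses well
volume = []                 # numeric measure, stored as a plain array
for r, v in [("North", 120), ("South", 80), ("North", 95), ("South", 110)]:
    region.append(r)
    volume.append(v)

# Analytical query as a column scan: total volume per region,
# without ever reassembling full rows.
totals = defaultdict(int)
for row in range(len(volume)):
    totals[region.decode(row)] += volume[row]

print(dict(totals))   # {'North': 215, 'South': 190}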

Sometimes IMDM’s scope is bewildering. What is that Microsoft Surface doing there? Why so much on specific Intel Nehalem architectures, on virtualization, and even on ‘big data’ and cloud computing? The authors appear to leave no trendy IT stone unturned, ensuring that the marketing message for ‘in memory’ resonates in every ear. Could SSDB include an element of FUD²?

A better title for this book might have been ‘Adapting and optimizing databases for novel architectures.’ But that, in a sense, would be giving the game away. Modern architectures, as we saw in our review of Pete Pacheco’s introduction to parallel programming (April 2011), far from offering straightforward performance hikes to either seismic imaging or relational databases, actually require retooling of the system, the application and likely the user code to realize the full potential of the hardware. At one level, IMDM can be read as an interesting discussion of how to build and run a fast database, and it will be of interest to technologists. On the other hand, it does a good job of debunking the notion that ‘in memory,’ at least with current architectures, is a straightforward route to performance. Sans souci (French for ‘no worries’) it is not!

¹ www.oilit.com/links/1106_40.

² Fear, uncertainty and doubt.

This article originally appeared in Oil IT Journal 2011 Issue #6.
