Hadoop/MapReduce, LIDAR data and the cloud

Cyberinfrastructure presentation reports successful use of ‘big data’ solution for geoscience data.

Speaking at the Cyberinfrastructure Summer Institute for Geoscientists last month, Sriram Krishnan of the San Diego Supercomputer Center (SDSC) reported on the use of Hadoop, the Apache Software Foundation's open source implementation of Google's 'MapReduce' model, to manage LIDAR data. Hadoop is a 'big data' solution deployed in cloud computing environments and is used by Yahoo!, eBay and Facebook.

LIDAR data, often used in pipeline and seismic line routing, is usually obtained from airborne laser surveys. SDSC uses IBM's DB2 spatial extender to index LIDAR data. The test data set covers a wide swathe of the San Andreas fault zone and is stored on a large compute cluster. Krishnan weighed the pros and cons of the current database approach: it offers SQL-based queries in a production-quality environment, but suffers from data loading and retrieval overheads, limited scalability and the cost of high-end hardware and software.
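
By way of illustration, the SQL route might look something like the following JDBC sketch, a bounding-box query against a relational table of LIDAR returns. The table and column names (lidar_points, x, y, z), the connection URL and the coordinates are illustrative assumptions; the presentation does not give SDSC's actual schema.

  // Illustrative sketch: retrieve LIDAR returns inside a bounding box via
  // plain SQL over JDBC. Requires the IBM DB2 JDBC driver on the classpath.
  // All names and values below are assumed for the example.
  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.PreparedStatement;
  import java.sql.ResultSet;

  public class BboxQuery {
    public static void main(String[] args) throws Exception {
      try (Connection con = DriverManager.getConnection(
               "jdbc:db2://host:50000/LIDAR", "user", "password"); // placeholders
           PreparedStatement ps = con.prepareStatement(
               "SELECT x, y, z FROM lidar_points " +
               "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ?")) {
        ps.setDouble(1, 512000.0);  // xmin (example UTM coordinates)
        ps.setDouble(2, 513000.0);  // xmax
        ps.setDouble(3, 3740000.0); // ymin
        ps.setDouble(4, 3741000.0); // ymax
        try (ResultSet rs = ps.executeQuery()) {
          while (rs.next()) {
            System.out.printf("%.2f %.2f %.2f%n",
                rs.getDouble(1), rs.getDouble(2), rs.getDouble(3));
          }
        }
      }
    }
  }

Convenient as such queries are, every point must first be loaded into, and later pulled back out of, the database, which is the overhead Krishnan flags.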

A hybrid solution, with point cloud data stored as files in the industry-standard ASPRS LAS format and metadata in the relational database, offered better price/performance and proved more amenable to the cloud computing paradigm. Hadoop appears to be a promising programming environment for large-scale data processing on commodity resources, whether public or private clouds or a 'traditional' HPC cluster. In a previous publication (www.oilit.com/links/1109_12), Krishnan reported a significant code size reduction, with 700 lines of Hadoop Java code doing the work of 2,900 lines of C++. Read Krishnan's presentation on www.oilit.com/links/1109_11.
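
To give a flavor of the programming model, here is a minimal Hadoop MapReduce sketch in Java that bins LIDAR points into a regular grid and counts returns per cell. The input format (one 'x y z' point per line) and the 100 m cell size are assumptions for illustration, not Krishnan's code.

  // Minimal sketch: count LIDAR returns per 100 m grid cell with Hadoop.
  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class PointDensity {

    public static class BinMapper
        extends Mapper<LongWritable, Text, Text, LongWritable> {
      private static final double CELL = 100.0; // cell size in metres (assumed)
      private final Text cell = new Text();
      private final LongWritable one = new LongWritable(1);

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        String[] f = value.toString().trim().split("\\s+");
        if (f.length < 2) return; // skip malformed lines
        double x = Double.parseDouble(f[0]);
        double y = Double.parseDouble(f[1]);
        // Key each point by the grid cell it falls in.
        cell.set((long) Math.floor(x / CELL) + "," + (long) Math.floor(y / CELL));
        context.write(cell, one);
      }
    }

    public static class SumReducer
        extends Reducer<Text, LongWritable, Text, LongWritable> {
      @Override
      protected void reduce(Text key, Iterable<LongWritable> counts, Context context)
          throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable c : counts) sum += c.get();
        context.write(key, new LongWritable(sum)); // points per cell
      }
    }

    public static void main(String[] args) throws Exception {
      Job job = Job.getInstance(new Configuration(), "lidar point density");
      job.setJarByClass(PointDensity.class);
      job.setMapperClass(BinMapper.class);
      job.setCombinerClass(SumReducer.class); // pre-aggregate on each node
      job.setReducerClass(SumReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(LongWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

The same mapper/reducer pair runs unchanged on a private cluster or a public cloud; Hadoop itself handles input splitting, shuffling and fault tolerance, which helps explain the code size reduction Krishnan reports.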


© Oil IT Journal - all rights reserved.