Apache Spark for ‘big data,’ in-memory seismic processing

Texas A&M professor reports on tests of novel ‘big data’ infrastructure.

At a recent meeting of the Society of HPC Professionals, Lei Huang (Prairie View A&M University, Texas) presented the results of his research into the use of Apache Spark for cloud-based seismic data analytics. Spark, the latest addition to the Apache Software Foundation's 'big data' offering, is claimed to outperform MapReduce and to provide a unified, scalable, parallel processing engine for big data. Working at Texas A&M's Cloud Computing lab, Huang has leveraged Spark to build a 'platform as a service' for seismic data processing and analytics.

Spark was developed to overcome some limitations of Hadoop and MapReduce, which require tuning to a particular task. Spark's developers' goal was to design a big data system that is 'as powerful and seamless as those used for small data.' Spark offers a unified, generalized engine with standard libraries for machine learning and 'graph parallel' computation on 'resilient distributed datasets' (RDDs). Spark's directed acyclic graph (DAG) execution engine is said to support fast, in-memory computing. Huang's Spark-based jobs embed 'sophisticated' data analytics algorithms for image and seismic processing. Huang was previously a parallel programming consultant for Seismic Micro-Technology (now IHS).
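To illustrate the in-memory, RDD-based style of processing described above, the following Scala sketch decodes binary seismic traces, caches them in memory and runs two analytics passes over the same cached data. The file path, record layout and scaling step are illustrative assumptions, not details of Huang's platform; only standard Spark APIs are used.

    // Minimal sketch of in-memory seismic trace processing with Spark's RDD API.
    // Paths, trace layout and the gain step are assumptions for illustration.
    import org.apache.spark.sql.SparkSession

    object SeismicGainSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("seismic-gain-sketch")
          .getOrCreate()
        val sc = spark.sparkContext

        // Assume each record is one trace of 1000 4-byte float samples
        // in a hypothetical headerless binary file on HDFS.
        val samplesPerTrace = 1000
        val bytesPerTrace   = samplesPerTrace * 4

        val traces = sc.binaryRecords("hdfs:///data/survey/traces.bin", bytesPerTrace)
          .map { bytes =>
            val buf = java.nio.ByteBuffer.wrap(bytes)
            Array.fill(samplesPerTrace)(buf.getFloat)
          }
          .cache() // keep decoded traces in memory for repeated analytics passes

        // First pass: a simple RMS-based scaling of each trace.
        val scaled = traces.map { t =>
          val rms = math.sqrt(t.map(s => s * s).sum / t.length).toFloat
          if (rms == 0f) t else t.map(_ / rms)
        }

        // Second pass over the same cached RDD: summary statistics.
        val maxAmp = traces.map(_.map(math.abs).max).max()
        println(s"traces=${traces.count()}, maxAbsAmplitude=$maxAmp")

        scaled.map(_.mkString(",")).saveAsTextFile("hdfs:///data/survey/traces_scaled")
        spark.stop()
      }
    }

Because the decoded traces are cached, the second pass avoids re-reading and re-decoding the input, which is the kind of in-memory reuse that distinguishes Spark's DAG engine from disk-bound MapReduce pipelines.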

© Oil IT Journal - all rights reserved.