Apache Spark for ‘big data,’ in-memory seismic processing

Texas A&M professor reports on tests of novel ‘big data’ infrastructure.

At a recent meeting of the Society of HPC Professionals, Lei Huang (Prairie View A&M University, Texas) presented the results of his research on the use of Apache Spark for cloud-based seismic data analytics. Spark, the latest member of the Apache open source software group’s ‘big data’ offering, is claimed to improve on MapReduce and provide a unified, scalable, parallel processing engine for big data. Working at Texas A&M’s Cloud Computing lab, Huang has leveraged Spark in a ‘platform as a service’ for seismic data processing and analytics.

Spark was developed to overcome some issues with Hadoop and MapReduce, which require tuning to a particular task. Spark’s developers’ goal was to design a big data system that is ‘as powerful and seamless as those used for small data.’ Spark offers a unified, generalized engine with standard libraries for machine learning and ‘graph parallel’ computation on ‘resilient distributed datasets’ (RDDs). Spark’s ‘directed acyclic graph’ (DAG) engine is said to support fast, in-memory computing. Spark-based jobs embed ‘sophisticated’ data analytics algorithms for image and seismic processing. Huang was previously a parallel programming consultant for Seismic Micro-Technology (now IHS).
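
The in-memory model described above can be illustrated with a toy, plain-Python sketch. It is not Spark’s API and not Huang’s code: the class ToyDataset, its methods and the sample amplitudes are all hypothetical, chosen only to show how transformations can be recorded lazily as a graph and how an intermediate result, once cached in memory, is reused by later jobs instead of being recomputed.

```python
# Toy illustration of lazy lineage plus in-memory caching.
# NOT Spark's API: ToyDataset and its methods are hypothetical names.

class ToyDataset:
    def __init__(self, source, parent=None, fn=None):
        self._source = source        # raw data (root dataset only)
        self._parent = parent        # upstream node in the lineage graph
        self._fn = fn                # per-element transformation
        self._cache_enabled = False
        self._cached = None
        self.compute_count = 0       # times this node was actually evaluated

    def map(self, fn):
        # Record the transformation; nothing is computed yet (lazy).
        return ToyDataset(None, parent=self, fn=fn)

    def cache(self):
        # Ask for the result to be kept in memory after first evaluation.
        self._cache_enabled = True
        return self

    def collect(self):
        # Walk the lineage back to the source, reusing cached results.
        if self._cached is not None:
            return self._cached
        self.compute_count += 1
        if self._parent is None:
            result = list(self._source)
        else:
            result = [self._fn(x) for x in self._parent.collect()]
        if self._cache_enabled:
            self._cached = result
        return result


# Hypothetical trace amplitudes standing in for seismic samples.
traces = ToyDataset([0.5, -1.0, 2.0, -4.0])
scaled = traces.map(lambda a: a * 2.0).cache()   # shared intermediate result
energy = scaled.map(lambda a: a * a)             # first job built on 'scaled'
peak = scaled.map(abs)                           # second job built on 'scaled'

total_energy = sum(energy.collect())
max_amplitude = max(peak.collect())
```

Both jobs depend on the same scaled dataset, but because it is cached, it is evaluated only once. A real Spark program would express the same idea through its own RDD operations, distributed across a cluster.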
