On the better-late-than-never principle (and since it was a poster child for Hadoop at Digital Energy), we report on Jia Baodong’s 2010 Master’s thesis from the University of Stavanger (UIS) on ‘Data acquisition in Hadoop.’ As explained in the abstract, oil and gas data is ‘big’ and contains much useful information, but accessing it may be impractical or time consuming. Hadoop/MapReduce is a potential solution to the data mining question, but first, data has to be imported into the Hadoop file system.
The UIS Hadoop cluster ingests ‘historical’ Witsml drilling data supplied by Statoil’s service providers. Once in the cluster, ‘reasoning algorithms’ (in a ‘Pig’ script) are applied to identify interesting information in the data. To test real time data loading, the project used a high volume Twitter feed. The thrust of the thesis is that loading data into Hadoop with Chukwa, an open source ‘data collection engine,’ is better than loading without it. Along the way, the thesis gives a glimpse of other components of UIS’ big data solution, notably ‘DataStorm,’ an ontology-driven framework for ‘intelligent data analysis.’ DataStorm was derived from Stanford University’s BioStorm.
© Oil IT Journal - all rights reserved.