The E&P Data Warehouse

Schlumberger-GeoQuest’s Mark Robinson argues in this contributed article that many E&P work processes can be analyzed with the terminology of the Data Warehouse. He further claims that GeoQuest’s Finder is the only commercial data warehousing solution available for E&P.

Virtually all exploration and production activities are centered on the acquisition, processing, and interpretation of digital data. Major and large independent oil companies commonly manage efficient on-line databases of terabytes of information. E&P data management has become a well-defined activity with mature solutions. But there is room for improvement. Today it is no longer acceptable to manage just ‘data’; companies need to manage information and knowledge as well. Let’s look at how some concepts and terminology from commercial data processing can help us here.


Project databases support data acquisition activities and contain relatively volatile information. They can be classified as On-Line Transaction Processing (OLTP) or transaction-based databases. Systems of this type are not considered appropriate for use as data warehouses, which must be capable of supporting On-Line Analytical Processing (OLAP) applications and more sophisticated data mining applications. For several years, industries outside of oil and gas exploration and production have successfully deployed data warehouses to facilitate their access to information.

E&P Data Warehouse

Outside the upstream oil and gas industry, data warehousing is a $7 billion business with a reported 35% growth rate. The term “data warehouse” was coined by William Inmon in 1990 and is defined as a subject-oriented, integrated, time-variant, non-volatile collection of data designed for support of business decisions. The GeoQuest Finder data management system meets this definition. Let’s examine each of the above criteria.

(1) Subject-Oriented

Derived from industry-standard data models developed by POSC and PPDM, the Finder data model is subject-oriented. Basic data subjects handled by Finder include wells, seismic, leases, and production data. Operational or project databases may contain these data types, but differ fundamentally in that they must support OLTP applications. In order to support applications such as stratigraphic and seismic interpretation, project databases are designed from a process perspective. The design of Finder as a pure data management solution allows it to have a subject orientation free from process or function design constraints.

(2) Integration

Finder alone in the industry clearly and prominently advocates the integration of data and information. Inmon (1995) states that the most important aspect of a data warehouse is that all of the data is integrated. A data warehouse must utilize standardized naming conventions, units of measure, codes, and physical attributes if it is to be capable of supporting OLAP and data mining applications. This is in contrast to the world of project databases, where each is designed to support a different application. Project databases also allow for the propagation of local or individual naming conventions and encoding standards.
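The kind of standardization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the alias table, and the choice of metres as the warehouse unit are hypothetical and are not taken from Finder or from any POSC or PPDM model.

```python
# Hypothetical reference data; a real warehouse would hold these as managed lookup tables.
FORMATION_ALIASES = {
    "cv": "COTTON VALLEY",
    "cotton vly": "COTTON VALLEY",
    "cotton valley": "COTTON VALLEY",
}
TO_METRES = {"m": 1.0, "ft": 0.3048}  # 1 ft is exactly 0.3048 m


def standardize(record):
    """Return a copy of a project-database record using one set of conventions."""
    name = record["formation"].strip().lower()
    return {
        # One spelling convention for well identifiers
        "well_id": record["well_id"].strip().upper(),
        # One agreed name per formation; unknown names are simply upper-cased
        "formation": FORMATION_ALIASES.get(name, name.upper()),
        # One unit of measure for depth
        "depth_m": round(record["depth"] * TO_METRES[record["depth_unit"]], 2),
    }


# Two project databases describing similar data with local conventions
project_a = {"well_id": "tx-001 ", "formation": "CV", "depth": 10000.0, "depth_unit": "ft"}
project_b = {"well_id": "TX-002", "formation": "Cotton Valley", "depth": 3048.0, "depth_unit": "m"}

warehouse_rows = [standardize(project_a), standardize(project_b)]
```

Once both records share names, codes, and units, a query over formation or depth can treat them as directly comparable, which is the property OLAP and data mining tools rely on.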

(3) Time-Variant

Time is an important concept in the creation and maintenance of a data warehouse. The data warehouse will contain data covering a long time period, while operational databases only contain data required for current activities. Records in the data warehouse should be time-stamped with the last update or creation date (frequently this date is part of the key structure). Finally, data warehouses need to store data as recorded snapshots of past situations.
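A minimal sketch of the snapshot idea follows, assuming a hypothetical in-memory store in which the date is part of the key. The class and method names are invented for illustration and do not describe how Finder stores its data.

```python
from datetime import date


class SnapshotStore:
    """Keeps dated snapshots per well; the date is part of the key."""

    def __init__(self):
        self._rows = {}  # (well_id, as_of_date) -> record

    def record(self, well_id, as_of_date, values):
        # A new date adds a row next to the earlier ones instead of replacing them
        self._rows[(well_id, as_of_date)] = dict(values)

    def as_of(self, well_id, when):
        """Return the latest snapshot dated on or before `when`, or None."""
        dates = [d for (w, d) in self._rows if w == well_id and d <= when]
        if not dates:
            return None
        return self._rows[(well_id, max(dates))]


store = SnapshotStore()
store.record("W-1", date(1997, 1, 1), {"status": "PRODUCING"})
store.record("W-1", date(1998, 6, 1), {"status": "SHUT-IN"})

# The situation as it was recorded at the start of 1998
earlier = store.as_of("W-1", date(1998, 1, 1))
```

An operational database would typically keep only the current status; here the earlier state remains available for historical queries.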

(4) Nonvolatile

The data in a project database is highly volatile, with changes and additions being applied constantly. This must be an important design consideration for a project database that must support real-time application systems. Loading data to Finder is generally a more considered function, involving business rules designed to control the accuracy and integrity of the data being managed. As with all data warehouses, project databases are used as the primary source for data. However, it would be wrong to say that there is any redundancy between the data warehouse and project database environments. Redundancy is minimized by the processes used to promote data into the warehouse environment. Data from project databases is frequently transformed to standardize keys and reference values. A selection process is followed that qualifies what data and which records are appropriate for storage in the data warehouse. As discussed earlier, the data warehouse will contain a comprehensive historical version of data and information derived from a wide variety of sources.


OLAP provides multidimensional views of data, calculation-intensive capabilities, and time intelligence. OLAP and data warehousing are complementary. Data warehouses store and manage the data, while OLAP transforms the data into strategic information. OLAP ranges from simple data browsing, through basic calculations, to complex modeling and analysis. In the E&P domain no applications have been positioned as OLAP in the usual sense. However, almost every E&P data analysis tool is designed to process data based upon multiple dimensions such as spatial location, depth, and time. It could be argued that virtually every E&P application is OLAP. When a reservoir engineer uses a program to access the data warehouse to display the production history of a well and then generates a production forecast from the calculated decline curve, he is using OLAP. A geologist who creates a net pay map from data retrieved from the data warehouse is also using OLAP. These examples fall outside the commercial definition of OLAP, but they involve multidimensional views of the data, are calculation intensive, and have a time component.
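The reservoir engineer's example can be sketched as follows. The article does not say which decline model is used; this sketch assumes a simple exponential decline, q(t) = qi * exp(-D * t), fitted by least squares on the logarithm of the rate. The production history is synthetic and the function names are hypothetical.

```python
import math


def fit_exponential_decline(months, rates):
    """Least-squares fit of ln(q) = ln(qi) - D * t; returns (qi, D)."""
    n = len(months)
    logs = [math.log(q) for q in rates]
    mean_t = sum(months) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in months)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(months, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope


def forecast(qi, decline, month):
    """Rate predicted by the fitted exponential decline at a given month."""
    return qi * math.exp(-decline * month)


# Synthetic monthly history, standing in for rates retrieved from the warehouse
history_months = [0, 1, 2, 3, 4, 5]
history_rates = [1000.0 * math.exp(-0.05 * t) for t in history_months]

qi, decline = fit_exponential_decline(history_months, history_rates)
rate_at_24 = forecast(qi, decline, 24)
```

The calculation combines a time dimension (months on production) with a measure (rate), which is the sense in which the article treats it as OLAP.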

Data Mining

Data mining is the process of using advanced algorithms to discover meaningful relationships, patterns, and trends from data and information. Shilakes and Tylman (1998) describe data mining as an art, not a science, and note that no single application or tool provides every function required. Six primary phases have been identified in the data mining process (Fayyad and Simoudis, 1998). The phases are selection, preprocessing, transformation, data mining, interpretation and evaluation. Selection is the basic first step, in which data and information are selected or segmented according to some criteria such as “all wells that have produced gas from the Cotton Valley Formation in East Texas during the last ten years.”
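The selection criterion quoted above might look like this when expressed as a filter. The well records, field names, and the fixed reference date are all hypothetical, chosen only to make the sketch self-contained.

```python
from datetime import date

# Hypothetical well records; the field names are illustrative, not a real schema
WELLS = [
    {"well_id": "W-1", "formation": "COTTON VALLEY", "region": "EAST TEXAS",
     "product": "GAS", "last_production": date(1997, 6, 30)},
    {"well_id": "W-2", "formation": "COTTON VALLEY", "region": "EAST TEXAS",
     "product": "OIL", "last_production": date(1996, 1, 31)},
    {"well_id": "W-3", "formation": "COTTON VALLEY", "region": "EAST TEXAS",
     "product": "GAS", "last_production": date(1985, 3, 31)},
    {"well_id": "W-4", "formation": "TRAVIS PEAK", "region": "EAST TEXAS",
     "product": "GAS", "last_production": date(1998, 1, 31)},
]


def select_wells(wells, formation, region, product, since):
    """Return the ids of wells matching every criterion and producing on or after `since`."""
    return [
        w["well_id"]
        for w in wells
        if w["formation"] == formation
        and w["region"] == region
        and w["product"] == product
        and w["last_production"] >= since
    ]


# A fixed reference date keeps the example reproducible
as_of = date(1998, 12, 31)
ten_years_ago = as_of.replace(year=as_of.year - 10)

selected = select_wells(WELLS, "COTTON VALLEY", "EAST TEXAS", "GAS", ten_years_ago)
```

The filter only works this simply because formation, region, and product are stored with standardized values, which is what the integration criterion is meant to guarantee.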


Preprocessing involves the cleansing of the data to ensure consistent and relevant content and format. When data and information are derived from a qualified data warehouse, this step has largely been addressed. Normalizing a set of resistivity curves from the selected wells so that they are consistently scaled would be considered a preprocessing step.
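As a rough illustration of the resistivity example, the sketch below rescales each curve to the range 0 to 1. The article does not name a normalization method; min-max scaling of the base-10 logarithm is an assumption made here, and the function name and sample values are hypothetical.

```python
import math


def normalize_curve(resistivity_ohmm):
    """Scale log10(resistivity) to the range 0..1 for one well's curve."""
    logs = [math.log10(r) for r in resistivity_ohmm]
    lo, hi = min(logs), max(logs)
    if hi == lo:
        # A constant curve carries no contrast; map it to zero throughout
        return [0.0 for _ in logs]
    return [(v - lo) / (hi - lo) for v in logs]


# Two hypothetical wells logged over very different resistivity ranges
well_a = normalize_curve([1.0, 10.0, 100.0])
well_b = normalize_curve([20.0, 200.0, 2000.0])
```

After scaling, the two curves can be compared on the same footing even though the raw readings differ by more than an order of magnitude.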


Transformation is about making the data usable and navigable, and includes processes that transform the raw and preprocessed data into meaningful overlays suitable for analysis. Gridding and contouring stratigraphic tops is one example of a transformation.

Data mining is the phase where patterns are extracted from the transformed data and information. A meaningful pattern is one that, based upon a given set of facts, describes a relationship for a subset of the facts with a high degree of certainty. A map showing the relationship between porosity, known production and geologic structure can be a form of data mining.

Interpretation and evaluation takes the patterns identified during the data mining phase and converts them into knowledge that can be used to support business decisions. Patterns capable of supporting the successful drilling and completion of wells are the most valuable knowledge in the E&P business.

The above phases were illustrated with very simple examples. The E&P industry today makes very little use of more sophisticated pattern recognition technology such as cluster analysis, learning classification rules, and dependency networks.
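To give a flavor of the cluster analysis mentioned above, here is a bare-bones k-means sketch. It is not drawn from the article or from any E&P product: the well attributes are invented, the starting centroids are picked by hand to keep the result reproducible, and the features are left unscaled, which a real study would need to address.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means with caller-supplied starting centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it in place if the group is empty
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical (porosity fraction, net pay in metres) for six wells
wells = [(0.08, 5.0), (0.09, 6.0), (0.10, 5.5),
         (0.22, 20.0), (0.24, 22.0), (0.23, 21.0)]

centroids, labels = kmeans(wells, initial_centroids=[wells[0], wells[3]])
```

Each label assigns a well to a group of similar wells; deciding whether such a grouping means anything geologically belongs to the interpretation and evaluation phase.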


As data management becomes a more mature process for the upstream oil and gas industry, it should leverage the technologies developed for managing commercial and financial data and information. One of these technologies is data warehousing. Data warehousing is a structured approach to managing large volumes of data derived from various sources including project databases, commercial data vendors, and legacy data. Data warehouses integrate and structure the data in a way that greatly facilitates OLAP and data mining operations. OLAP already exists in the E&P industry in most of the applications used to visualize and process oil and gas related data. Data mining is a data analysis process composed of multiple phases that identifies patterns in selected data and converts them into valuable knowledge useful in making business decisions. This analysis was designed to introduce E&P data managers to the terminology and methodology common to data warehousing.

© Oil IT Journal - all rights reserved.