Flare Consultants’ Paul Cleverley observes that the emergence of tools such as Google’s Knowledge Graph, Apple’s Siri and Wolfram Alpha demonstrates that enterprise search is ripe for a shift from the ‘keyword’ paradigm to more sophisticated techniques. In a recent paper, ‘Improving Enterprise Search in the Upstream Oil and Gas Industry,’ Cleverley discusses how some of these statistical and semantic knowledge representations are set to improve an area where oil and gas ‘has so far lagged behind.’ Organizations face a disconnect between the terminology used in search and the inherent ambiguity of information sources. The mismatch leads to critical information being missed.
Cleverley’s ‘narrow search paradox’ occurs when, as the number of words used in a search increases, the proportion of relevant results actually decreases. He believes that the internet-style habit of adding more terms to ‘refine’ a query may be a problem today.
Enter automatic query expansion (QE) leveraging semantic web technologies and public domain glossaries such as ISO 15926, Schlumberger’s oilfield glossary, and word lists from IHS, the USGS and others. Tests performed on the global document collection of ‘one of the largest corporations in the world*’ showed that QE retrieved, on average, an additional 43% of ‘relevant’ results.
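To make the idea concrete, here is a minimal sketch of glossary-based query expansion. It is not Cleverley's implementation: the tiny synonym table, the three sample documents and the function names are all hypothetical, and plain substring matching stands in for a real search engine. Each query term is widened into a group of alternatives, and a document matches if it contains at least one alternative from every group.

```python
# Hypothetical mini-glossary. In practice the synonyms would come from
# public sources such as the glossaries named above.
SYNONYMS = {
    "wellbore": {"borehole", "well bore"},
    "mud": {"drilling fluid"},
}

# Hypothetical document titles used only for illustration.
DOCS = [
    "Borehole stability report for the northern field",
    "Wellbore stability study",
    "Pipeline corrosion survey",
]


def expand_query(query, synonyms=SYNONYMS):
    """Turn each query term into a group: the term plus its synonyms."""
    groups = []
    for term in query.lower().split():
        groups.append([term] + sorted(synonyms.get(term, ())))
    return groups


def matches(document, groups):
    """A document matches if every group has at least one alternative in it."""
    text = document.lower()
    return all(any(alt in text for alt in group) for group in groups)


def search(query, expand=True):
    """Return matching documents, with or without query expansion."""
    if expand:
        groups = expand_query(query)
    else:
        groups = [[term] for term in query.lower().split()]
    return [doc for doc in DOCS if matches(doc, groups)]


if __name__ == "__main__":
    print(search("wellbore stability", expand=False))
    print(search("wellbore stability", expand=True))
```

Without expansion, the query ‘wellbore stability’ finds only the document that uses the word ‘wellbore’. With expansion it also finds the ‘borehole’ report, which is the kind of additional relevant result the tests describe.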
In Google-like search, what counts is what’s on the first page. Corporate search may have different expectations, such as returning an exact list of results. In general, there is a trade-off between information recall (completeness) and precision (accuracy). Cleverley shows how semantic technology (Protégé), along with publicly available taxonomies, can be used to improve classification and obtain a better balance between recall and precision.
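The trade-off can be illustrated with the standard definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that are retrieved. The sketch below uses made-up document identifiers, not figures from the paper, to compare a narrow and a broad result list.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved list against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only.
RELEVANT = {"d1", "d2", "d3", "d4"}
NARROW = ["d1", "d2"]                          # few results, all relevant
BROAD = ["d1", "d2", "d3", "d5", "d6", "d7"]   # more results, some noise

if __name__ == "__main__":
    print(precision_recall(NARROW, RELEVANT))  # (1.0, 0.5)
    print(precision_recall(BROAD, RELEVANT))   # (0.5, 0.75)
```

In this toy case the broader result list finds more of the relevant documents (higher recall) at the cost of more irrelevant ones (lower precision).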
* Shell is mentioned in the text.
© Oil IT Journal - all rights reserved.