Following its acquisition of Nstein earlier this year, enterprise content management specialist Open Text has re-cast Nstein’s technology into the Open Text Semantic Network (OTSN), a cloud-based solution for indexing and searching large volumes of text. OTSN is a search-driven content delivery platform developed in PHP and Java that embeds Open Text’s content analytics engine along with Apache Solr and Lucene.
Oil IT Journal signed up for the OTSN Beta trial and had the system index the 4,000 plus articles on oilit.com. You can view the results as a ‘faceted’ search on www.oilit.com/links/1011_11*. We used the search engine to find a reference to ‘JMP’ while researching our book review below. The search returns one direct reference, to SAS’ PAM offering. This is the easy part: we got the same result using our vanilla ‘FreeFind’ search tool. But OTSN also returns a list of half a dozen other references that are, as it were, one concept away from ‘JMP’. These point to articles on the use of SAS in Total, our review of SAS’ GGRE analytics in oil and gas and two non-SAS pieces on predictive maintenance from Emerson, IBM and Matrikon. The latter actually refers to the use of a competitive stats package from StatSoft.
What OTSN is doing is recognizing key concepts in the documents it finds with free text search and kicking off a smart search for related topics. It seems to work. You could argue that there is likely more ‘near relevant’ material to be found on oilit.com. But on the other hand it is nice to have five pertinent suggested documents. The same ‘JMP’ search on Google returns just over 8 million hits. This is pretty impressive stuff for a bare bones implementation of semantic search. There has been no tuning of the tool to this website and no use of the API at all.
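To make the ‘one concept away’ idea concrete, here is a minimal Python sketch of a two-stage search: a free text match first, then a lookup of other documents that share concepts with the direct hits. This is our own illustration and not OTSN’s algorithm or API. The concept lexicon, the toy corpus and the function names `annotate` and `search` are all hypothetical, and the naive substring matching stands in for a real content analytics engine.

```python
from collections import Counter

# Hypothetical concept lexicon standing in for a real annotator.
CONCEPTS = {
    "jmp": "statistical software",
    "sas": "statistical software",
    "statsoft": "statistical software",
    "predictive maintenance": "predictive maintenance",
}

# Toy corpus: placeholder descriptions, not real article text.
DOCS = {
    "doc1": "Note on JMP and the SAS PAM offering",
    "doc2": "Article on the use of SAS at Total",
    "doc3": "Predictive maintenance piece mentioning StatSoft",
    "doc4": "Unrelated news item on a seismic vessel",
}


def annotate(text):
    """Return the concepts whose trigger terms occur in the text (naive substring match)."""
    lowered = text.lower()
    return {concept for term, concept in CONCEPTS.items() if term in lowered}


def search(query):
    """Return (direct free text hits, concept-related documents)."""
    q = query.lower()
    direct = [doc_id for doc_id, text in DOCS.items() if q in text.lower()]

    # Concepts found in the direct hits seed the related search.
    seed = set()
    for doc_id in direct:
        seed |= annotate(DOCS[doc_id])

    # Score every other document by how many seed concepts it shares.
    scores = Counter()
    for doc_id, text in DOCS.items():
        if doc_id in direct:
            continue
        overlap = len(seed & annotate(text))
        if overlap:
            scores[doc_id] = overlap

    # Highest overlap first, ties broken by document id.
    related = [d for d, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]
    return direct, related


direct, related = search("JMP")
print("direct:", direct)
print("related:", related)
```

In this toy setup the query matches only the first document as free text, while the second and third surface as related because they share the ‘statistical software’ concept. A production system would replace the lexicon with a trained annotator and the linear scan with an index such as Lucene.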
As the technology evolves, Open Text plans to add ‘true’ Semantic Web capability to the toolset, exposing the detailed annotations produced by its annotator. This will let developers blend search results from a ‘local’ resource (such as Oil IT Journal) with public semantic datasets such as www.DBpedia.org (an RDF database of Wikipedia). Likewise, geographic terms or other structured entities such as ‘Paris’ will be recognized as such and be linkable to databases such as www.geonames.org.
Open Text is also developing more mainstream functionality for OTSN with plugins for document stores such as Microsoft SharePoint or its own Content Server. These will enable ‘find the expert’ type searches across large organizations.
* The trial was set up on the public www.oilit.com website. Site licensees may have to log out of the corporate site to test drive on the public site.
This article originally appeared in Oil IT Journal 2010 Issue # 11.