The Language Technology Group
has just produced a report titled ‘Text Analytics APIs (TAAPI), a
consumer guide’, an analysis of commercially available hosted
text analytics programming interfaces. TAAPI is authored by LTG’s natural language
processing guru Robert Dale. The 275-page guide includes an exhaustive
test of 26 APIs, including Amazon Comprehend, Bitext, Google NL, Reuters’ Open Calais and
TextRazor, which vary considerably in breadth of functionality.
Amongst the more fully featured APIs, the standout performers are Aylien,
MeaningCloud, Rosette and Lexalytics’ Semantria.
Text analytics (TA) is concerned with extracting information from documents and with automated document classification. The commercial offerings are evaluated against ten key TA capabilities. These include extracting entities (people, places...) from text, classifying documents into categories (sport, business...), sentiment analysis (for or against...), summarization into a short text, and tagging documents with concepts that may not be explicitly mentioned in the text.
Almost all of the APIs studied are accessed through an HTTP interface that can be called with cURL, and return their output in JSON. Data can be supplied as a text string, a document or a URL. The report summarizes each API’s capabilities and pricing model, along with a short ‘impressions’ paragraph covering typical use cases and limitations. Some APIs come with industry packs (but not for oil and gas!) and some include access to structured databases of company information.
Tests on the different APIs show how keywords and phrases can be extracted automatically from text. Success depends on how valuable the extracted keywords and phrases are to a particular domain, and results vary widely. Some tools do not return the position of a keyword in the text, or do not assess keyword relevancy, which limits their usefulness.
Other capabilities investigated include linguistic analysis (sentence splitting, syntactic analysis…) and relationship extraction, either of predefined relations such as ‘has acquired’ or ‘open’ relation extraction. Dale warns here that ‘relationship extraction is the bleeding edge of text analytics [..] requiring a degree of sophistication that is beyond the current state of the art’. This function is considered ‘aspirational’ for most vendors. However, ‘if a vendor provides targeted relationship extraction that is relevant for your business, this should be a major factor in your API choice’.
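Why keyword position and relevancy matter can be illustrated with a short client-side sketch. The Python below assumes a hypothetical JSON response shape (a `keywords` list whose entries carry `text`, `relevance` and `offset` fields) rather than any vendor’s actual schema, and works on a canned response instead of calling a live HTTP endpoint. The function name `useful_keywords` and the 0.5 relevance threshold are likewise illustrative. It keeps only keywords that carry a relevance score above the threshold and a position that checks out against the source text.

```python
import json

# Hypothetical response body from a keyword-extraction endpoint. The field
# names ("keywords", "text", "relevance", "offset") are assumptions made for
# illustration and do not follow any particular vendor's schema.
SAMPLE_RESPONSE = """
{
  "keywords": [
    {"text": "drilling contract", "relevance": 0.92, "offset": 14},
    {"text": "North Sea", "relevance": 0.81, "offset": 39},
    {"text": "company", "relevance": 0.12, "offset": 4},
    {"text": "extended", "relevance": 0.60}
  ]
}
"""

SOURCE_TEXT = "The company's drilling contract in the North Sea was extended."


def useful_keywords(response_json, source_text, min_relevance=0.5):
    """Return (keyword, relevance, offset) tuples, highest relevance first,
    keeping only keywords that meet the relevance threshold and whose
    reported offset actually points at the keyword in the source text."""
    data = json.loads(response_json)
    results = []
    for kw in data.get("keywords", []):
        text = kw.get("text")
        relevance = kw.get("relevance")
        offset = kw.get("offset")
        # Without a score or a position the keyword cannot be ranked or located.
        if text is None or relevance is None or offset is None:
            continue
        if relevance < min_relevance:
            continue
        # Verify that the reported offset really locates the keyword.
        if source_text[offset:offset + len(text)] != text:
            continue
        results.append((text, relevance, offset))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


if __name__ == "__main__":
    for text, rel, off in useful_keywords(SAMPLE_RESPONSE, SOURCE_TEXT):
        print(f"{text}: relevance {rel:.2f}, offset {off}")
```

Entries lacking a score or an offset are simply dropped here, which mirrors the limitation noted above: without them, a client has no principled way to rank extracted keywords or tie them back to the text.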
In conclusion, deploying a text analytics API implies a significant commitment of developer resources, so selecting the best API for a particular use is a major decision. TAAPI provides a wealth of material to guide this choice and avoid the pitfall of a poorly suited toolset. TAAPI costs $895 and is available from the Language Technology Group.
© Oil IT Journal - all rights reserved.