It has been a while since I saw real enthusiasm at a user conference. Most upstream software is pretty mature these days and what is not mature is usually massively oversold before anyone gets to grips with what (possibly imagined) problem it sets out to solve and how (if at all) it is going to solve it. But, at the 2016 EU Graph Connect conference in London that I attended earlier this year, there was real enthusiasm both for the novel technology and, especially, for what it is being used for.
Graph Connect is a shop window for Neo4J’s graph database technology. If you want to know what a graph database is I invite you to consult Wikipedia and if you think it sounds like something you heard of a long time ago, you could also visit the Talk page. For the party line though, you could try the O’Reilly book. If you are familiar with the semantic web and the RDF triple store, well there is a lot of overlap with the graph database. However, if you spend too much time trying to figure out exactly what something really is in IT, by the time you have got it, the technology will have moved on, so let’s just get on with the conference.
Neo4J founder and CEO Emil Eifrem puts the graph database on a mathematical pedestal that goes back to Euler’s 1736 paper on the Seven Bridges of Königsberg. This proved that a walk crossing each of the bridges exactly once was impossible. OK that is not perhaps the greatest marketing pitch but Euler’s work did spark off the study of ‘graphs,’ i.e. the relationships between things. These boil down to three core abstractions: node, relationship and property. Eifrem positions the graph model as better suited to modeling many facets of the real world than the relational database. Neo4J’s graph database is used by companies including Cisco, Wal-Mart and LinkedIn*.
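To make the three abstractions concrete, here is a minimal sketch in plain Python (not Neo4J code) that models Königsberg as nodes, relationships and properties and then applies Euler's degree argument. The class names, land-mass labels and bridge ids are illustrative assumptions, not anything shown at the conference.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str
    end: str
    rel_type: str
    properties: dict = field(default_factory=dict)


# Four land masses (labels are illustrative).
nodes = [
    Node("north_bank"),
    Node("south_bank"),
    Node("kneiphof", {"kind": "island"}),
    Node("east_island", {"kind": "island"}),
]

# Seven bridges, each a relationship carrying an 'id' property.
pairs = [
    ("kneiphof", "north_bank"), ("kneiphof", "north_bank"),
    ("kneiphof", "south_bank"), ("kneiphof", "south_bank"),
    ("kneiphof", "east_island"),
    ("north_bank", "east_island"),
    ("south_bank", "east_island"),
]
bridges = [
    Relationship(a, b, "BRIDGE", {"id": i})
    for i, (a, b) in enumerate(pairs, start=1)
]


def odd_degree_nodes(relationships):
    """Return the names of nodes touched by an odd number of relationships."""
    degree = Counter()
    for rel in relationships:
        degree[rel.start] += 1
        degree[rel.end] += 1
    return sorted(name for name, d in degree.items() if d % 2 == 1)


def has_euler_walk(relationships):
    """Euler's condition for a connected graph: a walk using every edge
    exactly once exists only if zero or two nodes have odd degree."""
    return len(odd_degree_nodes(relationships)) in (0, 2)


print(odd_degree_nodes(bridges))
print(has_euler_walk(bridges))
```

All four land masses turn out to have an odd number of bridges, which is why the walk cannot be done. The check assumes the graph is connected, which holds for this layout.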
Andreas Weber presented on the use of Neo4J in product data management at German toy manufacturer Schleich (famous for its blue-skinned ‘Smurfs’). The company operates worldwide and has to deal with many legal and regulatory environments covering its products and chemicals. Currently the information is scattered across different relational databases, SAP/ERP and many documents and spec sheets, often with local context. Neo4J allows investigation across product, model, bill of materials, substance and DIN specs of components. Engineers’ queries now involve finding the right path through the graph. Weber observed that the bottom-up graph approach is better than doing this with a metadata layer.
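What ‘finding the right path through the graph’ might look like can be sketched with a breadth-first search over a toy product graph. This is plain Python rather than a Neo4J query, and every node label, relationship type and data value below is hypothetical; none of it comes from Schleich’s actual model.

```python
from collections import deque

# Hypothetical (start, relationship type, end) triples.
EDGES = [
    ("product:figurine", "HAS_MODEL", "model:figurine-v2"),
    ("model:figurine-v2", "HAS_BOM_ITEM", "component:paint-blue"),
    ("model:figurine-v2", "HAS_BOM_ITEM", "component:pvc-body"),
    ("component:paint-blue", "CONTAINS", "substance:pigment-x"),
    ("component:pvc-body", "CONTAINS", "substance:plasticizer-y"),
    ("component:pvc-body", "MUST_MEET", "spec:din-example"),
]


def build_adjacency(edges):
    """Index directed edges by their start node."""
    adjacency = {}
    for start, rel_type, end in edges:
        adjacency.setdefault(start, []).append((rel_type, end))
        adjacency.setdefault(end, [])
    return adjacency


def find_path(adjacency, start, goal):
    """Breadth-first search; returns the shortest node path or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _rel_type, neighbour in adjacency.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


adjacency = build_adjacency(EDGES)
path = find_path(adjacency, "product:figurine", "substance:plasticizer-y")
print(" -> ".join(path))
```

The question ‘which substances end up in this product?’ becomes a traversal from product through model and bill of materials to substance, with no join tables in between. A graph database would express the same thing as a declarative path query rather than hand-written search code.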
The graph talk got a bit closer to the oil and gas vertical with a presentation from James Weaver of Pivotal, whose Cloud Foundry underpins GE’s Predix. Weaver presented his work on Pivotal’s Concept Map, a free online tool for ‘navigating all the knowledge.’ I played with the tool, pinning ‘hydrocarbon’ and ‘methane’ into the GUI and clicking on ‘relationship’ to reveal that these items are linked through the ‘Armenian Soviet Encyclopedia,’ which is pretty weird. I’m not sure if the Concept Map is revolutionary or anecdotal but it is powered by Spring, runs on GrapheneDB, a hosted edition of Neo4J, and blends structured relationships in Wikidata with article text in Wikipedia.
Dan Murphy showed how the Financial Times has used Neo4J, along with Google’s Go programming language, to build a ‘semantic linked data platform.’ The project involved a shift away from a monolithic website, heavy on XML, that was deemed unfit for the FT’s nimble future. Following unsuccessful trials with HAL and an RDF triple store, Murphy decided to use Neo4J and Go (with the Neoism Go library). The system now works as advertised and allows FT journos to track companies and individuals through to subsidiaries and other articles. For performance, the code needs a lot of tweaking, ‘just like SQL.’
IBM was also in on the graph act. Ivan Portilla presented a curious combo of IBM Watson and Neo4J. The result, WayBlazer, is a toolkit for developers of travel websites, notably the very naff ‘Connie,’ Hilton’s robot concierge.
Axel Morgner from Structr observed that in many organizations, Excel is used as a shadow IT solution. Various approaches have been tried to break the Excel stranglehold: MDM, ESB and middleware. But it is better to create a unified central system offering tight integration of all data sources and a unified data model. For systems you cannot replace, like SAP, their scope needs to be constrained, ‘SAP is not the best place for all enterprise data.’ SAP at Schleich is ‘very slim in scope.’ Structr’s graph application development platform consumes RDF, ER, XML and Sparql to create a unified central data model.
And then there was Mar Cabra, an ebullient journalist from the International Consortium of Investigative Journalists, with the IT background to the unravelling of the Panama papers. Neo4J, along with Linkurious and Talend, was used to analyze the 2.6 terabytes of data that a Fonseca mole had kindly supplied. Over 500 western banks were found to be acting as intermediaries to Fonseca. Cabra pointedly observed that ‘there are lots of banks here.’ The bankers seemed unfazed. In fact the audience was in raptures with this potent blend of politics and geekiness. The rapture turned to delirium when Cabra announced that all of the Fonseca data would be made available to the public. Go see if you are on the list!
* To (perhaps) state the obvious, ‘used by’ does not mean that the technology is in exclusive or even widespread use at any of these companies.
@neilmcn
© Oil IT Journal - all rights reserved.