Oil ITJ – How did Cognite get started?
Engdahl – We set up Cognite six years ago with the idea of doing AI on industrial data. This proved to be hard going because of the diverse data sources. We were fortunate that Aker BP supplied us with a comprehensive data set which enabled us to build a data layer, a foundation for AI/ML. In fact we are now known as ‘the data layer company’. Around that time, some companies were using the cloud to build data lakes, but they were not realizing the expected value, partly because joining data across different sources meant losing context. We ended up building a large, all-encompassing data model spanning CAD, 3D models, drawings, SAP work orders and asset/historian data. All of this went into a single data model. Initially this was simple, with just values, assets, time series, events and so on.
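By way of illustration only – the class and field names below are ours, not Cognite’s schema – the kind of minimal resource model described above (assets, with time series and events linked to them) might be sketched as:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Asset:
    external_id: str
    name: str
    parent_id: Optional[str] = None   # hierarchy, e.g. plant -> system -> tag

@dataclass
class TimeSeries:
    external_id: str
    asset_id: str                     # the tag this sensor/historian series belongs to
    unit: str                         # unit of measure, e.g. 'barg'

@dataclass
class Event:
    external_id: str
    asset_ids: List[str]              # e.g. a SAP work order touching several tags
    start_time: int                   # epoch milliseconds
    end_time: int
    description: str = ""
```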
What modeling technology are we talking about?
The data model is stored in PostgreSQL, the open source relational database. It was hand-coded by Cognite so that customers could not change it, which was essential for the stability of the system. More recently we have added a data modeling option in Cognite Data Fusion (CDF) to allow customers to define their own data models.
What technology is used here?
We use GraphQL with extensions for things like inheritance. This has allowed us to materialize some industry standards. The first was CFIHOS, leveraging work done by Aker Solutions on a digital twin for Aker BP. This allowed CFIHOS to be used in operations.
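The interview does not spell out Cognite’s GraphQL extensions, so the following is only a generic sketch of how inheritance is commonly expressed in GraphQL schema language, here simply held in a Python string. The type and field names are invented:

```python
# Hypothetical GraphQL SDL illustrating inheritance via interfaces.
# Type and field names are invented; Cognite's actual extensions may differ.
EQUIPMENT_SCHEMA = """
interface Equipment {
  tag: String!
  description: String
}

type Pump implements Equipment {
  tag: String!
  description: String
  ratedFlow: Float   # a property that only pumps carry
}
"""
```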
Although it was designed for handover?
Yes, but here it is also used in operations. We also added parts of ISO 15926 – concepts and properties for process plants.
Which ISO 15926 parts are we talking about – there are a lot of them!
Parts 2 and 4, with some of the RDF/OWL work done in Part 14. What is key here is the actual implementation. Standards are not just paper, they need to be instantiated, populated and usable.
We have been tracking RDF – the Semantic Web – for a couple of decades and came to the conclusion that it was pretty much unworkable for non-specialists, and that the triple is a frustratingly poor tool for modeling even simple things like adding a unit of measure to a property.
You are right. The semantic web was never rolled out at scale, and there is indeed the challenge of a ‘normal’ developer struggling with the whole domain data model. So we developed the concept of solution data models: simple models tailored to a specific domain and task. There may be hundreds of domain models linked into the Cognite data layer.
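Loosely illustrating the solution data model idea (entity and field names are invented, this is not Cognite’s API), a task-specific model can be thought of as a narrow, task-shaped view over the wider domain model:

```python
# Hypothetical "maintenance" solution model: a small subset of the wider
# domain model. Names are invented for illustration.
MAINTENANCE_MODEL = {
    "Pump":      ["tag", "description", "ratedFlow"],      # from the equipment model
    "WorkOrder": ["number", "status", "plannedStart"],     # from SAP work orders
}

def project(record, fields):
    """Reduce a full domain-model record to just the solution model's fields."""
    return {k: v for k, v in record.items() if k in fields}

# Example: a rich domain record seen through the narrow solution model.
pump = {"tag": "20-PA-001", "description": "Export pump", "ratedFlow": 250,
        "weight": 1200, "supplier": "ACME", "poNumber": "4500012345"}
print(project(pump, MAINTENANCE_MODEL["Pump"]))
# -> {'tag': '20-PA-001', 'description': 'Export pump', 'ratedFlow': 250}
```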
Where does this leave the standards?
Standards have failed to create a lot of value but they do hold promise if they are populated. Normal people need to be able to access the data. We will have stripped-down versions of these standards over time as we get a better feel for what is being populated and used.
Is the PostgreSQL database still canonical?
Yes, but we use our own graph API which we call PG3 (for Pretty Good Property Graph for PostgreSQL). We use the latest version of PostgreSQL graph traversal (1) to send a query on to PostgreSQL. In fact we may open-source PG3 later on. Another part of our technology is that CDF replicates the PostgreSQL data to an Elasticsearch database, so queries can be relational, graph or free text. The duplication to Elasticsearch is transparent to users.
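The PG3 API itself is not described in the interview. As a hedged sketch of the kind of graph traversal plain PostgreSQL supports, a recursive common table expression can walk an edge table; the table and column names below are invented:

```python
# Sketch only: walking a hypothetical edge table with a recursive CTE.
# Table and column names are invented; this is not Cognite's PG3 API.
import psycopg2

TRAVERSE_SQL = """
WITH RECURSIVE descendants AS (
    SELECT child_id, 1 AS depth
    FROM   edges
    WHERE  parent_id = %(root)s
  UNION ALL
    SELECT e.child_id, d.depth + 1
    FROM   edges e
    JOIN   descendants d ON e.parent_id = d.child_id
    WHERE  d.depth < %(max_depth)s
)
SELECT child_id, depth FROM descendants;
"""

def subtree(conn, root_id, max_depth=5):
    """Return all nodes reachable from root_id, with their traversal depth."""
    with conn.cursor() as cur:
        cur.execute(TRAVERSE_SQL, {"root": root_id, "max_depth": max_depth})
        return cur.fetchall()

# conn = psycopg2.connect("dbname=cdf user=reader")  # connection details are site-specific
```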
So the data layer is also the graph layer?
Data is all in PostgreSQL, but all CDF data is accessed through the PG3 interface. This is important for data integrity (2).
You all are pretty keen on open source!
Our approach, using common open source tools, has been successful, not least because we started out working in the Google cloud environment. When we found that the oil and gas domain was dominated by Microsoft/Azure, we had to migrate components like Google Bigtable to Azure. In fact this wasn’t too hard.
What does a user/developer need to have proficiency in to access CDF? GraphQL, C++, Python?
GraphQL if they want to. There is also a Python SDK that uses GQL under the hood. We are adding JavaScript and others, all accessing through the API.
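The SDKs themselves are not shown in the interview. As a generic sketch of the underlying pattern – a GraphQL query posted over HTTP – with an invented endpoint, token and query:

```python
# Generic GraphQL-over-HTTP call. The endpoint, token and query are
# placeholders for illustration, not Cognite's actual API surface.
import requests

ENDPOINT = "https://example.com/graphql"   # hypothetical
QUERY = """
query PumpsByTag($prefix: String!) {
  pumps(filter: {tagPrefix: $prefix}) {
    tag
    description
  }
}
"""

resp = requests.post(
    ENDPOINT,
    json={"query": QUERY, "variables": {"prefix": "20-PA-"}},
    headers={"Authorization": "Bearer <token>"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["data"])
```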
This is all quite a surprise to us. We have been tracking the semantic web for a couple of decades and at a recent conference I described RDF/triples as one of the biggest data failures, despite having had enormous support, especially from the EU. Did you benefit from any of these EU projects?
Some individuals have worked on these for sure. We are also a member of OSDU and are working with the OPC Foundation on CCS.
OSDU is pretty far removed from graph technology!
Yes (laughs) – we are just leveraging the well model.
So was I wrong to diss the semantic web?
No, you are correct. It proved too hard to understand and too complex for ordinary developers to implement. The issue with the triple – not being able to simply add a UOM (3) to a node – is why we have moved from triples to the property graph, which is more explicit than the entity-attribute approach of the triple store. The property graph (4) is easy to understand and gives you what you expect from graph nodes and edges.
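To make the unit-of-measure point concrete (our illustration, not Cognite’s data model): in plain triples, attaching a unit to a value typically requires an intermediate node, whereas a property graph node simply carries both value and unit as properties:

```python
# Triple view: a triple has no room for a fourth element, so qualifying a
# value with its unit usually forces an intermediate (blank) node.
triples = [
    ("pump:P101", "hasRatedFlow", "_:flow1"),
    ("_:flow1",   "value",        "250"),
    ("_:flow1",   "unit",         "m3/h"),
]

# Property-graph view: the node carries arbitrary key/value properties,
# so the value and its unit live together where you expect them.
node = {
    "id": "pump:P101",
    "labels": ["Pump"],
    "properties": {"ratedFlow": 250, "ratedFlowUnit": "m3/h"},
}
```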
We have already reported on Neo4j as a popular graph database.
Yes, we used Neo4j in an earlier version for Aker BP. We abandoned it as it was hard to scale and not so good for a multi-tenant implementation that shares hardware resources between different users. PostgreSQL allows limits to be set on individual jobs.
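The settings below are standard PostgreSQL knobs for bounding an individual job; whether and how Cognite applies them is not described in the interview, so treat this purely as a sketch:

```python
# Standard PostgreSQL per-job limits, set for one transaction only.
# The DSN and table name are hypothetical.
import psycopg2

with psycopg2.connect("dbname=cdf user=tenant_a") as conn:  # hypothetical DSN
    with conn.cursor() as cur:
        cur.execute("SET LOCAL statement_timeout = '30s'")  # cancel runaway queries
        cur.execute("SET LOCAL work_mem = '64MB'")          # cap per-sort/hash memory
        cur.execute("SELECT count(*) FROM assets")          # the actual job (hypothetical table)
        print(cur.fetchone()[0])
```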
1) See for instance this Postgres World article.
2) This is a similar consideration to earlier uses of the relational database where API access was recommended (but not always followed).
3) Unit of measure.
4) An interesting comparison is available here.
See also the Cognite Data Fusion Release, ‘Advancing Digital Twins with Data Modeling’.
© Oil IT Journal - all rights reserved.