2018 IFP Energies Nouvelles DataSciEnergy conference

INRIA on combining predictive modeling with expert rules. CNRS on why AI proofs of concept don’t make it into production. Total on AI’s ‘rough start’ in industry and on how to ‘fill the empty data lakes.’ U Paris on the history of ML, from the Perceptron of 1958 to Google TensorFlow. Total on detecting weak signals in pipeline data to prevent major safety events. U Paris on combining physics with data-driven models. INRIA’s topological data analysis of zeolite gas filters.

Olivier Grisel (Inria) introduced the Scikit-Learn Python module. Predictive modeling extracts structure from historical records using statistics. Results are summarized and turned into algorithms to make predictions about future events. This is often seen as an alternative to rules written by subject matter experts, but the two approaches can be used together. Tools of the trade include Pandas for data preparation and feature extraction, Scikit-Learn and, for big data, Hadoop, Hive, Redshift and Apache Spark.
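As a rough illustration of how a data-driven predictor and an expert rule can work together, here is a minimal sketch in plain Python. It is not Scikit-Learn code and not taken from the talk: the `Record` fields, the toy history and the 'closed valve cannot fail' rule are all hypothetical, and the 'model' is a one-feature threshold learned from past records.

```python
from dataclasses import dataclass


@dataclass
class Record:
    pressure: float    # hypothetical sensor reading
    valve_open: bool   # hypothetical operating state
    failed: bool       # historical outcome used as the label


def fit_threshold(records):
    """Learn the pressure cut-off that best separates past failures."""
    candidates = sorted({r.pressure for r in records})
    best_cut, best_correct = candidates[0], -1
    for cut in candidates:
        # Count records where "pressure >= cut" agrees with the label.
        correct = sum((r.pressure >= cut) == r.failed for r in records)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut


def predict(pressure, valve_open, cut):
    """Combine the learned threshold with a hand-written expert rule."""
    # Expert rule (assumed for this toy case): a closed valve cannot fail.
    if not valve_open:
        return False
    # Otherwise fall back on the data-driven threshold.
    return pressure >= cut


history = [
    Record(10.0, True, False),
    Record(12.0, True, False),
    Record(15.0, True, True),
    Record(18.0, True, True),
    Record(20.0, False, False),
]
cut = fit_threshold(history)
```

Here the expert rule overrides the statistical prediction for the closed-valve case, which the threshold alone would get wrong. In practice the learned part would be a proper Scikit-Learn estimator trained on features prepared with Pandas.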

Balázs Kégl (CNRS) observed that, despite the enthusiasm, ‘few AI proofs of concept make it into production.’ CIOs and CEOs often approach data science by a) installing Hadoop and b) hiring data scientists. In fact, success comes from transforming the business process. Start with ‘Why?’ and here, ‘you don’t need a data scientist!’ Decide what to improve and which KPIs will be used. Then involve the data scientists and, finally, build the system. Kégl advocates ‘data value architects’ capable of identifying and labelling historical data. He also took a swipe at current data challenges, which are often HR or publicity stunts. To put the industry back on the rails, his team has deployed the Ramp Studio, a sandbox for creating data challenges that emphasizes data preparation and improves on the unsupervised deep learning approach.

In his keynote, Total’s group data officer, Michel Lutz, agreed that data science has had a rough start in the industry. The last five years have seen an explosion of AI enthusiasm and buzzwords. A sense of urgency has led to a proliferation of isolated applications that ‘addressed empty data lakes.’ Echoing Kégl, Lutz spoke of data science proofs of concept with no future and ‘unread data governance charters.’ Data science has real potential, but it cannot be treated in isolation. Don’t hire more data scientists or build more lakes. Rather, move end-to-end data projects into production. Total is using data science in everyday applications that satisfy real-world needs. In geoscience this means accelerating studies and reducing uncertainties. In engineering, optimizing production. In marketing, a better knowledge of the customer. Total has now marshalled its data resource and is developing a data-driven culture with input from subject matter experts. Other use cases to date include deep learning across a labelled database of nanofossils.

The data science movement has shone a spotlight on regular data management, with a renewed focus on reference data, alignment of well names across data resources and more user training on the value of data. Data science needs a dedicated infrastructure; it can’t be done on production systems. All of the above is supported by a new digital organization, with a CDO (Lutz’s boss) responsible for company-wide data spanning E&P, refining, marketing and green energies.

Total has kicked off an ‘innovation booster’ organization, a business incubator for data-oriented startups. A cross-discipline data analytics competence center houses a ‘data squad’ for iterative deployment. This provides data framing templates and production-ready projects that promote group-level best practices and solutions. ‘Appetite’ for data science was nurtured with an ‘AI for Leaders’ workshop held last year with talks from AI luminaries.

Total deploys open source-based data lakes that are developed in-house with help from the major software vendors and from the startups. Under the hood, Total uses the Carbon API, Python, R and TensorFlow to collate data from production data sources such as OSIsoft, Hadoop, Excel and TemisFlow.

Patrick Gallinari (U Paris) traced the history of machine learning back to the Perceptron of 1958, inspired by models of the human brain. The modern AI era came circa 2010 with a variety of new techniques. You can run these online with Google’s TensorFlow playground. The GAFA and ‘BAT’ (Baidu, Alibaba, Tencent) jumped on the AI bandwagon and now ‘startups are shaping the data world.’ Why? Deep, many-layered neural nets can now detect cats and faces in tens of millions of images, and recognize and translate speech and handwriting. Gallinari concluded that the important developments are driven by the big companies although ‘the theory is still work in progress.’

Total’s Jean Igersheim and Laurent Querella referred to the Concawe Report on EU pipeline leaks before focusing on a major incident on a Total pipeline in Northern France in 2014. A review determined that pipeline monitoring data contained small signals that might have been used to provide advance warning. Total embarked on a program to create a holistic view of pipeline inspection and operating data to reveal patterns and create a data-driven model of degradation risk. Data comes from in-line inspections, pigging, above-ground and cathodic protection surveys and more. What is hard, though, is cross-discipline analysis. This has given rise to an ongoing data science study to detect weak signals across the diverse data set.

Emmanuel de Bézenac (U Paris) tempered the enthusiasm for neural nets, observing that it can be hard to interpret the results from such brute force/black box approaches. Moreover, solutions may not be consistent with physical principles such as conservation of energy. It is, however, possible to inject physics into the ML models. De Bézenac’s group has demonstrated this by blending advection/diffusion equations into a data-driven model of sea surface temperatures in the Gulf Stream. The approach leveraged FlowNet, a convolutional neural net originally developed for computer vision.
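To give a feel for what 'injecting physics' can mean, the sketch below advects a temperature profile with a known transport rule instead of letting a model learn the transport from scratch. It is a simplified illustration, not the group's method: it is one-dimensional, periodic, ignores diffusion, and the velocity is simply assumed, where in a hybrid model a neural net would estimate it from past observations. The function name `advect_periodic` is hypothetical.

```python
import math


def advect_periodic(field, velocity, dt=1.0):
    """Semi-Lagrangian advection on a periodic 1-D grid with unit spacing.

    Each output cell looks upstream by velocity * dt and linearly
    interpolates the input field at that departure point.
    """
    n = len(field)
    out = []
    for i in range(n):
        src = i - velocity[i] * dt   # departure point of the parcel arriving at i
        lo = math.floor(src)         # grid cell just below the departure point
        frac = src - lo              # interpolation weight in [0, 1)
        a = field[lo % n]            # periodic wrap-around
        b = field[(lo + 1) % n]
        out.append((1.0 - frac) * a + frac * b)
    return out


temperature = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]
velocity = [1.0] * len(temperature)  # assumed here; a network would supply this
forecast = advect_periodic(temperature, velocity)
```

With a uniform velocity the scheme only moves the profile along the grid, so the total of the field is preserved. That is the kind of physical consistency a purely data-driven forecast does not guarantee.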

More technical papers included a presentation by Steve Oudot (Inria) on topological descriptors for geometric data. Zeolite molecules are used to filter refinery gases. Their complex cavity geometries can be used to trap H2S and other impurities in natural gas. Characterizing their porosity is non-trivial, but important. Enter topological data analysis, which uses the Gromov-Hausdorff distance to normalize topologies. Persistence theory also featured, as did the Inria Gudhi project.
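For readers new to persistence, the simplest case can be sketched in a few lines of plain Python: track when connected components of a point cloud merge as a distance threshold grows. This is a toy stand-in under stated assumptions, not the Gudhi API and not the descriptor used in the talk. The function name `h0_persistence` and the sample cloud are hypothetical, and only zero-dimensional features (components, not cavities) are covered.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Death scales of connected components as the distance threshold grows.

    Every point starts as its own component (born at scale 0). A component
    dies at the length of the edge that merges it into another one.
    Returns the finite death values in increasing order (n - 1 for n points).
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj        # merge the two components
            deaths.append(length)  # one component dies at this scale
    return deaths


cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]
deaths = h0_persistence(cloud)
```

Short-lived components reflect local spacing between points, while the one long-lived component signals two well-separated clusters. Higher-dimensional persistence, which is what would capture cavities, needs a dedicated library such as Gudhi.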

More from the IFPen/Inria DataSciEnergy home page.
