Spotify and the bottom hole assembly

Baker Hughes leverages the Word2Vec neural net to build a ‘smart’ recommendation engine for drillers.

What has Spotify got to do with a bottom hole assembly? A lot, according to Scott Sims* (Baker Hughes), who has been trialing Word2Vec, a technology used by Spotify, to build a recommendation engine for bottom hole assemblies (BHAs). Speaking at the 2021 Nvidia GPU Technology Conference, Sims presented a means of clustering and identifying similar BHAs ‘regardless of operator or basin’. A drilling BHA comprises everything between the end of the drill pipe and the rock face. It is made up of motors, sensors and other components, along with crossovers that tie bits of kit together and to the drill bit itself. Optimizing horizontal ‘factory’ drilling requires a knowledge of how different BHAs perform in different circumstances. Components of different diameters can cause the BHA to sag, compromising the logging while drilling response. A smart recommendation from the model might advise adding stabilizers.

Drilling parameters including rate of penetration, weight on bit and brake speed are easily recorded and available to train a neural net model. Capturing the makeup of the BHA is more problematic. BHA descriptions are recorded, but only in summary form, in a small text file that tabulates components from different vendors. The massive number of possible component combinations makes a simple rules-based approach impractical. Components are described in more or less free-form text, with ‘words’ such as “6 x-o 4 to 4 ” .. “9 x-treme motor” and so on. There is no canonical format for motors, crossovers and other components, so regular pattern matching fails.
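To see why hand-written rules struggle here, consider the following minimal Python sketch of a pattern-matching classifier. The sample strings, the `ALIASES` table and the `classify` function are hypothetical illustrations, not Baker Hughes data or code. Every spelling variant has to be anticipated by hand, and anything unanticipated falls through.

```python
import re

# Hypothetical alias table: each component type needs every variant listed by hand.
ALIASES = {
    "motor": re.compile(r"\b(motor|mtr)\b"),
    "crossover": re.compile(r"\b(x-o|xo|x-over|crossover)\b"),
    "stabilizer": re.compile(r"\b(stabilizer|stab)\b"),
}

def classify(description: str) -> str:
    """Return the first component type whose pattern matches, else 'unknown'."""
    text = description.lower()
    for component, pattern in ALIASES.items():
        if pattern.search(text):
            return component
    return "unknown"

# Two strings quoted in the article plus two invented misspellings.
samples = ["9 x-treme motor", "6 x-o 4 to 4", "8 stabiliser", "7 mud mtor"]
labels = [classify(s) for s in samples]

for sample, label in zip(samples, labels):
    print(f"{sample!r:20} -> {label}")
```

The two invented variants (‘stabiliser’, ‘mtor’) are not in the table and come back as ‘unknown’. Each new vendor or typist would mean another rule.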

The problem is analogous to automatic translation. It turns out that a neural net-based approach such as Word2Vec works well, even in the presence of oddball abbreviations and misspelt words. Word2Vec places similar words close together, so that they can be tied to BHA component types: motor, stabilizer, x-over etc. Similar words should (and do) get close embeddings in the word vector space**, which shows hot spots for ‘bit’ and ‘motor’. More sense can be obtained from the data by adding ‘context’, i.e. incorporating surrounding words. This led Sims to use Doc2Vec, treating the whole BHA string of words as a ‘paragraph’.
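The idea that words used in similar contexts end up close together can be illustrated without a neural net. The Python sketch below is a simplified stand-in, not Word2Vec itself and not the Baker Hughes model: it counts neighbouring tokens instead of training embeddings, and the toy BHA strings are invented for the example. It shows an abbreviation (‘mtr’) landing next to the word it stands for (‘motor’) purely because both appear in the same surroundings.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical, pre-tokenized BHA descriptions (invented for illustration).
bhas = [
    ["collar", "stabilizer", "motor", "bit"],
    ["collar", "stab", "motor", "bit"],
    ["collar", "stabilizer", "mtr", "bit"],
    ["collar", "x-over", "stab", "mtr", "bit"],
]

def context_vectors(docs, window=1):
    """For every token, count the tokens seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for doc in docs:
        for i, token in enumerate(doc):
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[token][doc[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

vecs = context_vectors(bhas)
sim_variant = cosine(vecs["motor"], vecs["mtr"])  # same neighbours, so high
sim_other = cosine(vecs["motor"], vecs["bit"])    # no shared neighbours

print(f"motor vs mtr: {sim_variant:.2f}")
print(f"motor vs bit: {sim_other:.2f}")
```

A trained Word2Vec model learns dense vectors from the same kind of neighbourhood information, which is what lets it cope with abbreviations and misspellings that no rule anticipated.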

Baker Hughes has trained its neural net on some 10,000 BHAs from different North American basins. Sims graciously acknowledged the performance of Nvidia’s high-end DGX-1 GPU computer and containerized machine learning software stack. ‘We are definitely getting our money’s worth from this box’. The system now provides feedback into the field, to inform drillers of any issues with similar BHAs that were run in the past.
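The feedback step amounts to a nearest-neighbour lookup: embed the planned BHA, find the most similar past runs and surface their notes. The Python sketch below assumes a hypothetical `HISTORY` of three invented runs and uses plain token counts in `embed` as a stand-in for a Doc2Vec vector. None of the names or records come from Baker Hughes.

```python
from collections import Counter
from math import sqrt

# Hypothetical history: (run id, tokenized BHA, field note). Invented data.
HISTORY = [
    ("run-A", ["bit", "motor", "stabilizer", "mwd", "collar"], "no issues"),
    ("run-B", ["bit", "motor", "x-over", "mwd", "collar"], "sag affected LWD response"),
    ("run-C", ["bit", "rss", "stabilizer", "stabilizer", "collar"], "no issues"),
]

def embed(tokens):
    """Stand-in for a document embedding: a simple token-count vector."""
    return Counter(tokens)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar(tokens, history, top_n=2):
    """Return the top_n past runs ranked by similarity to the planned BHA."""
    query = embed(tokens)
    scored = [(cosine(query, embed(doc)), run_id, note) for run_id, doc, note in history]
    return sorted(scored, reverse=True)[:top_n]

planned = ["bit", "motor", "x-over", "mwd", "collar", "collar"]
matches = most_similar(planned, HISTORY)

for score, run_id, note in matches:
    print(f"{run_id}: similarity {score:.2f}, note: {note}")
```

Here the closest match is the past run that also used a crossover, so its note about sag is the first thing the driller would see.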

Word2Vec and similar tools are used in machine translation and by companies like Spotify, Airbnb and Google, where their ability to perform ‘fuzzy’ matches compensates for poor quality or ambiguous input data. Sims concluded, ‘The oil industry is used to working with [hard] sensor data. Working with text is something new for us. The study has produced very encouraging early results and we are building a great foundation for real-time work and field deployment.’

Comment: Sims’ comment on sensor data vs text is interesting in that it contrasts with the sales pitch that came from advocates of AI/ML. The early hope was that ML would find hidden truths in large volumes of good data. The BHA study, like other applications of ML such as well event detection and screening scanned well logs, is more concerned with making something useful from rather poor data. For another example of an NLP-style approach, see Rice University’s SEDSI in our report from the 2020 HPC in Oil & Gas conference in this issue.

* With co-author Mohammad Khan.

** You can play with a word vector space and read the Word2Vec tutorial on

© Oil IT Journal - all rights reserved.