FloCon 2023 network security conference

Aramco’s ‘Guppy’ security data lake. Chevron in-houses cyber intelligence provision. Databricks open-sources cyber detection pipeline.

Speaking at the 2023 FloCon conference in Santa Fe, NM, Faisal Alghamdi and Hafiz Farooq presented Saudi Aramco’s ‘Guppy’, a ‘scalable security data lake’ designed to handle multiple security-related data types. A typical large security operations center may monitor over 10,000 IT assets, ingesting terabytes of security data every day. This is an emerging challenge for all large-scale enterprises. Traditional security information and event management (SIEM) is not up to the task. A data lake is the way forward and data engineers are replacing SIEM engineers. Aramco’s security data lake, code-named Guppy, takes syslog and other asset security data into a Kafka stream processor and an ELK* stack/data lake. The solution leverages a range of open source and commercial tools including Rsyslog, Splunk, Confluent and others. The Splunk GUI is used to query Elasticsearch data for analysis. Splunk also provides a machine learning capability for analysis of data lake events. Guppy provides access to some 300 open source machine learning algorithms for clustering, prediction and outlier detection. Guppy deploys the Elastic Common Schema, which defines a standard naming convention for data ingested into Elasticsearch, allowing data from diverse vendors and technologies to be correlated. Event Query Language (EQL) is used for threat hunting and real-time detection.

* Elasticsearch, Logstash, and Kibana.
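The benefit of a common schema can be illustrated with a minimal Python sketch. This is not Aramco’s code: the vendor field names (‘src’, ‘client_addr’ etc.), the two mapping tables and the helper functions are hypothetical. Only the target names (‘@timestamp’, ‘source.ip’, ‘destination.ip’, ‘event.action’) are Elastic Common Schema fields.

```python
from datetime import datetime, timezone


def set_path(doc, dotted, value):
    """Store value under a dotted field name as nested dicts."""
    keys = dotted.split(".")
    node = doc
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


# Hypothetical vendor field names mapped to ECS field names.
FIREWALL_MAP = {"src": "source.ip", "dst": "destination.ip", "act": "event.action"}
PROXY_MAP = {
    "client_addr": "source.ip",
    "server_addr": "destination.ip",
    "verdict": "event.action",
}


def to_ecs(record, mapping, epoch_field):
    """Rename vendor fields to ECS names and add an ISO 8601 UTC timestamp."""
    doc = {}
    ts = datetime.fromtimestamp(record[epoch_field], tz=timezone.utc)
    set_path(doc, "@timestamp", ts.isoformat())
    for vendor_field, ecs_field in mapping.items():
        if vendor_field in record:
            set_path(doc, ecs_field, record[vendor_field])
    return doc


# Two made-up records from different (hypothetical) products.
firewall_event = to_ecs(
    {"src": "10.0.0.5", "dst": "203.0.113.9", "act": "deny", "ts": 1673300000},
    FIREWALL_MAP,
    "ts",
)
proxy_event = to_ecs(
    {"client_addr": "10.0.0.5", "server_addr": "198.51.100.7",
     "verdict": "allow", "time": 1673300060},
    PROXY_MAP,
    "time",
)

# Once normalized, both events can be correlated on the same field.
same_source = firewall_event["source"]["ip"] == proxy_event["source"]["ip"]
```

After normalization, a single query on ‘source.ip’ covers both products, which is the kind of cross-vendor correlation the schema is meant to enable.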

Teresa Chila described Chevron’s attempts to automate the Diamond Model* of intrusion analysis. The Diamond Model is a methodology for analyzing cyber intrusion events. To date, its application has been a manual process that does not scale to enterprise-level security. Chevron is working to extend the model, leveraging data science and automation with the aim of using its internal data and traffic to ‘make Chevron its own #1 intelligence provider’. The solution uses Python notebooks to ingest and merge security log data and identify phishing and malware attacks. Identified events are saved to a graph database.

* See for instance Cyware’s explanation and this (possibly canonical) alternative from Active Response.
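As a rough illustration of how Diamond Model events might be held in a graph, the sketch below links the model’s four core features (adversary, capability, infrastructure, victim) for each event and then pivots on a shared vertex. It is not Chevron’s implementation: the class names, the in-memory adjacency structure (standing in for a graph database) and the sample values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DiamondEvent:
    """One intrusion event described by the four Diamond Model features."""
    adversary: str
    capability: str
    infrastructure: str
    victim: str


class IntrusionGraph:
    """In-memory stand-in for a graph database (an assumption of this sketch)."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_event(self, event):
        vertices = [
            ("adversary", event.adversary),
            ("capability", event.capability),
            ("infrastructure", event.infrastructure),
            ("victim", event.victim),
        ]
        # Link every pair of vertices from the same event (undirected edges).
        for i, a in enumerate(vertices):
            for b in vertices[i + 1:]:
                self.edges[a].add(b)
                self.edges[b].add(a)

    def neighbors(self, kind, value, neighbor_kind):
        """Pivot from one vertex to all linked vertices of another kind."""
        linked = self.edges.get((kind, value), ())
        return sorted(v for k, v in linked if k == neighbor_kind)


graph = IntrusionGraph()
graph.add_event(
    DiamondEvent("unknown-actor-1", "phishing-email", "mail.example.net", "user-a")
)
graph.add_event(
    DiamondEvent("unknown-actor-1", "malware-dropper", "mail.example.net", "user-b")
)

# Pivot on shared infrastructure to find every victim it touched.
victims = graph.neighbors("infrastructure", "mail.example.net", "victim")
capabilities = graph.neighbors("infrastructure", "mail.example.net", "capability")
```

Pivoting of this kind, from one known feature to related ones, is what an analyst does by hand in the manual version of the method.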

In a similar vein, Markus De Shon presented Databricks’ work on data-driven detection with PySpark, a Python API for Spark. The cyber security framework was built for Databricks’ own operational needs and is now released to the public. Databricks’ detection engineering team has been using the PySpark platform to build streaming pipelines that can cover basic rule-based use cases as well as ‘full ML models’ registered with MLflow. The system currently processes multi-terabyte/day data streams in over 30 pipelines. De Shon opined ‘we believe others will benefit from our example, as well as the code that we're releasing before or in concert with this talk’. Databricks’ oil industry clients include BP, Shell, ExxonMobil and others. Check out the full list on the oil and gas landing page.
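To give a flavor of the ‘basic rule-based’ end of such a pipeline, here is a minimal sketch of a threshold rule applied to a stream of events. It is written in plain Python rather than PySpark and is not Databricks’ released code. The event fields, the rule name and the thresholds are hypothetical, and events are assumed to arrive in time order.

```python
from collections import defaultdict, deque


def failed_login_alerts(events, threshold=3, window_seconds=60):
    """Yield an alert when one source reaches `threshold` failed logins
    within `window_seconds`. Events must be ordered by timestamp."""
    recent = defaultdict(deque)  # source IP -> timestamps of recent failures
    for event in events:
        if event["action"] != "login_failed":
            continue
        times = recent[event["source_ip"]]
        times.append(event["ts"])
        # Drop failures that have fallen out of the sliding window.
        while times and event["ts"] - times[0] > window_seconds:
            times.popleft()
        if len(times) >= threshold:
            yield {
                "rule": "repeated_failed_login",
                "source_ip": event["source_ip"],
                "count": len(times),
                "ts": event["ts"],
            }
            times.clear()  # start counting afresh after alerting


# Made-up event stream: timestamps are seconds from an arbitrary origin.
stream = [
    {"ts": 0, "source_ip": "10.0.0.5", "action": "login_failed"},
    {"ts": 0, "source_ip": "10.0.0.9", "action": "login_failed"},
    {"ts": 10, "source_ip": "10.0.0.5", "action": "login_failed"},
    {"ts": 15, "source_ip": "10.0.0.5", "action": "login_ok"},
    {"ts": 20, "source_ip": "10.0.0.5", "action": "login_failed"},
    {"ts": 100, "source_ip": "10.0.0.9", "action": "login_failed"},
    {"ts": 200, "source_ip": "10.0.0.9", "action": "login_failed"},
]

alerts = list(failed_login_alerts(stream))
```

In a production streaming engine the per-source state and windowing would be handled by the platform; the generator above only shows the logic of the rule itself.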

FloCon is an annual network security conference hosted by the Software Engineering Institute (SEI) of Carnegie Mellon University. Presentations and posters from FloCon 2023 are available here.

© Oil IT Journal - all rights reserved.