Automating Identification of Rock Layer Tops

The Oil and Gas (O&G) industry is accelerating its digital transformation as commodity prices remain volatile and Exploration and Production (E&P) Operational Expenditure (OPEX) rises steeply. It is therefore critical to exploit the bypassed hydrocarbons in brownfields, those reservoirs of oil and gas overlooked during primary production (Lutgert et al., 2013¹). These assets invariably provide geoscientists with rich upstream datasets, such as well logs (Figure 01).

Some of the key drivers of digital transformation in the industry are:

1. Sustained low oil price and global economic uncertainty

2. Massive strategic and organizational adjustments and industry consolidation

3. Significant geoscientist reduction that necessitates digital transformation

Figure 01: Well logs used for automated Machine Learning workflows to identify tops

The industry has traditionally been reluctant to move far from empirical, entrenched deterministic interpretation. However, the pressure to time an unpredictable commodity market and to be first-to-oil is pushing geoscientists to adopt real-time analysis of upstream data with supervised and unsupervised Machine Learning (ML) techniques.

The industry drills boreholes, or wells, to explore for and exploit natural resources such as hydrocarbons. Downhole IoT sensors acquire real-time data while drilling to gather critical knowledge of the geologic formations. The Measurement While Drilling (MWD) and Logging While Drilling (LWD) sensor data are used to address multiple upstream business value propositions:

1. Real-time drilling optimization to avoid stuck pipe and maximize rate of penetration

2. Automated drilling geo-steering to maximize reservoir contact where the O&G is located

3. Risk analysis for incremental well operations in the same reservoir

4. Development of geologic models with robust reservoir characteristics

5. Interpretation of the tops of the different rock layers across a reservoir

SAS is partnering with nimble companies to help E&P operators re-imagine their business strategies and tactics in today’s digital age. This digital transformation marries subject matter expert (SME) interpretation with data-driven ML techniques. SAS and Energective selected business opportunity 5, automating the identification of the tops of rock formations. The goal is to remove the burden of the long decision cycles of manual interpretation while identifying the formation tops with high accuracy.

Data-driven advanced analytical workflows in O&G provide insights and business knowledge by automating and condensing a laborious interpretation process from months to days with high quantified accuracy. We established a repeatable and scalable set of ETL (Extract, Transform, Load) and Exploratory Data Analysis (EDA) steps to generate essential business features in the context of engineering and geoscience first principles.

Case Study: Automated Tops Identification
The offshore Louisiana region has several mature basins rich in historical geophysical and petrophysical datasets. A large, low-cost, independent operator focused on squeezing value from mature assets in this area.

An operator drills a well and measures a set of rock characteristics at a regular step (every few inches or centimeters) to quantify specific physical and geochemical properties at each measured depth. Some measurements are transformed and combined to form new variables via a feature engineering workflow. As a result, each well has a suite of logs with several measured and derived variables.

Figure 02: Sample of well logs from location 177140003500 used as input to ML

We plotted some of the input well logs (Figure 02), displayed as recordings in depth. Having assessed the portfolio of available logs across the study area, we noted that several logs were missing or poorly recorded, with many missing measurements. In addition, the Thomas-Stieber method, which is used to distinguish shale types in sandstone, had little impact on our targeted formation tops. We therefore confined our training logs to water saturation (Sw), porosity (POR), and volume of shale in the formation (Vsh), as these had consistently well-recorded observations.

The analytical workflow followed the SEMMA (Sample, Explore, Modify, Model, Assess) process, which begins with a Sample of the LWD and MWD datasets. The Exploration step provides an array of Tukey² diagrams to understand the complex relationships and trends. We can make critical observations by studying univariate, bivariate, and multivariate charts.

Figure 03: Correlation Matrix showing a strong relationship between Porosity and Volume Shale

An insightful bivariate Tukey diagram is the correlation matrix (Figure 03). It shows the strength of each pairwise relationship and whether the correlation is positive or negative. It also reveals collinearity among the input features, which gives us an opportunity to reduce the number of features for our ML technique without reducing the power of prediction.
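
The correlation screening described above can be sketched in a few lines of Python. This is an illustration only, not the SAS workflow used in the case study: the data are synthetic, the negative porosity-to-shale relationship is an assumption made to produce a strong correlation, and the helper name strongly_correlated_pairs and the 0.8 threshold are hypothetical choices.

```python
import numpy as np

# Synthetic, illustrative log samples (NOT data from the case study).
rng = np.random.default_rng(seed=0)
n = 500
vsh = rng.uniform(0.0, 1.0, n)                         # volume of shale (fraction)
por = 0.30 - 0.25 * vsh + rng.normal(0.0, 0.01, n)     # assumed: porosity falls as shale rises
sw = rng.uniform(0.2, 1.0, n)                          # water saturation, independent here

names = ["Sw", "POR", "Vsh"]
logs = np.column_stack([sw, por, vsh])

# Pearson correlation matrix; rowvar=False treats each column as a variable.
corr = np.corrcoef(logs, rowvar=False)

def strongly_correlated_pairs(corr, names, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

pairs = strongly_correlated_pairs(corr, names)
for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r:+.2f}")
```

A pair flagged this way is a candidate for dropping one feature or for combining both, since the second feature adds little independent information.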

The Modification step enables the data scientists to review the distributions of the features. In addition, it provides a suite of statistical workflows to normalize and scale the critical parameters. Finally, a Principal Component Analysis (PCA) can reduce the input space to fewer dimensions while retaining most of the variability in the original features.
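
As a minimal sketch of the PCA idea, assuming standardized inputs and a 95% retained-variance target (both are illustrative assumptions, not settings reported for the case study), the reduction can be written with NumPy alone. The function names fit_pca and transform are hypothetical, and the data are synthetic.

```python
import numpy as np

def fit_pca(X, variance_to_keep=0.95):
    """Fit PCA on standardized columns and keep the fewest components
    whose cumulative explained variance reaches variance_to_keep."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    Z = (X - mean) / std
    # Rows of Vt are the principal directions of the standardized data.
    _, singular_values, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, X.shape[1])
    return mean, std, Vt[:k], explained

def transform(X, mean, std, components):
    """Project data onto the retained principal components."""
    return ((X - mean) / std) @ components.T

# Synthetic logs (NOT case-study data): POR and Vsh are nearly collinear.
rng = np.random.default_rng(seed=0)
n = 500
vsh = rng.uniform(0.0, 1.0, n)
por = 0.30 - 0.25 * vsh + rng.normal(0.0, 0.01, n)
sw = rng.uniform(0.2, 1.0, n)
X = np.column_stack([sw, por, vsh])

mean, std, components, explained = fit_pca(X)
scores = transform(X, mean, std, components)
print("explained variance ratio:", np.round(explained, 3))
print("components kept:", components.shape[0])
```

Because two of the three synthetic logs carry almost the same information, two components are enough to meet the variance target here. In a real workflow the PCA parameters would be fitted on the training data only and then reused on new data.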


We split the data into training and testing sets before normalization, so that no information from the test set leaks into the scaling parameters and biases the evaluation. The steps are:

• Split your data to obtain the test set, and don’t use it for normalization or training
• Normalize using the training data, and save the obtained parameters

• Normalize your input using the parameters obtained during training
• De-normalize predictions using those same parameters to return the user output in the original scale

• Predict the test set getting real-scale predictions (previous step)
• Compare those predictions with the reference examples to obtain the evaluation metrics
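
The steps above can be sketched as follows. This is a hedged illustration with synthetic data and a plain least-squares model standing in for whatever model is actually trained; the helper names (split_train_test, fit_scaler, normalize, denormalize) are hypothetical, and a continuous target is assumed so that de-normalizing the predictions is meaningful.

```python
import numpy as np

def split_train_test(X, y, test_fraction=0.25, seed=0):
    """Shuffle once and hold out a test set that is never used for fitting."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    test, train = order[:n_test], order[n_test:]
    return X[train], X[test], y[train], y[test]

def fit_scaler(values):
    """Compute normalization parameters (mean, std) from training data only."""
    return values.mean(axis=0), values.std(axis=0)

def normalize(values, params):
    mean, std = params
    return (values - mean) / std

def denormalize(values, params):
    mean, std = params
    return values * std + mean

# Synthetic features and continuous target (NOT case-study data).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(400, 3))
y = 100.0 + 30.0 * X[:, 0] - 20.0 * X[:, 1] + rng.normal(0.0, 0.5, 400)

# 1. Split first; 2. fit the scalers on the training set and save them.
X_train, X_test, y_train, y_test = split_train_test(X, y)
x_params = fit_scaler(X_train)
y_params = fit_scaler(y_train)

# Train a stand-in model (ordinary least squares) on normalized data.
A_train = np.column_stack([normalize(X_train, x_params), np.ones(len(X_train))])
coef, *_ = np.linalg.lstsq(A_train, normalize(y_train, y_params), rcond=None)

# 3. Normalize test inputs with the saved training parameters,
# 4. then de-normalize the predictions back to the original scale.
A_test = np.column_stack([normalize(X_test, x_params), np.ones(len(X_test))])
y_pred = denormalize(A_test @ coef, y_params)

# 5-6. Compare real-scale predictions with the held-out reference values.
rmse = float(np.sqrt(np.mean((y_pred - y_test) ** 2)))
print(f"test RMSE: {rmse:.3f}")
```

For well logs, adjacent depth samples from the same well are highly similar, so holding out whole wells rather than random samples may give a more honest evaluation; the random split here is only for brevity.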

In the Modeling step, we trained several supervised models to predict the tops in each cluster of wells, with the seed wells providing the labeled inputs. We determined the champion model in the Assessment step of the SEMMA process and used it to predict all the tops for each well in the same cluster.
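
Champion selection can be illustrated with a small sketch: train several candidate models on the labeled samples, score each on held-out data, and keep the best. The source does not name the models used, so the two classifiers below (nearest centroid and k-nearest neighbors) are stand-ins. The synthetic data, the integer formation labels, and the accuracy metric are likewise assumptions for illustration.

```python
import numpy as np

def nearest_centroid(X_train, y_train, X_eval):
    """Assign each sample to the class whose training centroid is closest."""
    labels = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in labels])
    dist = np.linalg.norm(X_eval[:, None, :] - centroids[None, :, :], axis=2)
    return labels[dist.argmin(axis=1)]

def k_nearest_neighbors(X_train, y_train, X_eval, k=5):
    """Majority vote among the k closest training samples."""
    dist = np.linalg.norm(X_eval[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in y_train[nearest]])

# Synthetic (Sw, POR, Vsh) samples for three hypothetical formations.
rng = np.random.default_rng(2)
centers = np.array([[0.2, 0.30, 0.1], [0.6, 0.15, 0.5], [0.9, 0.05, 0.9]])
y = np.repeat(np.arange(3), 100)                 # formation label per depth sample
X = centers[y] + rng.normal(0.0, 0.05, size=(300, 3))

# Hold out 30% of the labeled samples for the assessment.
order = rng.permutation(len(X))
hold, train = order[:90], order[90:]

candidates = {
    "nearest_centroid": nearest_centroid,
    "knn_k5": k_nearest_neighbors,
}

# Score every candidate on the same held-out samples (accuracy).
scores = {
    name: float(np.mean(model(X[train], y[train], X[hold]) == y[hold]))
    for name, model in candidates.items()
}
champion = max(scores, key=scores.get)
print(scores, "-> champion:", champion)
```

The champion would then be applied to the unlabeled wells in the same cluster. When candidates tie, as can happen on easily separable data, max keeps the first one listed.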

These tests showed that the accuracy of this process can be relatively high even when the amount of training data is relatively low. Additionally, executing the trained machine-learning model on all the target logs can be relatively fast (e.g., less than a day) compared to traditional interpretation, which can take several months.

The case study implemented by SAS and Energective is described in more detail in a presentation delivered at SAS Global Forum 2021. There we discuss in depth the thought process behind the ML techniques and the resulting business value in the context of the geoscience.

About SAS in IoT
SAS empowers organizations to create and sustain business value from diverse IoT data and initiatives, whether that data is at the edge, in the cloud, or anywhere in between. Our robust, scalable, and open edge-to-cloud analytics platform delivers deep expertise in advanced analytics – including AI, machine learning, deep learning, and streaming analytics – to help customers reduce risk and boost business performance. Learn more about our industry and technology solutions at

1. Lutgert, J., Greiss, R.M., & Hughes, C. (2013). De-Risking Shale Plays and Assessing By-Passed Pay Potential in The Netherlands. In SPE/EAGE European Unconventional Resources Conference and Exhibition.
2. Ph.D. theses directed by John W Tukey – Princeton University, 1940-1990, in “The practice of data analysis,” Princeton, NJ, 1995 (Princeton, NJ, 1997), 16-18.

By: Keith R. Holdaway, O&G Advisory Industry Consultant, SAS