1 Data Science for Petroleum Production Engineering (part 1)

1.1 The Old vs the New

In the last century, the production engineer built the well models one by one and analyzed the results also one by one. With the ubiquity of the personal computer, desktops and laptops, an unimaginable computational power has been put in our hands. So, why takes so long to update well models? Why still are we doing it one-by-one?

Why -if nowadays we transfer data at gigabits per second-, are we still using tools of the era of bits per second?

1.2 Do worksheets, spreadsheets makes you dumb?

The spreadsheet was invented in the 80’s and was a great invention. The beauty of it is that you can produce results right away. The cells model has been so nice to us.

We are very thankful to Lotus 1-2-3 and Microsoft Excel for that.

But let us remember that it was the 80’s –there was not internet, no tablets, no smartphones, no laptops. Data moved at 2400 bps (or even less), and real-time data acquisition was constrained to a handful of fields with luxury mainframes. Today, well data comes in gigabytes per minute and we know it is not humanly possible to tackle all the information to put it at good use.

The answer of this century is to that is start using automation, artificial intelligence, statistics, machine learning but first of all, data science. These are the keywords of the future. The future of the petroleum engineer.

The thing is: there are not smart wells or smart fields, without smart tools and smart people and without modern computational statistics

1.3 Why does it take so long to build optimization models?

One of the things that always intrigued the production managers what is it that it takes so long in building well models and their corresponding network models? One of the causes is that the production engineers have to enter the data manually; have to double and triple check the data being entered; calibrate and match the model; and – the most important of all-, entering the data without a context, well by well.

What does that mean? It means we work on a well model in isolation without having the tools to validate our well data in the context of the field data or of other well data (neighbors). Then, when we have all our 77 or 177 wells ready, we received fresh well test data that contradicts earlier well tests! Go back to rework the models.

Part of this conundrum is not the production engineer’s fault; it is just the way that all well optimization software works or has been built: to attack a well one at a time and disconnected from the test data source or from other wells.

Some companies have been using for a while Python scripts to help to address the problem of single well modeling in a radical different way: by building well models with multiwell scanning, quality control, validation and analysis, before and after the model has been built. With these techniques the engineer minimizes the manual data entering, validation, calibration and data qualification. This by itself represents a major advantage over the traditional way of doing well and network modeling and optimization. The additional power and discovery of oil gains from production optimization comes from applying basic statistics to:

the well input data: quasi-static (PVT, completion, deviation survey) and dynamic (well tests, artificial lift data)
nodal analysis oresults f the unconnected well
results or calculations of the network simulation runs

1.4 The cheapest oil you can find?

The plots that you see in this post are just few samples of the work of well and network modeling on a field. Field names and well numbers have been anonymized to make the exercise neutral.

None of this findings is earth shattering new; it has been around for years but the digital tools that are making high-tech companies successful are taking a little time to take off in the petroleum production engineering field.

Well, first, data science is hard; statistics is harder. But enormously gratifying when you find hidden oil.

Oil found with statistics and data science is the cheapest oil you can get.

1.5 Data structures are our friends

One of the keys in data science is to get accustomed to analyze well production data by using dynamic tables. There is no best tool for analyzing production engineering data than dataframes or data tables in Python or R. The pandas tables (yes, it is written in lowercase) are magnificent objects to manipulate, arrange, organize, calculate, clean, select, filter, analyze and plotting production data.

Engineering libraries for the Production Technologist can be created to include scripts that make use of production datasets with the help of packages such as numpy, scipy, matplotlib, PyQt4, pyqtgraph, bokeh, traits, etc., in the Python environment. With the computational libraries the engineer is able to analyze the well models data faster and in a more reliably way since all these tasks are done by digital algorithms, using scripting languages for scientists and engineers. And, best of all, it is open source.

Your are curious? This is how a simple statement looks like in Python:

From the data we can tell that the biggest producers are wells producing by gas lift with an average of 944 blpd (barrels of liquid per day), a maximum of 2421 blpd in a total of 98 wells. The lowest producer is at 100 blpd.

1.6 Remember: Garbage In, Garbage Out

In next posts I will be showing simple Python and R scripts for petroleum production engineering.

What we see in the next plot is a plotting technique for quality control of the well test data to be used in the well models. Before we run the models it is recommended that you run some scripts to perform some quality control on the data such as finding outliers or data points that do not belong there, or even addressing missing data.

You will also observe in some examples how convenient is scanning stand-alone well models with Python scripts to detect outliers in well parameters such as: reservoir temperature, water salinity, gas specific gravity, API, injection gas data, GOR, pressures, etc.

By refining these technique we will see later that we can even do quaility control on the results of the nodal analysis phase to calibrate and match the models to the well test data.

1.7 Digital Notebooks are a fit for Engineers

The Python notebooks look ideal for the application of data science to well, network modeling and production optimization. There are many ways of running Python but I find the notebook to be the best method to transmit knowledge and share solutions with colleagues.

The Python notebooks -lately renamed as Jupyter -, are widely known in the engineering community. You shouldn’t have any trouble finding assistance online. The same applies to Python. My favorite place in the web to find help is stackoverflow.com. There are hundreds of engineering and scientific libraries around that will make your production engineering job more efficient, accurate, profitable and fun. And of course, faster.

The Python notebook opens in a web browser but if you want to make long and advanced scripts for petroleum engineering, there are plenty of good development applications or IDEs out there: Spyder, PyCharm or Eclipse, just to mention a few.

1.8 `Python` or `R`

You have seen in other posts that I fell in love with R. But Python has not be forgotten. Both cover different needs and have their own space of application. Python is an scripting language that is easy to learn, very much similar to the human natural language, it has thousands of libraries that cover all sthe spectrum of industries and applied sciences. R focus is more on statistics. Let me rephrased that: “advanced statistics”. Both languages deal with data in their own style. While Python is a general purpose language, R strength is in resolving problem using vectorial abstractions. Both are beautiful, both are high level, and both fill different user needs. They are not competing but sharing the same information super highway.