Baker Hughes Challenge: Gas Turbine Data Visualization Challenge

[NEW] Devpost Guidelines

  1. Create a submission on Devpost.
  2. All submissions (virtual and in-person) should include a video under 5 minutes long.
  3. No in-person judging

Challenge Description

As an energy technology company, Baker Hughes (BH) works toward a sustainable energy future by deploying the most efficient and lowest-emission technologies. To help industry advance on the path to net-zero, one key strategy is to identify, control, and reduce emissions from operations.


  • The main objective is to produce a video report that presents insights from the data through a set of visualizations and storytelling that clearly communicates your findings. For this task, you are free to choose the:
    • Emissions theme: worst cases, best cases, timeline report, etc.
    • Level of analysis: gas turbine, site, customer, etc.
    • Type of charts: for example, bar, line, scatter, pie, map, KPI, gauge, treemap, etc.
    • Functionality of visualizations: interactive, responsive, etc.

The video should not exceed 5 minutes.

Let's code and take energy forward! 🙌

Data Description

In this challenge, we provide a synthetic dataset for the operation of gas turbine engines from different customer sites around the world.

One of the data tags you have is the speed of the compressor, from which you can obtain operating hours, which could be useful in your storytelling.

  • HOURS: operating hours in h. This attribute is the accumulated operation time based on the measurements of the compressor speed. The gas turbine engine is considered to be running when two consecutive measurements of the compressor speed are greater than zero; in that case, the sampling interval is added to the operating hours count. The starting value of HOURS for all gas turbine engines is zero. You may assume that the gas turbine engines reach a non-zero speed instantly.
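The HOURS rule above can be sketched with pandas as follows. This is one possible reading of the rule, not an official reference implementation: it assumes DATE has already been parsed as a datetime column, and the function name `add_operating_hours` and the sample values are made up for illustration.

```python
import pandas as pd


def add_operating_hours(df: pd.DataFrame) -> pd.DataFrame:
    """Return a sorted copy of df with an accumulated HOURS column."""
    out = df.sort_values("DATE").reset_index(drop=True)
    # Sampling interval between each measurement and the previous one, in hours.
    dt_hours = out["DATE"].diff().dt.total_seconds().div(3600).fillna(0.0)
    # Running only if both the current and the previous speed are above zero.
    running = (out["CMP_SPEED"] > 0) & (out["CMP_SPEED"].shift(1) > 0)
    # Count the interval only while running; HOURS starts at zero.
    out["HOURS"] = dt_hours.where(running, 0.0).cumsum()
    return out


# Made-up hourly measurements (not challenge data).
sample = pd.DataFrame({
    "DATE": pd.to_datetime([
        "2023-01-01 00:00", "2023-01-01 01:00", "2023-01-01 02:00",
        "2023-01-01 03:00", "2023-01-01 04:00",
    ]),
    "CMP_SPEED": [0.0, 9000.0, 9100.0, 0.0, 9050.0],
})
hours_df = add_operating_hours(sample)
```

In this sample only the interval between 01:00 and 02:00 has a non-zero speed at both ends, so it is the only one that counts toward HOURS.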



Columns in site_metadata.csv:

  • CUSTOMER_NAME: name of customer.
  • PLANT_NAME: name of site.
  • LATITUDE: in degrees.
  • LONGITUDE: in degrees.
  • ELEVATION: in meters.
  • FUEL_LHV: lower heating value of the fuel in BTU/lb.

Columns in engine_metadata.csv:

  • CUSTOMER_NAME: name of customer.
  • PLANT_NAME: name of site.
  • ENGINE_ID: engine name.
  • FILE_ID: filename with data collected from gas turbine engines.
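Because both metadata files share CUSTOMER_NAME and PLANT_NAME, they can be joined to attach site attributes (location, elevation, fuel LHV) to each engine. The sketch below uses small in-memory stand-ins instead of the real CSV files; every value in it, including the customer, plant, and engine names and the exact form of FILE_ID, is hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for site_metadata.csv and engine_metadata.csv.
# With the real files these frames would come from pd.read_csv(...).
site_meta = pd.DataFrame({
    "CUSTOMER_NAME": ["Customer A", "Customer A"],
    "PLANT_NAME": ["Plant 1", "Plant 2"],
    "LATITUDE": [10.0, 20.0],
    "LONGITUDE": [-60.0, 30.0],
    "ELEVATION": [5.0, 300.0],
    "FUEL_LHV": [20000.0, 21000.0],
})
engine_meta = pd.DataFrame({
    "CUSTOMER_NAME": ["Customer A", "Customer A", "Customer A"],
    "PLANT_NAME": ["Plant 1", "Plant 1", "Plant 2"],
    "ENGINE_ID": ["GT-1", "GT-2", "GT-3"],
    "FILE_ID": ["data_1.csv", "data_2.csv", "data_3.csv"],  # assumed format
})

# One row per engine, enriched with the attributes of its site.
# validate= raises an error if a site appears more than once in site_meta.
engines = engine_meta.merge(
    site_meta,
    on=["CUSTOMER_NAME", "PLANT_NAME"],
    how="left",
    validate="many_to_one",
)
```

The resulting table gives a single place to look up the site, customer, and fuel properties for any engine, which helps when switching between levels of analysis.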

Columns in files data_#.csv included in

  • DATE: datetime of measurement.
  • CMP_SPEED: compressor speed in RPM.
  • POWER: power output from the Low-Pressure Turbine (LPT) in kW.
  • FUEL_FLOW: fuel flow into the combustor in kg/s.
  • CO2: carbon dioxide estimated emissions in kg/s.
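Since CO2 is given as a rate in kg/s, comparing engines over a period requires integrating that rate over time. The sketch below is one way to do this, using the trapezoidal rule and a simple matplotlib bar chart. The engine names and measurements are made up, the ENGINE_ID column is assumed to have been attached to the measurements beforehand, and `total_co2_tonnes` is a hypothetical helper.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up measurements for two hypothetical engines (not challenge data).
data = pd.DataFrame({
    "ENGINE_ID": ["GT-1"] * 3 + ["GT-2"] * 3,
    "DATE": pd.to_datetime(
        ["2023-01-01 00:00", "2023-01-01 01:00", "2023-01-01 02:00"] * 2
    ),
    "CO2": [2.0, 2.0, 2.0, 3.0, 3.0, 3.0],  # kg/s
})


def total_co2_tonnes(group: pd.DataFrame) -> float:
    """Integrate the CO2 rate (kg/s) over time and return tonnes."""
    g = group.sort_values("DATE")
    dt_s = g["DATE"].diff().dt.total_seconds()
    # Trapezoidal rule: mean of consecutive rates times the interval length.
    mean_rate = (g["CO2"] + g["CO2"].shift(1)) / 2
    # The first row has no previous sample (NaN) and is skipped by sum().
    return float((mean_rate * dt_s).sum()) / 1000.0


totals = pd.Series(
    {engine: total_co2_tonnes(g) for engine, g in data.groupby("ENGINE_ID")}
).sort_values(ascending=False)

# Bar chart ranking engines by total emitted CO2.
fig, ax = plt.subplots()
ax.bar(totals.index, totals.values)
ax.set_xlabel("Engine")
ax.set_ylabel("Total CO2 emitted (t)")
ax.set_title("CO2 emissions per engine (made-up data)")
```

Calling `plt.show()` or `fig.savefig(...)` afterwards displays or stores the chart. The same aggregation can be grouped by site or customer instead of by engine.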

A parameter that is usually calculated for gas turbine engines that might be useful in your analysis is:

  • THRM_EFF: thermal efficiency [image: formula]
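The formula from the original page is not reproduced here. A common textbook definition, assumed in the sketch below, is the ratio of power output to fuel heat input: THRM_EFF = POWER / (FUEL_FLOW × FUEL_LHV). Because FUEL_LHV is given in BTU/lb while FUEL_FLOW is in kg/s and POWER in kW, the LHV is first converted to kJ/kg. The function name and sample readings are hypothetical.

```python
import pandas as pd

# 1 BTU/lb = 2.326 kJ/kg (exact for the International Table BTU).
BTU_PER_LB_TO_KJ_PER_KG = 2.326


def thermal_efficiency(
    power_kw: pd.Series, fuel_flow_kg_s: pd.Series, fuel_lhv_btu_lb: float
) -> pd.Series:
    """Assumed definition: power output divided by fuel heat input."""
    # kg/s * kJ/kg = kJ/s = kW, the same unit as POWER.
    heat_input_kw = fuel_flow_kg_s * fuel_lhv_btu_lb * BTU_PER_LB_TO_KJ_PER_KG
    # Where no fuel is flowing the ratio is undefined, so return NaN there.
    return power_kw / heat_input_kw.where(heat_input_kw > 0)


# Made-up readings: one running point and one shutdown point.
readings = pd.DataFrame({"POWER": [20000.0, 0.0], "FUEL_FLOW": [1.2, 0.0]})
readings["THRM_EFF"] = thermal_efficiency(
    readings["POWER"], readings["FUEL_FLOW"], fuel_lhv_btu_lb=20000.0
)
```

For the running point this gives an efficiency of roughly 0.36. The shutdown point yields NaN rather than a division error, which keeps such rows out of averages by default.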


Data is a big world: it can be of several types, come from different sources, and carry specific physical meanings, and there is not always a defined set of tasks or steps to follow to retrieve valuable information from it. As professionals dedicated to Data Science, we must develop the skills to explore data effectively and decide which leads are worth following.

Creativity and good programming skills are two of the main qualities required of data scientists during data exploration and in selecting the analysis approach to solve a problem. It is also crucial to present and communicate your results to the people responsible for decision making. In this challenge we would like you to demonstrate these abilities.

We expect you to explore the given data and tell a compelling story through the visualizations of your preference, clearly stating the problem or questions that your visualizations will help to answer. We will consider:

  • Clarity of the analyzed data and its relevance to the proposed problem/question.
  • Accuracy of the graphical representation used to convey your message.
  • Efficiency of visual effects (use of appropriate shapes, colors and sizes to represent the analyzed data).

Challenge evaluation score table: image


Most popular Python libraries for visualization:

Improve your skills in visualization and data storytelling:

About Baker Hughes: