Model the Earth
with Python
A hands-on tutorial for developers who know Python but have never thought about rock layers. We'll build geological intuition first, then translate it into GemPy code.
What is GemPy?
GemPy is an open-source Python library for building 3-D geological models from sparse input data — a handful of point observations and orientation measurements — using implicit surface interpolation backed by a Universal Co-kriging engine.
Think of it as: "Given a few GPS points where geologists measured rock layers, reconstruct the full 3-D shape of every layer underground."
What can you do with it?
Geology Crash Course
Before writing a line of GemPy code you need to understand what you're modelling. This section is a minimal geology primer written specifically for programmers.
2.1 · Rock types (the data types of the Earth)
Treat rock types like data types. They have different origins and properties:
| Rock Type | Origin | Example | GemPy relevance |
|---|---|---|---|
| Sedimentary | Particles deposited in layers by water/wind | Limestone, Shale, Sandstone | Most common target — forms distinct, mappable layers |
| Igneous | Cooled magma | Granite, Basalt | Can be modelled as intrusions cutting other layers |
| Metamorphic | Pre-existing rock transformed by heat/pressure | Marble, Schist | Complex shapes; often the basement in a model |
2.2 · Stratigraphy — the concept GemPy is built around
Stratigraphy is the science of rock layers. The key law is the Law of Superposition: older layers are below newer ones (unless tectonics have flipped things).
git log.
Stratigraphic order in GemPy is exactly this layering priority.
2.3 · Strike & Dip — the orientation of a surface
Rock layers are rarely flat. Their orientation is described by two angles:
| Measurement | What it means | Range | Analogy |
|---|---|---|---|
| Dip angle | How steeply a layer tilts from horizontal | 0° (flat) – 90° (vertical) | Slope angle of a ramp |
| Dip direction | The compass direction the layer tilts toward | 0°–360° (azimuth) | Direction the ramp faces |
| Strike | The direction of a horizontal line on the tilted surface (⊥ dip direction) | 0°–360° | The ridge of the ramp |
In GemPy these orientations are called orientations (or foliations) and are provided as input data points — each one is a measurement location plus a normal vector to the surface.
2.4 · Faults — the breaking points
A fault is a crack where rock masses have slid past each other. They offset layers and must be modelled separately in GemPy because they cut through the regular layering.
| Fault type | Motion | Typical cause |
|---|---|---|
| Normal fault | Hanging wall drops (extension) | Tectonic stretching |
| Reverse fault | Hanging wall rises (compression) | Tectonic collision |
| Strike-slip fault | Lateral sliding | Transform boundaries (e.g. San Andreas) |
2.5 · Folds — when layers bend instead of break
Where faults are layers breaking, folds are layers bending — usually from the same compressional forces, but without enough stress to fracture the rock. Two basic shapes:
| Fold type | Shape | Oldest rock is… |
|---|---|---|
| Anticline | Arches upward (∩) | At the core / centre |
| Syncline | Sags downward (∪) | At the outer edges |
2.6 · Unconformities — gaps in the rock record
An unconformity is a buried erosion surface: older rocks were tilted, uplifted, and eroded flat, then younger sediments were deposited directly on top — skipping a chunk of geological time.
This is exactly why GemPy groups surfaces into separate series (Section 3) — each series represents one continuous depositional/deformation episode, and an unconformity is simply the boundary between two series.
2.7 · Igneous Intrusions — rock that cuts across everything
An intrusion forms when molten rock pushes into existing layers and solidifies underground — unlike sedimentary layers, an intrusion can cut diagonally or even vertically through multiple older layers at once.
| Intrusion shape | Geometry | Example |
|---|---|---|
| Dike | Vertical/steep sheet cutting across layers | Volcanic dike swarm |
| Sill | Horizontal sheet squeezed between existing layers | Basalt sill |
| Pluton/Stock | Large irregular blob | Granite batholith |
Intrusion_Series → Fault_Series
→ regular stratigraphic series (intrusions usually cut last, faults can offset everything below).
2.8 · How series interact: Erosion vs. Onlap
When two series meet, GemPy needs to know how the younger one relates to the older one. This directly controls how the final model looks at that boundary:
| Relation | Meaning | Visual effect |
|---|---|---|
| Erode (default) | The younger series truncates the older one — like the unconformity in 2.6 | Older layers appear "cut off" beneath the younger series |
| Onlap | The younger series only fills in gaps/topography of the older one, without eroding it | Older layers stay intact; younger sediment "drapes" around them |
# By default every series erodes the one below it. # To make a series onlap instead (drape rather than cut): geo_model.set_bottom_relation('Younger_Series', 'Onlap')
2.9 · Coordinate conventions — a common beginner trap
In GemPy (and most geosciences software), Z is elevation, not depth: higher Z = higher up = closer to the surface. This trips up programmers used to image/screen coordinates where Y increases downward.
| Axis | Convention | Typical range |
|---|---|---|
| X | East–West (Easting) | Project-specific, often metres or UTM coordinates |
| Y | North–South (Northing) | Project-specific |
| Z | Elevation — higher value = higher up | Often relative to sea level; can be negative below sea level |
elevation = surface_z - depth.
GemPy's Data Model
GemPy organises everything into a single GeoModel object. Conceptually:
The two types of input data
| Data type | GemPy term | What it represents | Minimum needed |
|---|---|---|---|
| Surface points | surface_points |
3-D coordinates (X, Y, Z) on a geological surface (e.g., top of limestone layer) | ≥ 3 per surface |
| Orientations | orientations |
Point location + pole vector describing which way the surface tilts there | ≥ 1 per series |
Series & Surfaces (the hierarchy)
Surfaces are grouped into Series. All surfaces in the same series are interpolated together (they share the same geological deformation history). A special Basement is always the bottom-most unit.
Installation
GemPy works best in a dedicated conda environment due to its compiled dependencies:
# Option A: conda (recommended) conda create -n gempy_env python=3.10 conda activate gempy_env pip install gempy # Option B: pip (plain venv) python -m venv .venv source .venv/bin/activate # Windows: .venv\Scripts\activate pip install gempy # Optional but useful for visualisation pip install pyvista panel matplotlib
import gempy; print(gempy.__version__)
and compare with the official migration guide if needed.
# Verify installation import gempy as gp print(gp.__version__) # Expected: 2.x.x
Surfaces & Series
Every GemPy workflow starts by creating a model and declaring the geological units (surfaces) and their grouping (series).
import gempy as gp import numpy as np # ── Step 1: Create a GeoModel ────────────────────────────── # The extent defines the bounding box of your model (x_min, x_max, # y_min, y_max, z_min, z_max) in metres (or any consistent unit) geo_model = gp.create_model('My_First_Model') gp.init_data( geo_model, extent=[0, 2000, # X: 2 km wide 0, 2000, # Y: 2 km deep 0, 1500], # Z: 1.5 km tall resolution=[50, 50, 50] # 3-D grid cells for interpolation ) # ── Step 2: Define surfaces (geological units) ───────────── # These are the names of layers from TOP to BOTTOM # (youngest to oldest, following superposition law) gp.surfaces.add_surfaces( geo_model, ['Topsoil', 'Limestone', 'Shale', 'Sandstone'] ) # ── Step 3: Assign surfaces to series ────────────────────── # A 'series' groups surfaces that were deposited together # (same tectonic event / same time period) gp.map_series_to_surfaces( geo_model, { 'Sedimentary_Series': ('Topsoil', 'Limestone', 'Shale'), 'Basement_Series': ('Sandstone',) } ) print(geo_model.surfaces)
add_surfaces?Input Data
Now we add the geological observations. Think of each row as a field measurement — a geologist visited a location, identified a rock boundary, and recorded its position and tilt.
6.1 · Surface Points
# Surface points: WHERE does a geological surface pass through? # Each row: X, Y, Z, surface_name # Z is elevation (positive = above a datum, usually sea level) gp.add_surface_points( geo_model, X = [500, 1000, 1500, # Limestone top: 3 points 500, 1000, 1500, # Shale top: 3 points 500, 1000, 1500], # Sandstone top: 3 points Y = [1000] * 9, # All at Y=1000 (mid-model cross-section) Z = [1100, 1050, 1000, # Limestone dips slightly east 900, 870, 840, # Shale 600, 600, 600], # Sandstone: flat surface = [ 'Limestone', 'Limestone', 'Limestone', 'Shale', 'Shale', 'Shale', 'Sandstone', 'Sandstone', 'Sandstone' ] )
6.2 · Orientations
# Orientations: HOW is the surface tilted at a point? # G_x, G_y, G_z = components of the surface NORMAL vector # A flat horizontal surface has normal (0, 0, 1) — pointing straight up # A surface dipping east has a normal tilted toward west gp.add_orientations( geo_model, X = [1000, 1000], Y = [1000, 1000], Z = [1050, 870], surface = ['Limestone', 'Shale'], pole_vector = [ (0.087, 0, 0.996), # Limestone: 5° east dip (sin5°≈0.087, cos5°≈0.996) (0.087, 0, 0.996), # Shale: same dip ] ) # ── Tip: convert dip angle → pole vector ─────────────────── # import numpy as np # dip_deg = 15 # degrees # pole = (np.sin(np.radians(dip_deg)), 0, np.cos(np.radians(dip_deg)))
Computing the Model
Once input data is set, GemPy interpolates a full 3-D scalar field and extracts surfaces from it. Conceptually it's similar to Marching Cubes on a kriging scalar field.
# ── Step 1: Set interpolation options ────────────────────── gp.set_interpolator( geo_model, theano_optimizer='fast_run', # 'fast_compile' for first run verbose=[] ) # ── Step 2: Compute ──────────────────────────────────────── # This runs the kriging interpolation on the input data. # Returns a Solution object containing scalar fields & surfaces sol = gp.compute_model(geo_model) print(sol) # Solution: lith_block — integer label per voxel (which layer is it?) # scalar_field_matrix — continuous scalar field values # vertices/edges — the extracted surface meshes # ── Access the lithology block ────────────────────────────── # lith_block[i] = layer index at voxel i lith_block = sol.lith_block print(lith_block.shape) # (50*50*50,) → flatten of 50³ voxels # Reshape to 3-D grid lith_3d = lith_block.reshape(geo_model.grid.resolution)
resolution for detail, but computation time scales as O(n³).
Understanding the scalar field
Visualisation
GemPy has built-in 2-D (matplotlib) and 3-D (PyVista) plotting tools.
# ── 2-D Cross-section (matplotlib) ───────────────────────── gp.plot_2d( geo_model, direction='y', # cross-section perpendicular to Y axis section_names=['y'], show_data=True, # show surface points & orientations show_lith=True # colour-fill by lithology ) # ── 3-D interactive render (PyVista) ─────────────────────── p = gp.plot_3d( geo_model, show_surfaces=True, show_data=True, show_topography=False, image=False # set True for a static PNG ) # In Jupyter: this opens an interactive PyVista window # ── Export to VTK for external viewers ───────────────────── gp.plot_3d(geo_model, image=True) # renders PNG inline in notebook # Export surfaces as mesh files for surf in geo_model.solutions.meshes: surf.mesh.save(f"output_{surf.surface}.vtk")
Modelling Faults
Faults require a special fault series with StructuralFrame flags.
# ── 1. Add a fault surface ───────────────────────────────── gp.surfaces.add_surfaces(geo_model, ['MainFault']) # ── 2. Create a Fault series ─────────────────────────────── # Fault series MUST come BEFORE the geological series they cut gp.map_series_to_surfaces( geo_model, { 'Fault_Series': ('MainFault',), # <-- first 'Sedimentary_Series': ('Limestone', 'Shale'), 'Basement_Series': ('Sandstone',) }, remove_unused_series=True ) # ── 3. Mark the series as a fault ───────────────────────── geo_model.set_is_fault(['Fault_Series']) # ── 4. Add fault observations ────────────────────────────── gp.add_surface_points( geo_model, X=[1000, 1000], Y=[500, 1500], Z=[1200, 800], surface=['MainFault', 'MainFault'] ) gp.add_orientations( geo_model, X=[1000], Y=[1000], Z=[1000], surface=['MainFault'], pole_vector=[(0.7, 0, 0.7)] # 45° dip ) sol = gp.compute_model(geo_model)
Adding Topography
Real models have an uneven surface — mountains, valleys. GemPy can load a DEM (Digital Elevation Model).
# ── Option A: synthetic random topography ───────────────── geo_model.set_topography(source='random', fd=1.9, d_z=200) # fd = fractal dimension (roughness): 1.5–2.5 # d_z = elevation range in model units # ── Option B: from a GeoTIFF DEM ────────────────────────── geo_model.set_topography( source='gdal', filepath='path/to/dem.tif' ) # ── Option C: from a numpy array ────────────────────────── # topo_array shape: (res_x, res_y) values = elevation (Z) topo_array = np.load('topo.npy') geo_model.set_topography(source='numpy', array=topo_array) # After setting topography, re-compute sol = gp.compute_model(geo_model) # Visualise with topo overlay gp.plot_3d(geo_model, show_topography=True)
GemPy Cheat Sheet
import gempy as gp import numpy as np # ── 1. CREATE ────────────────────────────────────────────── geo_model = gp.create_model('ModelName') gp.init_data(geo_model, extent=[x_min, x_max, y_min, y_max, z_min, z_max], resolution=[nx, ny, nz]) # ── 2. DEFINE STRUCTURE ──────────────────────────────────── gp.surfaces.add_surfaces(geo_model, ['S1', 'S2', 'S3']) # youngest→oldest gp.map_series_to_surfaces(geo_model, {'Series1': ('S1', 'S2'), 'Base': ('S3',)}) # ── 3. ADD DATA ──────────────────────────────────────────── gp.add_surface_points(geo_model, X=[...], Y=[...], Z=[...], surface=[...]) gp.add_orientations(geo_model, X=[...], Y=[...], Z=[...], surface=[...], pole_vector=[...]) # ── 4. (OPTIONAL) FAULTS ────────────────────────────────── geo_model.set_is_fault(['FaultSeries']) # ── 5. (OPTIONAL) TOPOGRAPHY ────────────────────────────── geo_model.set_topography(source='random') # ── 6. COMPUTE ───────────────────────────────────────────── gp.set_interpolator(geo_model) sol = gp.compute_model(geo_model) # ── 7. VISUALISE ─────────────────────────────────────────── gp.plot_2d(geo_model, direction='y', show_lith=True, show_data=True) gp.plot_3d(geo_model, show_surfaces=True, show_topography=True) # ── 8. INSPECT ───────────────────────────────────────────── sol.lith_block # voxel lithology IDs sol.scalar_field_matrix # raw scalar field geo_model.solutions.meshes # PyVista mesh objects
Common methods reference
| Method | What it does | Key params |
|---|---|---|
gp.create_model() | Instantiate a new GeoModel | project_name |
gp.init_data() | Set bounding box & resolution | extent, resolution |
gp.surfaces.add_surfaces() | Register named surfaces | list of surface names |
gp.map_series_to_surfaces() | Group surfaces into series | dict of series→surfaces |
gp.add_surface_points() | Add XYZ observations of surfaces | X,Y,Z,surface |
gp.add_orientations() | Add tilt observations | X,Y,Z,surface,pole_vector |
geo_model.set_is_fault() | Flag series as fault | list of series names |
geo_model.set_topography() | Load surface DEM | source, filepath |
gp.set_interpolator() | Compile the Theano backend | theano_optimizer |
gp.compute_model() | Run the kriging interpolation | geo_model |
gp.plot_2d() | Matplotlib cross-section | direction, show_lith |
gp.plot_3d() | PyVista 3-D render | show_surfaces, show_topography |
github.com/cgre-aachen/gempy/examples —
try the Moureze and Perth Basin notebooks.
For probabilistic modelling, explore GemPy + PyMC3 integration.
Example Gallery — Beginner to Expert
A curated progression of real-world-style GemPy examples. Each one builds on the concepts from previous sections and adds new complexity. Work through them top to bottom.
🟢 Beginner
import gempy as gp geo_model = gp.create_model('Beginner_FlatLayers') gp.init_data(geo_model, extent=[0,1000,0,1000,0,800], resolution=[30,30,30]) gp.surfaces.add_surfaces(geo_model, ['Sand', 'Clay']) gp.map_series_to_surfaces(geo_model, {'Strat': ('Sand', 'Clay')}) gp.add_surface_points(geo_model, X=[100,900], Y=[500,500], Z=[400,400], surface=['Sand','Sand']) gp.add_orientations(geo_model, X=[500], Y=[500], Z=[400], surface=['Sand'], pole_vector=[(0,0,1)]) # flat → normal points straight up gp.set_interpolator(geo_model) sol = gp.compute_model(geo_model) gp.plot_2d(geo_model, show_lith=True)
pole_vector. Practice converting a dip angle to a normal vector.
import numpy as np dip_deg = 12 pole = (np.sin(np.radians(dip_deg)), 0, np.cos(np.radians(dip_deg))) gp.add_orientations(geo_model, X=[500], Y=[500], Z=[450], surface=['Sand'], pole_vector=[pole]) # Re-run compute_model + plot_2d to see the tilt
🟡 Intermediate
geo_model = gp.create_model('Intermediate_Unconformity') gp.init_data(geo_model, extent=[0,2000,0,2000,0,1000], resolution=[50,50,50]) gp.surfaces.add_surfaces(geo_model, ['Recent_Sand', 'Old_Limestone', 'Old_Shale', 'Basement']) # Two series = two independent deformation events separated by an erosion surface gp.map_series_to_surfaces(geo_model, { 'Younger_Series': ('Recent_Sand',), 'Older_Series': ('Old_Limestone', 'Old_Shale'), 'Basement_Series': ('Basement',) }) # ... add surface_points + orientations for each surface ... # Tip: give 'Older_Series' a steeper dip than 'Younger_Series' # to visually show the angular unconformity
gp.surfaces.add_surfaces(geo_model, ['NormalFault']) gp.map_series_to_surfaces(geo_model, { 'Fault_Series': ('NormalFault',), 'Strat_Series': ('Limestone', 'Shale', 'Sandstone') }, remove_unused_series=True) geo_model.set_is_fault(['Fault_Series']) # ... add fault surface_points + orientations, then compute
🟠 Advanced
geo_model.set_topography(source='gdal', filepath='region_dem.tif') sol = gp.compute_model(geo_model) # Get the geological map (which unit outcrops at the surface) geo_map = sol.geological_map gp.plot_2d(geo_model, show_topography=True, show_lith=True) gp.plot_3d(geo_model, show_topography=True, show_surfaces=True)
set_fault_relation calls.
gp.surfaces.add_surfaces(geo_model, ['Fault_A', 'Fault_B']) gp.map_series_to_surfaces(geo_model, { 'FaultA_Series': ('Fault_A',), # older fault — cut first 'FaultB_Series': ('Fault_B',), # younger fault — cuts Fault_A too 'Strat_Series': ('Limestone', 'Shale') }) geo_model.set_is_fault(['FaultA_Series', 'FaultB_Series']) # Define that Fault_B offsets Fault_A (fault-fault relation matrix) geo_model.set_fault_relation( np.array([[0,1,1], [0,0,1], [0,0,0]]) )
🔴 Expert
import pymc as pm import gempy as gp with pm.Model() as model: # Prior: true Z of a borehole pick is normally distributed # around the measured value (accounting for survey error) z_uncertain = pm.Normal('limestone_top_z', mu=1050, sigma=15) def run_gempy(z_val): geo_model.modify_surface_points( 0, Z=z_val) # update point 0's Z sol = gp.compute_model(geo_model) return sol.lith_block # Wrap as a PyTensor Op for gradient-free sampling (or use # gempy's built-in pymc backend in v3 for full HMC support) likelihood = pm.Potential( 'observed_thickness', custom_log_likelihood(z_uncertain, observed_well_data) ) trace = pm.sample(1000, tune=500, chains=4) # Result: a distribution of plausible geological models, # not just one "best guess" — quantifies subsurface uncertainty
sol = gp.compute_model(geo_model) lith_3d = sol.lith_block.reshape(geo_model.grid.resolution) # Map each lithology ID to a physical property perm_lookup = {1: 50e-15, 2: 5e-15, 3: 200e-15} # m² permeability_field = np.vectorize(perm_lookup.get)(lith_3d) # Export as VTK structured grid for a reservoir simulator import pyvista as pv grid = pv.UniformGrid() grid.dimensions = np.array(permeability_field.shape) + 1 grid.cell_data['permeability'] = permeability_field.flatten(order='F') grid.save('reservoir_perm.vtk') # Now consumable by simulators like MRST, OPM, or custom solvers