杰瑞科技汇

Python GDAL如何高效读写GeoTIFF?

Of course! Working with GeoTIFFs in Python is a common task, and the GDAL (Geospatial Data Abstraction Library) is the standard, most powerful tool for the job. The rasterio library is also extremely popular and built on top of GDAL, offering a more "Pythonic" API.

Python GDAL如何高效读写GeoTIFF?-图1
(图片来源网络,侵删)

This guide will cover:

  1. Installation of GDAL and rasterio.
  2. Core Concepts of a GeoTIFF.
  3. Reading a GeoTIFF with rasterio.
  4. Inspecting its metadata.
  5. Writing a new GeoTIFF.
  6. A complete, practical example (clipping a raster).

Installation

You need to install both a GDAL binding for Python and the underlying GDAL library itself.

Option A: The Easiest Way (Using Conda)

Conda handles the system-level GDAL library and the Python package together, which avoids most common installation issues.

# Create a new environment (optional but recommended)
conda create -n geo_env python=3.10
conda activate geo_env
# Install GDAL and rasterio
conda install -c conda-forge gdal rasterio

Option B: Using Pip (Can be Tricky)

Using pip can sometimes lead to errors because it can't find the system-level GDAL library. It's often best to use a pre-compiled wheel from Christoph Gohlke's site if you're on Windows.

Python GDAL如何高效读写GeoTIFF?-图2
(图片来源网络,侵删)
# For Linux/macOS, you might need to install GDAL system-wide first
# On Ubuntu/Debian:
# sudo apt-get update
# sudo apt-get install gdal-bin libgdal-dev
# On macOS (using Homebrew):
# brew install gdal
# Then install the Python bindings with pip
pip install GDAL==$(gdal-config --version) # This links pip to the system GDAL
pip install rasterio

Core Concepts of a GeoTIFF

Before we code, let's understand the key pieces of information in a GeoTIFF file:

  • Raster Data: The main grid of pixels (e.g., elevation values, RGB color bands).
  • NoData Value: A special value in the raster data that means "no data is present here" (e.g., -9999).
  • Geotransform: An affine transformation that maps pixel coordinates (row, column) to real-world coordinates (x, y). It's a 6-element tuple: (x_origin, pixel_width, x_rotation, y_origin, y_rotation, pixel_height).
    • x_origin, y_origin: The top-left corner coordinates of the top-left pixel.
    • pixel_width, pixel_height: The size of each pixel in map units. pixel_height is usually negative because the origin is at the top-left.
    • x_rotation, y_rotation: These are typically 0, indicating the image is "north-up".
  • Coordinate Reference System (CRS): A system that defines how the 2D/3D coordinates relate to real locations on Earth. Common examples are EPSG:4326 (WGS 84 Latitude/Longitude) and EPSG:3857 (Web Mercator).

Reading a GeoTIFF with rasterio

rasterio is the recommended library for most tasks. It provides a clean, context-manager-based interface.

The Context Manager (with rasterio.open(...) as src:)

This is the standard way to open a file. It ensures the file is automatically closed for you, even if errors occur.

import rasterio
import rasterio.plot
# Path to your GeoTIFF file
tif_path = 'path/to/your/image.tif'
# Open the file in 'read' mode ('r')
with rasterio.open(tif_path, 'r') as src:
    # The 'with' block ensures the file is closed automatically
    print(f"Driver: {src.driver}")
    print(f"Size: {src.width} x {src.height}")
    print(f"Number of bands: {src.count}")
    print(f"Coordinate System: {src.crs}")
    print(f"Transform (Geotransform):\n{src.transform}")

Inspecting a GeoTIFF

Let's explore the data and metadata in more detail.

Python GDAL如何高效读写GeoTIFF?-图3
(图片来源网络,侵删)
import rasterio
import numpy as np
tif_path = 'path/to/your/image.tif'
with rasterio.open(tif_path, 'r') as src:
    # --- Read the entire raster data into a NumPy array ---
    # .read() returns a 3D array: (bands, rows, columns)
    # If it's a single-band image (like elevation), shape will be (1, rows, cols)
    data = src.read()
    print(f"Shape of the data array: {data.shape}")
    # --- Read a specific band ---
    # Band numbers are 1-indexed in GeoTIFFs
    band1_data = src.read(1)
    print(f"Shape of band 1: {band1_data.shape}")
    # --- Get the NoData value ---
    # This can be different for each band, but is often the same
    nodata_value = src.nodata
    print(f"NoData Value: {nodata_value}")
    # --- Get metadata as a dictionary ---
    meta = src.meta
    print("\n--- Metadata ---")
    for key, value in meta.items():
        print(f"{key}: {value}")
    # --- Get bounds of the raster ---
    bounds = src.bounds
    print("\n--- Bounds ---")
    print(bounds)
    # --- Access profile (similar to meta) ---
    profile = src.profile
    print("\n--- Profile ---")
    print(profile)

Writing a New GeoTIFF

Writing is just as straightforward. You need to provide the profile information from the source (or a new one) to the write function.

import numpy as np
import rasterio
from rasterio.transform import from_origin
# Let's create some sample data
# A simple 10x10 grid with values from 0 to 99
new_data = np.arange(100).reshape(10, 10)
# Define the output file path
output_path = 'output_image.tif'
# Define the new georeferencing information
# Top-left corner coordinates
ul_x = 500000
ul_y = 6000000
# Pixel size (width and height)
pixel_size = 10.0
# Coordinate Reference System (CRS) - e.g., UTM Zone 10N
crs = 'EPSG:32610'
# Calculate the affine transformation matrix
# It's (x_origin, pixel_width, 0, y_origin, 0, -pixel_height)
transform = from_origin(ul_x, ul_y, pixel_size, pixel_size)
# Get the profile from a source file or create a new one
# For this example, we'll create a new profile
profile = {
    'driver': 'GTiff',
    'dtype': 'uint16',      # Data type of the raster
    'nodata': None,         # NoData value
    'width': new_data.shape[1],
    'height': new_data.shape[0],
    'count': 1,             # Number of bands
    'crs': crs,
    'transform': transform
}
# Write the new file
with rasterio.open(output_path, 'w', **profile) as dst:
    # The first argument is the data, the second is the band number
    dst.write(new_data, 1)
print(f"Successfully created new GeoTIFF: {output_path}")

Practical Example: Clipping a Raster

A very common task is to clip a large raster to the extent of a smaller area defined by a shapefile.

Prerequisites:

  1. A GeoTIFF file (raster.tif).
  2. A shapefile (area_of_interest.shp) that defines the clipping area.
import rasterio
from rasterio.mask import mask
import geopandas as gpd
import matplotlib.pyplot as plt
# --- Inputs ---
raster_path = 'path/to/your/raster.tif'
shapefile_path = 'path/to/your/area_of_interest.shp'
output_path = 'clipped_raster.tif'
# --- 1. Open the raster and read its CRS ---
with rasterio.open(raster_path, 'r') as src:
    # The CRS of the raster
    raster_crs = src.crs
    print(f"Raster CRS: {raster_crs}")
# --- 2. Load the shapefile and ensure it has the same CRS ---
gdf = gpd.read_file(shapefile_path)
# IMPORTANT: The shapefile must be in the same CRS as the raster
if gdf.crs != raster_crs:
    print(f"Reprojecting shapefile from {gdf.crs} to {raster_crs}")
    gdf = gdf.to_crs(raster_crs)
# --- 3. Clip the raster using the shapefile's geometry ---
# The geometry needs to be a list
shapes = gdf.geometry.values.tolist()
print("Clipping raster...")
with rasterio.open(raster_path, 'r') as src:
    # The mask function returns the clipped data and the new transform
    clipped_data, clipped_transform = mask(src, shapes, crop=True, nodata=src.nodata)
# The mask function can add an extra dimension, so we squeeze it if it's a single band
if clipped_data.shape[0] == 1:
    clipped_data = clipped_data.squeeze()
# --- 4. Update the metadata and write the new clipped file ---
with rasterio.open(raster_path, 'r') as src:
    # Copy the metadata, but update width, height, transform, and nodata
    profile = src.profile.copy()
    profile.update({
        "driver": "GTiff",
        "height": clipped_data.shape[0],
        "width": clipped_data.shape[1],
        "transform": clipped_transform,
        "nodata": src.nodata
    })
print("Writing clipped raster...")
with rasterio.open(output_path, 'w', **profile) as dst:
    dst.write(clipped_data.astype(profile['dtype']), 1)
print(f"Successfully created clipped raster: {output_path}")
# --- Optional: Visualize the result ---
fig, ax = plt.subplots(1, 2, figsize=(12, 6))
with rasterio.open(raster_path, 'r') as src:
    rasterio.plot.show(src, ax=ax[0], title='Original Raster')
    ax[0].plot(*gdf.geometry.total_bounds[[0,2]].T, 'r--') # Plot bounds on original
with rasterio.open(output_path, 'r') as src:
    rasterio.plot.show(src, ax=ax[1], title='Clipped Raster')
plt.tight_layout()
plt.show()

Key Takeaways

  • Use rasterio: It's the modern, user-friendly Python wrapper for GDAL.
  • Always use with rasterio.open(...): It handles file closing automatically.
  • Understand the Profile/Meta: This dictionary is the key to reading and writing GeoTIFFs correctly. It holds all the georeferencing information.
  • CRS is Crucial: Always check that your data layers (raster and vector) are in the same Coordinate Reference System before performing operations like masking or reprojecting.
分享:
扫描分享到社交APP
上一篇
下一篇