This is how I programmatically plot discrete raster data (e.g. a land cover map) using Python. Here’s a Jupyter Notebook demonstrating the plotting steps using land cover data for Taita Taveta County, Kenya as an example. This dataset includes a QGIS style file (QML) which I used to configure the legend. There are probably simpler ways of doing this, so please let me know if you have any suggestions!

The land cover plot of Taita Taveta County, Kenya generated using Python.
The land cover plot of Taita Taveta County, Kenya generated using Python.

Data source

Abera, Temesgen Alemayheu; Vuorinne, Ilja; Munyao, Martha; Pellikka, Petri; Heiskanen, Janne (2021), “Taita Taveta County, Kenya - 2020 Land cover map and reference database”, Mendeley Data, V2, doi: 10.17632/xv24ngy2dz.2.

This data is licensed under a Creative Commons Attribution 4.0 International (CC-BY-4.0) License.

Dependencies

The required Python packages are Pandas, rioxarray, Matplotlib, and Numpy.

Additionally, the following packages are used in the demonstration:

  • JupyterLab for interactive notebooks
  • Requests for downloading data
  • Dask for parallel computing (this isn’t really needed for the raster data used in this example, but it’ll be very useful for larger datasets)

Importing libraries and setting up graphical parameters

Import the following libraries:

import multiprocessing
import platform
import os
from zipfile import ZipFile, BadZipFile
from datetime import datetime, timezone

# Windows
if platform.system() == "Windows":
    import multiprocessing.popen_spawn_win32
# Linux/OSX
else:
    import multiprocessing.popen_spawn_posix

import threading
import xml.etree.ElementTree as ET

import requests
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap, LinearSegmentedColormap
import numpy as np
import rioxarray as rxr
from dask.distributed import Client, LocalCluster, Lock
from dask.utils import SerializableLock

Configure Matplotlib parameters to change the plot style:

plt.style.use("seaborn-whitegrid")
plt.rcParams["font.family"] = "Source Sans 3"
plt.rcParams["figure.dpi"] = 150
plt.rcParams["axes.grid"] = False
plt.rcParams["text.color"] = "darkslategrey"
plt.rcParams["axes.labelcolor"] = "darkslategrey"
plt.rcParams["xtick.labelcolor"] = "darkslategrey"
plt.rcParams["ytick.labelcolor"] = "darkslategrey"
plt.rcParams["figure.titleweight"] = "semibold"
plt.rcParams["axes.titleweight"] = "semibold"
plt.rcParams["figure.titlesize"] = "13"
plt.rcParams["axes.titlesize"] = "12"
plt.rcParams["axes.labelsize"] = "10"

Defining paths and downloading data

Define data directories:

DATA_DIR = os.path.join("data", "kenya_land_cover")
ZIP_FILE = DATA_DIR + ".zip"

Download data (which is a ZIP archive) using Requests (alternatively, download the data manually to the data folder in the working directory):

URL = (
    "https://prod-dcd-datasets-cache-zipfiles.s3.eu-west-1.amazonaws.com/" +
    "xv24ngy2dz-2.zip"
)
r = requests.get(URL, stream=True)

os.makedirs("data", exist_ok=True)

if r.status_code == 200:
    with open(ZIP_FILE, "wb") as filedl:
        for chunk in r.iter_content(chunk_size=1048676):
            filedl.write(chunk)
    print("Data downloaded:", datetime.now(tz=timezone.utc))
else:
    print("\nStatus code:", r.status_code)

To view the list of files in the ZIP archive:

ZipFile(ZIP_FILE).namelist()

This will output the following:

['xv24ngy2dz-2/Data/Land cover map /Landcover-cMfoO1.tif',
 'xv24ngy2dz-2/Data/Land cover map /ColorCode-ni1SIo.qml',
 'xv24ngy2dz-2/Data/Land cover map /Info-UBC9RP.txt',
 'xv24ngy2dz-2/Data/Reference Database for Accuracy Assessment/File.zip',
 'xv24ngy2dz-2/Data/Reference Database for Classification/File.zip']

From this list, only the TIF and QML files will be used.

Extract the ZIP archive to DATA_DIR:

try:
    z = ZipFile(ZIP_FILE)
    z.extractall(DATA_DIR)
except BadZipFile:
    print("There were issues with the file", ZIP_FILE)

Define paths to the TIF and QML files:

for i in ZipFile(ZIP_FILE).namelist():
    if i.endswith(".tif"):
        raster_file = os.path.join(DATA_DIR, i)
    elif i.endswith(".qml"):
        style_file = os.path.join(DATA_DIR, i)

Reading the raster data

Read the land cover raster data using rioxarray, with Dask for parallel computing1:

with LocalCluster() as cluster, Client(cluster) as client:
    landcover = rxr.open_rasterio(
        raster_file,
        chunks=True,
        cache=False,
        masked=True,
        lock=False,
        # lock=Lock("rio-read", client=client)  # when too many file handles open
    )
    landcover.rio.to_raster(
        os.path.join(DATA_DIR, "dask_multiworker.tif"),
        tiled=True,
        lock=Lock("rio", client=client)
    )

View the data structure:

landcover
Structure of the land cover raster data read by rioxarray.
Structure of the land cover raster data read by rioxarray.

View the resolution, bounds, and CRS of the data:

landcover.rio.resolution()
# (0.00026949458520105445, -0.00026949458520105445)

landcover.rio.bounds()
# (37.583984352, -4.138089356, 39.21927749499999, -2.682818595914306)

landcover.rio.crs
# CRS.from_epsg(4326)

Processing the data

Get the unique value count for the raster data (each value represents a land cover type):

# get unique value count for the raster
uniquevals = pd.DataFrame(np.unique(landcover, return_counts=True)).transpose()

# assign column names
uniquevals.columns = ["value", "count"]

# drop row(s) with NaN
uniquevals.dropna(inplace=True)

# convert value column to string (this is required for merging later)
uniquevals["value"] = uniquevals["value"].astype(int).astype(str)

# view the unique values data frame
uniquevals
  value count
0 1 16621.0
1 2 33046.0
2 3 33286.0
3 4 981030.0
4 5 9219689.0
5 6 7721553.0
6 7 124588.0
7 8 871990.0
8 9 75661.0
9 10 10556.0
10 11 34500.0
11 12 73355.0
12 13 89317.0
13 14 2582.0

Read the QGIS style file (QML) containing the legend entries and extract the colour palette:

tree = ET.parse(style_file)
root = tree.getroot()

# extract colour palette
pal = {}

for palette in root.iter("paletteEntry"):
    pal[palette.attrib["value"]] = palette.attrib

# generate data frame from palette dictionary
legend = pd.DataFrame.from_dict(pal).transpose()
legend = pd.DataFrame(legend)

# drop alpha column
legend.drop(columns="alpha", inplace=True)

# convert value column to string (this is required for merging later)
legend["value"] = legend["value"].astype(str)

# view the palette data frame
legend
  value label color
1 1 Montane forest #6ed277
2 2 Plantation forest #4d7619
3 3 Riverine forest #27f90c
4 4 Thicket #fd8e07
5 5 Shrubland #fac597
6 6 Grassland #aaec85
7 7 Agroforestry #4df3ef
8 8 Cropland #f9ed25
9 9 Sisal #840bfe
10 10 Builtup #833023
11 11 Waterbody #3630ee
12 12 Wetland #4dc4f3
13 13 Baresoil #ee15ca
14 14 Barerock #132e1e

Merge the unique values data frame with the legend and calculate percentages based on the count:

uniquevals = uniquevals.merge(legend, on="value")

# calculate percentage based on count
uniquevals["percentage"] = (
    uniquevals["count"] / uniquevals["count"].sum() * 100
)
uniquevals["percentage"] = uniquevals["percentage"].astype(int)

# sort by count
uniquevals.sort_values("count", ascending=False, inplace=True)

# view the data frame
uniquevals
  value count label color percentage
4 5 9219689.0 Shrubland #fac597 47
5 6 7721553.0 Grassland #aaec85 40
3 4 981030.0 Thicket #fd8e07 5
7 8 871990.0 Cropland #f9ed25 4
6 7 124588.0 Agroforestry #4df3ef 0
12 13 89317.0 Baresoil #ee15ca 0
8 9 75661.0 Sisal #840bfe 0
11 12 73355.0 Wetland #4dc4f3 0
10 11 34500.0 Waterbody #3630ee 0
2 3 33286.0 Riverine forest #27f90c 0
1 2 33046.0 Plantation forest #4d7619 0
0 1 16621.0 Montane forest #6ed277 0
9 10 10556.0 Builtup #833023 0
13 14 2582.0 Barerock #132e1e 0

Generate plots

Using the unique values data frame above, a bar chart the major land cover types can be plotted:

mask = uniquevals["percentage"] > 0
uniquevals_sig = uniquevals[mask]

ax = uniquevals_sig.plot.barh(
    x="label", y="percentage", legend=False, figsize=(9, 5),
    color=uniquevals_sig["color"]
)

ax.bar_label(ax.containers[0], padding=3)
plt.title(
    "Major land cover types for Taita Taveta County, Kenya - 2020" +
    "\n[Data: Abera et al. 2021 (CC-BY-4.0)]"
)
plt.ylabel("")
plt.xlabel("Land cover (%)")
plt.show()
Major land cover types for Taita Taveta County, Kenya.
Major land cover types for Taita Taveta County, Kenya.

The next step is to create colour maps using the legend data. The unique values must be converted back to integers first:

uniquevals["value"] = uniquevals["value"].astype(int)
uniquevals.sort_values("value", inplace=True)

A continuous colour map is created using Matplotlib’s LinearSegmentedColormap2; this ensures each unique value has the right colour assigned based on the legend entries. The unique values are normalised from 0 to 1 using the maximum and minimum value ranges prior to creating the colour map:

colours = list(uniquevals["color"])
nodes = np.array(uniquevals["value"])

# normalisation
nodes = (nodes - min(nodes)) / (max(nodes) - min(nodes))
colours = LinearSegmentedColormap.from_list(
    "LCM", list(zip(nodes, colours))
)

# view the colour map
colours

A discrete colour map is created solely for the legend in the final plot:

col_discrete = ListedColormap(list(uniquevals["color"]))

# view the colour map
col_discrete
Continuous colour map generated using the legend entries. Discrete colour map generated using the legend entries.
Colour maps generated using the legend entries.

Finally, the land cover plot can be created. A dummy plot must be created so that the discrete colour map is used as the legend3:

# create a dummy plot for the discrete colour map as the legend
img = plt.figure(figsize=(15, 15))
img = plt.imshow(np.array([[0, len(uniquevals)]]), cmap=col_discrete)
img.set_visible(False)

# assign the legend's tick labels
ticks = list(np.arange(.5, len(uniquevals) + .5, 1))
cbar = plt.colorbar(ticks=ticks)
cbar.ax.set_yticklabels(list(uniquevals["label"]))

landcover.plot(add_colorbar=False, cmap=colours)

plt.axis("equal")
plt.xlim(landcover.rio.bounds()[0] - .01, landcover.rio.bounds()[2] + .01)
plt.ylim(landcover.rio.bounds()[1] - .01, landcover.rio.bounds()[3] + .01)

plt.title("Taita Taveta County, Kenya - 2020 Land cover map")
plt.text(38.75, -4.4, "Data: © Abera et al. 2021 (CC-BY-4.0)")
plt.xlabel("Longitude")
plt.ylabel("Latitude")

plt.show()

Next steps

Adding other map elements, such as a north arrow and a scale bar.

Footnotes

Leave a comment

Your email address will not be published. Required fields are marked *.

Loading...