Lab3c - Image Classification#

Unsupervised classification#

Unsupervised classification is a machine learning technique that groups similar pixels into classes based on their spectral characteristics. Earth Engine provides the ee.Clusterer package for performing unsupervised classification. The supported clustering algorithms include: wekaKMeans, wekaXMeans, wekaCascadeKMeans, wekaCobweb, and wekaLVQ. The general workflow for performing unsupervised classification in Earth Engine is as follows:

  1. Prepare a multiband image for classification.

  2. Generate training samples from the image.

  3. Initialize a clusterer and adjust the parameters as needed.

  4. Train the clusterer using the training samples.

  5. Apply the clusterer to the image.

  6. Label the clusters as needed.

  7. Export the classified image.

The following example demonstrates how to perform unsupervised classification on a Landsat 9 image. First, filter the Landsat 9 image collection by a region of interest and date range. Select the least cloudy image and select the seven spectral bands. Note that a scaling factor of 0.0000275 and a bias of -0.2 are applied to the image to convert the DN values to reflectance values:

# %pip install geemap geedim
import ee
import geemap
ee.Authenticate()
ee.Initialize()
m = geemap.Map()

point = ee.Geometry.Point([-90.7664, 29.9411])

image = (
    ee.ImageCollection('LANDSAT/LC09/C02/T1_L2')
    .filterBounds(point)
    .filterDate('2022-01-01', '2022-12-31')
    .sort('CLOUD_COVER')
    .first()
    .select('SR_B[1-7]')
)

region = image.geometry()
# Apply the scale factor and offset to convert DN values to surface reflectance.
# set(image.toDictionary()) copies the original image properties to the scaled image.
image = image.multiply(0.0000275).add(-0.2).set(image.toDictionary())
vis_params = {'min': 0, 'max': 0.3, 'bands': ['SR_B4', 'SR_B3', 'SR_B2']}

m.center_object(region, 8)
m.add_layer(image, vis_params, "Landsat-9")
m

Next, use the get_info() function to display the image metadata in an interactive tree structure:

geemap.get_info(image)

You can also get an image property using the get() method. For example, to get the image acquisition date:

image.get('DATE_ACQUIRED').getInfo()
'2022-04-07'

To check the image cloud cover:

image.get('CLOUD_COVER').getInfo()
0.01

With the image ready, you can now generate training samples. You can specify a region where the training samples will be generated. There are several ways to create a region for generating training samples:

  • Draw a shape (e.g., rectangle) on the map and then use region = m.user_roi

  • Define a geometry, such as region = ee.Geometry.Rectangle([xmin, ymin, xmax, ymax])

  • Create a buffer around a point, such as region = ee.Geometry.Point([x, y]).buffer(v)

  • If you don’t define a region, it will use the image footprint by default
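
For example, a sampling region could be defined as a buffer around the point created earlier. This is only a sketch; the variable name sample_region and the 10 km radius are illustrative and are not used in the rest of this lab:

# Hypothetical example: a 10 km (10,000 m) buffer around the point of interest
sample_region = point.buffer(10000)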

The following code generates 5,000 training samples from the image and adds them to the map. Note that the region parameter is not specified, so the image footprint will be used as the sampling region:

training = image.sample(
    **{
        # "region": region,  # Uncomment to restrict sampling to a specific region
        'scale': 30,
        # Extract up to 5,000 random pixels from the image
        'numPixels': 5000,
        # A fixed seed makes the random sampling reproducible; the same pixels
        # will be selected every time the code is run
        'seed': 0,
        'geometries': True,  # Set this to False to ignore geometries
    }
)

m.add_layer(training, {}, 'Training samples') # 'Training samples' is the name of the layer that will be displayed on the map.
m

To inspect the attribute table of the training data, use the ee_to_df() function on the first few features of the collection:

geemap.ee_to_df(training.limit(20))
SR_B1 SR_B2 SR_B3 SR_B4 SR_B5 SR_B6 SR_B7
0 0.007653 0.018487 0.021650 0.016288 0.020742 0.033282 0.032512
1 0.037518 0.040625 0.070765 0.051818 0.329485 0.193003 0.106598
2 0.038040 0.042907 0.073873 0.057015 0.335865 0.187833 0.097577
3 0.037105 0.043182 0.062845 0.052725 0.047308 0.036307 0.030230
4 -0.007142 0.004242 0.017388 0.022172 0.039772 0.032018 0.027095
5 -0.001395 0.000667 0.001850 -0.004805 -0.001257 0.011475 0.012712
6 -0.003980 0.009632 0.015600 0.010293 0.016095 0.026600 0.026627
7 0.002812 0.009990 0.014198 0.010650 0.014967 0.026353 0.025968
8 -0.007747 0.002125 0.012987 0.004875 0.006717 0.018982 0.019285
9 0.040350 0.045410 0.053990 0.049177 0.045300 0.042742 0.037765
10 -0.000790 0.007433 0.014775 0.010045 0.014802 0.028745 0.028250
11 -0.012395 -0.000845 0.014060 0.008038 0.011200 0.022503 0.022970
12 -0.014485 -0.003705 0.013097 0.008065 0.010430 0.022503 0.022283
13 0.068565 0.081958 0.107368 0.122327 0.097413 0.044503 0.039250
14 0.016122 0.022723 0.056960 0.033585 0.343042 0.142128 0.066420
15 0.007763 0.016232 0.021127 0.017470 0.022393 0.031358 0.030670
16 0.033145 0.037710 0.048655 0.048545 0.033557 0.026572 0.023190
17 -0.001422 0.006662 0.009467 0.004655 0.009715 0.023300 0.023273
18 0.059682 0.071122 0.086357 0.096725 0.085065 0.046840 0.040267
19 0.028415 0.038837 0.067493 0.064990 0.241980 0.199575 0.119358

The training data is ready. Next, you need to initialize a clusterer and train it using the training data. The following code initializes a wekaKMeans clusterer and trains it using the training data, specifying the number of clusters (e.g., 5):

# The number of clusters to create. In this case, 5 clusters will be formed,
# each representing a group of pixels that share similar spectral characteristics.
n_clusters = 5

# ee.Clusterer provides access to the clustering algorithms in Earth Engine.
# wekaKMeans() is the K-Means algorithm from the WEKA machine learning toolkit,
# an unsupervised method that partitions the data into K clusters (here,
# K = n_clusters = 5) based on the similarity of their spectral values.
clusterer = ee.Clusterer.wekaKMeans(n_clusters).train(training)

With the clusterer trained, you can now apply it to the image. The following code applies the clusterer to the image and adds the classified image to the map with a random color palette:

result = image.cluster(clusterer)
m.add_layer(result.randomVisualizer(), {}, 'clusters')
m

Note that the value and color of each cluster are randomly assigned. Use the Inspector tool to inspect the value of each cluster and label the clusters as needed. Define a legend dictionary with pairs of cluster labels and colors, which can be used to create a legend for the classified image:

legend_dict = {
    'Developed': '#466b9f',
    'Forest': '#ab0000',
    'Crop': '#d99282',
    'Wetlands': '#1c5f2c',
    'Open Water': '#ab6c28',
}

palette = list(legend_dict.values())

m.add_layer(
    result, {'min': 0, 'max': 4, 'palette': palette}, 'Labelled clusters'
)
m.add_legend(title='Land Cover Type', legend_dict=legend_dict, position='bottomright')
m
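
If the automatically assigned cluster numbers do not match the order of the labels in legend_dict, you can remap the cluster values before styling them. The mapping below is hypothetical; use the Inspector tool to determine the actual correspondence for your image:

# Hypothetical reordering: map the original cluster IDs 0-4 to IDs that match legend_dict
relabeled = result.remap([0, 1, 2, 3, 4], [3, 0, 2, 4, 1])
m.add_layer(relabeled, {'min': 0, 'max': 4, 'palette': palette}, 'Relabeled clusters')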

The labeled unsupervised classification result should now be displayed on the map.

Finally, you can export the classified image to your computer. Specify the image region, scale, and output file path as needed:

geemap.download_ee_image(result, filename='unsupervised.tif', region=region, scale=90)
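
For larger regions or finer resolutions, downloading directly to disk can be slow or may fail. As an alternative, the following sketch uses the standard Earth Engine batch export to send the result to Google Drive; the description and folder names are arbitrary examples:

task = ee.batch.Export.image.toDrive(
    image=result,
    description='unsupervised_classification',
    folder='earthengine',  # assumed Google Drive folder name; change as needed
    region=region,
    scale=90,
    maxPixels=1e9,
)
task.start()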

Supervised classification#

Supervised classification is a machine learning technique that uses labeled training data to classify an image. The training data is used to train a classifier, which is then applied to the image to generate a classified image. Earth Engine provides the ee.Classifier class for performing supervised classification. The supported supervised classification algorithms include: Classification and Regression Trees (CART), Support Vector Machine (SVM), Random Forest, Naive Bayes, and Gradient Tree Boost. The general workflow for supervised classification is as follows:

  1. Prepare an image for classification.

  2. Collect training data. Each training sample should have a class label and a set of properties storing numeric values for the predictors.

  3. Initialize a classifier and set its parameters as needed.

  4. Train the classifier using the training data.

  5. Apply the classifier to the image.

  6. Perform accuracy assessment.

  7. Export the classified image.

In this section, you will learn how to perform supervised classification using the CART algorithm. You can easily adapt the code to other supervised classification algorithms, such as Support Vector Machine (SVM) and Random Forest. We will use labeled training data from the USGS National Land Cover Database (NLCD) dataset and train a CART classifier using the training data. The trained classifier will then be applied to the Landsat 8 image to generate a classified image.

First, filter the Landsat 8 image collection to select a cloud-free image acquired in 2019 for your region of interest:

# This initializes an interactive map object using the geemap library
m = geemap.Map()
# This creates a point geometry in Earth Engine at longitude -90, latitude 30
point = ee.Geometry.Point([-90, 30])

image = (
    ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
    .filterBounds(point)
    .filterDate('2019-01-01', '2020-01-01')
    .sort('CLOUD_COVER')
    .first()
    .select('SR_B[1-7]')
)

image = image.multiply(0.0000275).add(-0.2).set(image.toDictionary())
vis_params = {'min': 0, 'max': 0.3, 'bands': ['SR_B4', 'SR_B3', 'SR_B2']}

m.center_object(point, 8)
m.add_layer(image, vis_params, "Landsat-8")
m

Use the get_info() function to check the image properties:

geemap.get_info(image)

To get a specific image property, use the get() method with the property name as the argument. For example, to retrieve the image acquisition date:

image.get('DATE_ACQUIRED').getInfo()
'2019-03-06'

To check the cloud cover of the image:

image.get('CLOUD_COVER').getInfo()
0.02

Next, create a training dataset from the NLCD dataset, which is a 30-m resolution dataset covering the conterminous United States. The NLCD dataset contains 21 land cover classes. A detailed description of each NLCD land cover type can be found at https://bit.ly/3EyvacV. The following code filters the NLCD dataset to select the land cover image of interest and clips the dataset to the region of interest (the footprint of the selected Landsat image in the previous step):

# This line loads the NLCD 2019 dataset from the USGS (United States Geological Survey) as an Earth Engine Image object. 
# The ee.Image function creates an image object from the specified data source, in this case, the 2019 NLCD.
nlcd = ee.Image('USGS/NLCD_RELEASES/2019_REL/NLCD/2019')
# This selects the land cover classification band from the NLCD image.
landcover = nlcd.select('landcover').clip(image.geometry())
m.add_layer(landcover, {}, 'NLCD Landcover')
m

With the land cover image ready, we can now sample the image to collect training data with land cover labels. Similar to the unsupervised classification introduced above, you can specify a region of interest, scale, and number of points to sample. The following code samples 5000 points from the land cover image:

# This code samples points from the landcover image within a specified region.
points = landcover.sample(
    **{
        'region': image.geometry(),
        'scale': 30,
        'numPixels': 5000,
        # seed: A seed value for randomization, ensuring reproducibility of the sample. A seed of 0 is used.
        'seed': 0,
        # When set to True, the sampled points are returned as geometries, meaning they include geographic coordinates for each sampled pixel.
        'geometries': True,
    }
)

m.add_layer(points, {}, 'training', False)

Note that the resulting number of training samples may be less than the specified number of points. This is because the sampling algorithm will discard pixels with no data values. To check the number of training samples:

print(points.size().getInfo())
5000

Revise the number of points to sample in the previous step if needed.

Next, we will add the spectral bands of the Landsat image to the training data. Note that the training data created from the previous step already contains the land cover labels (i.e., the landcover property). The following code adds the seven spectral bands of the Landsat image to the training data:

# Spectral bands from the Landsat image to use as predictor variables
bands = ['SR_B1', 'SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']
label = 'landcover'

# sampleRegions() samples the pixel values of the selected bands at the
# locations defined by the points collection created earlier. The 'properties'
# argument keeps the 'landcover' label as an attribute of each sampled feature,
# so the resulting collection contains both the reflectance values and the
# land cover labels.
features = image.select(bands).sampleRegions(
    **{'collection': points, 'properties': [label], 'scale': 30}
)

Display the attribute table of the training data using the ee_to_df() function:

geemap.ee_to_df(features.limit(5))
SR_B1 SR_B2 SR_B3 SR_B4 SR_B5 SR_B6 SR_B7 landcover
0 0.014445 0.016343 0.028470 0.024867 0.192370 0.110365 0.055227 42
1 0.023657 0.029433 0.050003 0.038205 0.228697 0.108825 0.062295 95
2 0.021320 0.027507 0.057235 0.044255 0.185495 0.134235 0.087072 90
3 0.009632 0.015627 0.037105 0.037600 0.172268 0.101923 0.057922 81
4 0.015022 0.032073 0.064330 0.072607 0.026600 0.007433 0.008065 11

The training dataset is ready. You can now train a classifier using the training data. The following code initializes a CART classifier and trains it using the training data:

Summary of the Code: You define a set of parameters (params) that include the training data (features), the property to be predicted (label), and the input features (bands). You then create a CART decision tree classifier using Earth Engine’s smileCart algorithm, and train it using the specified parameters. The classifier will learn to map the input spectral bands (from satellite imagery) to the land cover classes based on the provided training data.

The **params syntax unpacks the dictionary so that the features, classProperty, and inputProperties are passed directly as arguments to the training function.

features provides the training data, classProperty defines the class to be predicted, and inputProperties are the features used for prediction.

# Define a dictionary (params) that contains the necessary inputs for training the classifier.
params = {

    'features': features,
    # This represents the class or category that you want the classifier to predict
    'classProperty': label,
    # This specifies the input features (predictor variables) that the classifier will use to make predictions.
    'inputProperties': bands,

}

# This argument specifies the maximum number of nodes (branches) in the decision tree. 
# When set to None, the classifier is allowed to grow the tree fully, meaning there is no constraint on the size of the tree unless other stopping criteria are met.
classifier = ee.Classifier.smileCart(maxNodes=None).train(**params)

The features parameter specifies the training data. The classProperty parameter specifies the property name of the training data that contains the class labels. The inputProperties parameter specifies the property names of the training data that contain the predictor values.

All Earth Engine classifiers have a train() function to train the classifier using the training data. The CART classifier has a maxNodes parameter to specify the maximum number of nodes in the tree. The default value is None, which means that the tree will be grown until all leaves are pure or until all leaves contain less than 5 training samples.
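
For example, to constrain the size of the tree, you could pass an explicit limit; the value 100 below is arbitrary and only illustrates the parameter:

# Hypothetical example: cap the CART decision tree at 100 nodes
small_tree_classifier = ee.Classifier.smileCart(maxNodes=100).train(**params)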

Now that the classifier has been trained, you can apply it to the Landsat image to generate a classified image. Make sure you use the same spectral bands as the training data. The following code applies the trained classifier to the selected Landsat image and adds the classified image with a random color palette to the map:

classified = image.select(bands).classify(classifier).rename('landcover')
m.add_layer(classified.randomVisualizer(), {}, 'Classified')
m

To compare the classified image with the reference NLCD land cover image, it is better to use the same color palette. To set the color palette of an Earth Engine image with a predefined palette, set the bandname_class_values and bandname_class_palette properties of the image. For example, the NLCD land cover has the landcover_class_values and landcover_class_palette properties. When the land cover band is added to the map, the color palette is applied automatically, so users don’t have to specify it manually. To check the color palette of the NLCD land cover image:

geemap.get_info(nlcd)

We can use the same approach to set the color palette of the classified image. Note that in the previous step, we already renamed the classified image band to landcover. Therefore, we can use the landcover_class_values and landcover_class_palette properties to set the color palette of the classified image. The following code sets the color palette of the classified image:

class_values = nlcd.get('landcover_class_values')
class_palette = nlcd.get('landcover_class_palette')
classified = classified.set({
    'landcover_class_values': class_values,
    'landcover_class_palette': class_palette
})

The classified image should now have the same color palette as the NLCD land cover image. Add the classified image and associated legend to the map:

m.add_layer(classified, {}, 'Land cover') # Empty dictionary {} is where visualization parameters (such as color mapping, opacity, etc.) would normally be specified.
m.add_legend(title="Land cover type", builtin_legend='NLCD')
m

Use the layer control widget to change the opacity of the classified image and compare it visually with the NLCD land cover image. You might need to use full-screen mode, as the legend may block the view of the layer control widget.

Finally, you can export the classified image to your computer:

geemap.download_ee_image(
    classified,
    filename='supervised.tif',
    region=image.geometry(),
    scale=30,
)

Accuracy assessment#

After performing image classification, you may want to assess the accuracy of the classification. Earth Engine provides several functions for assessing the accuracy of a classification. In this section, we will classify a Sentinel-2 image using random forest and assess the accuracy of the classification.

First, filter the Sentinel-2 image collection and select an image for your region of interest:

m = geemap.Map()
point = ee.Geometry.Point([-90.0, 30.0])

img = (
    ee.ImageCollection('COPERNICUS/S2_SR')
    .filterBounds(point)
    .filterDate('2020-01-01', '2021-01-01')
    .sort('CLOUDY_PIXEL_PERCENTAGE')
    .first()
    .select('B.*')
)

vis_params = {'min': 100, 'max': 3500, 'bands': ['B11', 'B8', 'B3']}

m.center_object(point, 9)
m.add_layer(img, vis_params, "Sentinel-2")
m

The 10-m ESA WorldCover dataset can be used to create labeled training data. First, we need to remap the land cover class values to a 0-based sequential series so that we can create a confusion matrix later:

lc = ee.Image('ESA/WorldCover/v100/2020')
classValues = [10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 100]
remapValues = ee.List.sequence(0, 10)
label = 'lc'
lc = lc.remap(classValues, remapValues).rename(label).toByte()

Next, add the ESA land cover as a band of the Sentinel-2 reflectance image and sample 100 pixels at a 10m scale from each land cover class within the region of interest:

sample = img.addBands(lc).stratifiedSample(**{
  'numPoints': 100,
  'classBand': label,
  'region': img.geometry(),
  'scale': 10,
  'geometries': True
})

Add a random value field to the sample and use it to approximately split 80% of the features into a training set and 20% into a validation set:

sample = sample.randomColumn()
trainingSample = sample.filter('random <= 0.8')
validationSample = sample.filter('random > 0.8')

With the training data ready, we can now train a random forest classifier. The following code trains a classifier with 10 trees:

trainedClassifier = ee.Classifier.smileRandomForest(numberOfTrees=10).train(**{
  'features': trainingSample,
  'classProperty': label,
  'inputProperties': img.bandNames()
})

To get information about the trained classifier and to assess its accuracy, you can use the classifier’s explain() and confusionMatrix() methods, together with errorMatrix() on the classified validation sample. A minimal sketch of this step is shown below:
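
# Inspect the structure and parameters of the trained classifier
print(trainedClassifier.explain().getInfo())

# Resubstitution accuracy on the training data
trainAccuracy = trainedClassifier.confusionMatrix()
print('Training overall accuracy:', trainAccuracy.accuracy().getInfo())

# Accuracy on the held-out validation sample
validated = validationSample.classify(trainedClassifier)
validationAccuracy = validated.errorMatrix(label, 'classification')
print('Validation overall accuracy:', validationAccuracy.accuracy().getInfo())
print('Validation kappa:', validationAccuracy.kappa().getInfo())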

If the validation accuracy is acceptable, the trained classifier can then be applied to the entire image:

imgClassified = img.classify(trainedClassifier)

Lastly, add the resulting data layers (e.g., Sentinel-2 image, classified image, training and validation samples) to the map. Use the layer control widget to change the opacity of the classified image and compare it to the ESA WorldCover.

classVis = {
    'min': 0,
    'max': 10,
    'palette': ['006400', 'ffbb22', 'ffff4c', 'f096ff', 'fa0000', 'b4b4b4',
                'f0f0f0', '0064c8', '0096a0', '00cf75', 'fae6a0'],
}
m.add_layer(lc, classVis, 'ESA Land Cover', False)
m.add_layer(imgClassified, classVis, 'Classified')
m.add_layer(trainingSample, {'color': 'black'}, 'Training sample')
m.add_layer(validationSample, {'color': 'white'}, 'Validation sample')
m.add_legend(title='Land Cover Type', builtin_legend='ESA_WorldCover')
m.center_object(img)
m