Tree Crown Segmentation and Analysis in Remote Sensing Imagery with PyTorch
DeepTrees is an end-to-end library for tree crown semantic and instance segmentation, as well as analysis, in remote sensing imagery. It provides a modular and flexible framework based on PyTorch for training, active learning, and deploying deep learning models for tree crown semantic and instance segmentation. The library is designed to be easy to use and extendable, with a focus on reproducibility and scalability. It includes a variety of pre-trained models, datasets, and tree allometric metrics to help you understand tree crown dynamics.
Read more about this work and find tutorials on: https://deeptrees.de. The DeepTrees project is funded by the Helmholtz Centre for Environmental Research -- UFZ, in collaboration with Helmholtz AI.
To install the package, clone the repository and install the dependencies.
git clone https://codebase.helmholtz.cloud/taimur.khan/DeepTrees.git
cd DeepTrees
pip install -r requirements.txt
or from the GitLab package registry:
pip install deeptrees --index-url https://codebase.helmholtz.cloud/api/v4/projects/13888/packages/pypi/simple
or from PyPI:
pip install deeptrees
Note: DeepTrees uses Python libraries that depend on GDAL. Make sure to have GDAL>=3.9.2 installed on your system, e.g. via conda:
conda install -c conda-forge gdal==3.9.2
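To check that the GDAL Python bindings are available in your environment, a quick sanity check like the following can help (this assumes the conda `gdal` package installed the `osgeo` bindings):

```python
# sanity check: confirm the GDAL Python bindings are installed and recent enough
from osgeo import gdal

print(gdal.__version__)  # should print 3.9.2 or newer
```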
You can view the documentation page on: https://treecrowndelineation-ai-consultants-dkrz-35d16e8c9ecc31028ba160.pages.hzdr.de/
This library is documented using Sphinx. To build the documentation, run the following commands.
sphinx-apidoc -o docs/source deeptrees
cd docs
make html
This will create the documentation in the `docs/build` directory. Open the `index.html` file in your browser to view the documentation.
This software uses Hydra for configuration management. The configuration YAML files are stored in the `config` directory. The configuration schema can be found in `config/schema.yaml`. A list of Hydra configuration options can be found in `/docs/prediction_config.md`.
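As an illustration, a config can also be composed and inspected programmatically with Hydra's compose API. This is a minimal sketch, assuming the `config` directory sits next to your script and contains `train_halle.yaml`:

```python
# minimal sketch: compose and print a DeepTrees config with Hydra's compose API
from hydra import compose, initialize
from omegaconf import OmegaConf

with initialize(config_path="config", version_base=None):
    cfg = compose(config_name="train_halle")
    print(OmegaConf.to_yaml(cfg))  # inspect the fully resolved configuration
```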
DeepTrees provides a set of pretrained models for tree crown segmentation. Currently the following models are available:
| Author | Description | Model Weights |
|---|---|---|
| Freudenberg et al., 2022 | Tree crown delineation model based on a U-Net with ResNet18 backbone. Trained on 89 images sampled randomly within Germany. Set of 5 model weights from 5-fold cross-validation. | k=0, 1, 2, 3 (default), 4 |
Note: We are in the process of adding more pretrained models.
Note: Like all AI systems, these pretrained models can make mistakes. Validate predictions, especially in critical applications. Be aware that performance may degrade significantly on data that differs from the training set (e.g., different seasons, regions, or image qualities).
Download the pretrained models as follows:
from deeptrees.pretrained import freudenberg2022
freudenberg2022(
    filename="name_your_file",  # name of the file to save the model weights to
    k=0,                        # which of the five cross-validation folds to download (0-4)
    return_dict=True            # return the PyTorch model weights as a state dict
)
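As a rough follow-up sketch, and assuming that `return_dict=True` returns a PyTorch state dict as described above, the downloaded weights can be inspected before loading them into a model (the filename below is only an example):

```python
# sketch: download fold k=3 (the default) and inspect the returned state dict
from deeptrees.pretrained import freudenberg2022

state_dict = freudenberg2022(filename="freudenberg2022_k3.pt", k=3, return_dict=True)
print(len(state_dict), "weight tensors")
print(list(state_dict.keys())[:5])  # first few parameter names of the U-Net/ResNet18 model
```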
DeepTrees also provides a labelled DOP dataset with 20 cm DOP rasters and corresponding polygon labels as `.shp` files. The dataset is available for download:
from deeptrees.datasets.halleDOP20 import load_tiles, load_labels
load_tiles(zip_filename="path/to/tiles.zip")    # path where the tile archive will be saved
load_labels(zip_filename="path/to/labels.zip")  # path where the label archive will be saved
Note: We are in the process of adding more datasets and updating the current datasets.
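Once downloaded, the archives can be extracted and inspected with standard geospatial tooling. The following is a minimal sketch using `zipfile`, `rasterio`, and `geopandas` (illustrative choices, not DeepTrees requirements); the tile and label file names are examples taken from the directory layout shown in the training section below:

```python
# minimal sketch: extract the downloaded archives and open one tile/label pair
import zipfile

import geopandas as gpd
import rasterio

zipfile.ZipFile("path/to/tiles.zip").extractall("tiles")
zipfile.ZipFile("path/to/labels.zip").extractall("labels")

with rasterio.open("tiles/tile_0_0.tif") as src:      # 20 cm DOP raster tile
    print(src.count, src.width, src.height)           # bands, width, height
labels = gpd.read_file("labels/label_tile_0_0.shp")   # tree crown polygons
print(len(labels), "polygons")
```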
Predict tree crown polygons for a list of images. The configuration file at `config_path` controls the pretrained model, output paths, and postprocessing options.
from deeptrees import predict
predict(image_path=["list of image_paths"], config_path="config_path")
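For example (the tile paths are illustrative, and the config name is a hypothetical placeholder; see `/docs/prediction_config.md` for the available options):

```python
# illustrative call: predict tree crown polygons for two tiles
from deeptrees import predict

predict(
    image_path=["tiles/tile_0_0.tif", "tiles/tile_0_1.tif"],
    config_path="config/predict.yaml",  # hypothetical config name
)
```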
DeepTrees calculates pixel-wise entropy maps for each input image. The entropy maps can be used to select the most informative tiles for training. They are stored in the `entropy` directory.
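One simple, illustrative selection strategy is to rank tiles by their mean pixel entropy and label the highest-scoring ones first. The sketch below assumes the entropy maps are written as single-band GeoTIFFs into the `entropy` directory:

```python
# sketch of an active-learning selection step: rank tiles by mean entropy
from pathlib import Path

import numpy as np
import rasterio

scores = {}
for path in Path("entropy").glob("*.tif"):
    with rasterio.open(path) as src:
        scores[path.name] = float(np.nanmean(src.read(1)))

# tiles with the highest mean entropy are the most informative labeling candidates
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(f"{score:.3f}  {name}")
```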
To train the model, you need to have the labeled tiles in the `tiles` and `labels` directories. The unlabeled tiles go into `pool_tiles`. Your polygon labels need to be in ESRI shapefile format.
Adapt your own config file based on the defaults in `train_halle.yaml` as needed. For an example of a derived config for fine-tuning, see `finetune_halle.yaml`.
Run the script like this:
python scripts/train.py # this is the default config that trains from scratch
python scripts/train.py --config-name=finetune_halle # finetune with pretrained model
python scripts/train.py --config-name=yourconfig # with your own config
To re-generate the ground truth for training, make sure to pass the label directory in `data.ground_truth_labels`. To turn it off, pass `data.ground_truth_labels=null`.
You can overwrite individual parameters on the command line, e.g.
python scripts/train.py trainer.fast_dev_run=True
To resume training from a checkpoint, take care to pass the Hydra arguments in quotes to avoid the shell intercepting the string (the pretrained model name contains `=`):
python scripts/train.py 'model.pretrained_model="Unet-resnet18_epochs=209_lr=0.0001_width=224_bs=32_divby=255_custom_color_augs_k=0_jitted.pt"'
Before you embark on training, sync the `tiles` and `labels` folders with the labeled tiles. The unlabeled tiles go into `pool_tiles`.
|-- tiles
| |-- tile_0_0.tif
| |-- tile_0_1.tif
| |-- ...
|-- labels
| |-- label_tile_0_0.shp
| |-- label_tile_0_1.shp
| |-- ...
|-- pool_tiles
| |-- tile_4_7.tif
| |-- tile_4_8.tif
| |-- ...
Create the new empty directories:
|-- masks
|-- outlines
|-- dist_trafo
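These directories can also be created directly from Python, for example:

```python
# create the additional empty directories expected by the training pipeline
import os

for d in ("masks", "outlines", "dist_trafo"):
    os.makedirs(d, exist_ok=True)
```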
We use the following classes for training:
0 = tree
1 = cluster of trees
2 = unsure
3 = dead trees (not yet added)
However, you can adjust classes as needed in your own training workflow.
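For reference, the default mapping can be written down as a simple dictionary (the names are those listed above):

```python
# class indices used for training; dead trees are not yet included in the released labels
CLASS_NAMES = {
    0: "tree",
    1: "cluster of trees",
    2: "unsure",
    3: "dead trees",  # not yet added
}
```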
By default, MLflow logs are created during training.
Run the inference script with the corresponding config file. Adjust as needed.
python scripts/test.py --config-name=inference_halle
This repository has automatic semantic versioning enabled. To create new releases, we need to merge into the default `main` branch.
Semantic Versioning, or SemVer, is a versioning standard for software (see the SemVer website). Given a version number MAJOR.MINOR.PATCH, increment the MAJOR version when you make incompatible API changes, the MINOR version when you add functionality in a backwards-compatible manner, and the PATCH version when you make backwards-compatible bug fixes.
See the SemVer rules and all possible commit prefixes in the .releaserc.json file.
| Prefix | Explanation | Example |
|---|---|---|
| feat | A new feature was implemented as part of the commit, so the MINOR part of the version will be increased once this is merged to the main branch | feat: model training updated |
| fix | A bug was fixed, so the PATCH part of the version will be increased once this is merged to the main branch | fix: fix a bug that causes the user to not be properly informed when a job finishes |
The implementation is based on https://mobiuscode.dev/posts/Automatic-Semantic-Versioning-for-GitLab-Projects/
This repository is licensed under the MIT License. For more information, see the LICENSE.md file.
@article{khan2025torchtrees,
author = {Taimur Khan and Caroline Arnold and Harsh Grover},
title = {DeepTrees: Tree Crown Segmentation and Analysis in Remote Sensing Imagery with PyTorch},
journal = {arXiv},
year = {2025},
archivePrefix = {arXiv},
eprint = {XXXXX.YYYYY},
primaryClass = {cs.CV}
}