A comprehensive framework for automated data processing and manual postprocessing, including quality control and flagging. Its modular and extensible design allows you to create customizable pipelines and to monitor their performance throughout the data lifecycle.
All components of the infrastructure are provided as Docker images for easy deployment and reproducibility.
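For orientation, a deployment could look like the sketch below. It assumes the repository ships a Docker Compose file at its top level; the repository URL is a placeholder.

```bash
# Placeholder URL: substitute the actual DataFlow repository.
git clone https://example.org/dataflow.git
cd dataflow

# Pull the prebuilt images and start all services in the background
# (assumes a docker-compose.yml at the repository root).
docker compose pull
docker compose up -d

# Follow the service logs to verify the stack came up cleanly.
docker compose logs -f
```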
In addition to the services mentioned above, the DataFlow setup requires the following:
A version of DataFlow optimized for integration into existing infrastructures is available in the standalone branch of the repository.
The setup can be customized via the .env file in that branch, which allows you to specify which services are already running in your infrastructure and which should be deployed alongside DataFlow.
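For example, the .env file might contain entries along these lines; all variable names here are hypothetical, since the actual keys are defined by the file shipped with the standalone branch.

```bash
# Hypothetical .env excerpt: mark services that already exist in your
# infrastructure so they are skipped during deployment.
DEPLOY_DATABASE=false
DATABASE_URL=postgres://user:password@db.internal:5432/dataflow

# Services without existing counterparts are deployed with DataFlow.
DEPLOY_BROKER=true
```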
DataHub Initiative of the Research Field Earth and Environment