DataFlow

Description

Features

Build customizable data processing pipelines by extending predefined modules for each workflow step (e.g., data ingestion, plausibility checks, transformations)
Link pipelines to designated directories to automatically process newly incoming data
Automatically execute the corresponding pipeline as soon as new data is detected
Monitor pipeline performance, status, and logs in real time
Access and inspect imported data stored in InfluxDB, with integrated tools for quality control, data mutation and flagging
Manage the entire process, from pipeline configuration to data quality assessment, via a UI
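The first three features can be illustrated with a minimal sketch. It is not DataFlow's actual API: the names `Step`, `PlausibilityCheck`, `ScaleTransform`, `Pipeline`, `ingest`, and `poll_once` are hypothetical, the input format (one numeric value per line) is an assumption, and simple polling stands in for whatever detection mechanism the Workflow Trigger really uses.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Step(ABC):
    """Hypothetical base class for one workflow step."""

    @abstractmethod
    def run(self, rows: list[dict]) -> list[dict]:
        """Take the rows from the previous step and return the processed rows."""


class PlausibilityCheck(Step):
    """Flags rows whose value lies outside an assumed valid range."""

    def __init__(self, low: float, high: float) -> None:
        self.low = low
        self.high = high

    def run(self, rows: list[dict]) -> list[dict]:
        return [
            {**row, "flagged": not (self.low <= row["value"] <= self.high)}
            for row in rows
        ]


class ScaleTransform(Step):
    """Multiplies every value by a constant factor (e.g. a unit conversion)."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def run(self, rows: list[dict]) -> list[dict]:
        return [{**row, "value": row["value"] * self.factor} for row in rows]


def ingest(path: Path) -> list[dict]:
    """Assumed input format: one numeric value per non-empty line."""
    lines = path.read_text().splitlines()
    return [{"value": float(line)} for line in lines if line.strip()]


class Pipeline:
    """Runs ingestion followed by the configured steps, in order."""

    def __init__(self, steps: list[Step]) -> None:
        self.steps = steps

    def process(self, path: Path) -> list[dict]:
        rows = ingest(path)
        for step in self.steps:
            rows = step.run(rows)
        return rows


def poll_once(directory: Path, pipeline: Pipeline, seen: set) -> dict:
    """Process every *.csv file in the linked directory that has not been seen yet."""
    results = {}
    for path in sorted(directory.glob("*.csv")):
        if path not in seen:
            seen.add(path)
            results[path.name] = pipeline.process(path)
    return results
```

Calling `poll_once` repeatedly with the same `seen` set processes each file only once, which mirrors the idea of a pipeline linked to a directory and triggered by newly arriving data.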

Technology Stack

Frontend: React, Next.js, TailwindCSS, shadcn/ui
Main API: Python, FastAPI
WebSocket API: Python, FastAPI
Migration API: Python, FastAPI
Workflow Trigger: Python

All components of the infrastructure are provided as Docker images for easy deployment and reproducibility.

Required Technical Infrastructure

In addition to the components listed above, DataFlow requires the following services:

InfluxDB
Grafana
PostgreSQL
AWS S3 or MinIO
NGINX (recommended)

A version of DataFlow optimized for integration into existing infrastructures is available in the standalone branch of the repository.

The setup can be customized via the .env file in that branch, where you specify which services are already running and which should be started during deployment.
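As a rough illustration of how such a configuration could be evaluated, the sketch below parses .env-style text and lists the services that still need to be started. The `DEPLOY_<SERVICE>` variable names, the service identifiers, and the default of deploying everything are assumptions made for this example; check the .env file in the standalone branch for the real keys.

```python
# Hypothetical service identifiers; the real names may differ.
SERVICES = ("influxdb", "grafana", "postgres", "minio", "nginx")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def services_to_deploy(env: dict) -> list:
    """Return the services whose (assumed) DEPLOY_<NAME> flag is truthy.

    A missing flag defaults to "true", i.e. the service is deployed.
    """
    deploy = []
    for name in SERVICES:
        flag = env.get(f"DEPLOY_{name.upper()}", "true").lower()
        if flag in ("1", "true", "yes"):
            deploy.append(name)
    return deploy


example_env = """
# InfluxDB and PostgreSQL already run in the existing infrastructure
DEPLOY_INFLUXDB=false
DEPLOY_POSTGRES=false
"""

print(services_to_deploy(parse_env(example_env)))
```

With the example above, only Grafana, MinIO, and NGINX would be started, while the existing InfluxDB and PostgreSQL instances are reused.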