
EcoToxFred

EcoToxFred (ETF) is a digital domain expert that enables simple, natural-language access to complex, integrated ecotoxicological data. It combines knowledge graphs, large language models, and agent-based workflows to support intuitive exploration and analysis of environmental risk information.

3
contributors


Description

EcoToxFred (ETF) is a domain-specific research software prototype that enables natural-language interaction with complex ecotoxicological data. Modern environmental research relies on increasingly heterogeneous and interconnected datasets that are difficult to access and interpret, particularly for non-computer scientists. ETF addresses this challenge with a novel interaction paradigm: conversational access to structured scientific knowledge, achieved by integrating a knowledge graph with large language models (LLMs) and an agent-based orchestration framework. This approach gives researchers, regulators, and stakeholders intuitive access to complex environmental data without requiring expertise in database technologies or programming.

At its core, ETF is built on a Neo4j-based knowledge graph (KG) that integrates 25 years of curated chemical monitoring data from European surface waters with hazard information for aquatic species. This enables the assessment of risks arising from exposure to single chemicals and chemical mixtures. The system further incorporates a large language model (LLM) to interpret user queries and a LangGraph-based agent architecture that dynamically orchestrates tool use, including graph queries and external knowledge retrieval. Natural-language questions are translated into formal queries (e.g., Cypher), and the retrieved data and results are returned as text, structured tables, or interactive geographic maps. ETF thereby bridges the gap between complex data infrastructures and practical scientific use.
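The translation step from a question to a formal query can be illustrated with a minimal sketch. It assumes the LLM has already extracted a structured intent from the user's question, and it only builds a parameterized Cypher statement. The node labels (`Substance`, `Site`), the relationship type (`MEASURED_AT`), and all property names are hypothetical and are not taken from the actual ETF graph schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementQuery:
    """Structured intent, assumed to be extracted from a user question."""
    substance_name: str
    country: str
    year_from: int
    year_to: int


def build_cypher(q: MeasurementQuery) -> tuple[str, dict]:
    """Return a parameterized Cypher query and its parameter map.

    The schema used here is a hypothetical placeholder, not the ETF schema.
    User-supplied values are passed as parameters and are never
    interpolated into the query text.
    """
    cypher = (
        "MATCH (s:Substance {name: $substance})-[m:MEASURED_AT]->(site:Site) "
        "WHERE site.country = $country "
        "AND m.year >= $year_from AND m.year <= $year_to "
        "RETURN site.name AS site, m.year AS year, "
        "m.concentration AS concentration "
        "ORDER BY year"
    )
    params = {
        "substance": q.substance_name,
        "country": q.country,
        "year_from": q.year_from,
        "year_to": q.year_to,
    }
    return cypher, params


query, params = build_cypher(
    MeasurementQuery("Diclofenac", "Germany", 2010, 2020)
)
```

Keeping values in a parameter map rather than in the query string is a common precaution when query text is produced from free-form input; whether ETF handles this in the same way is not stated in this description.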

ETF is not a generic chatbot, but a domain-specific scientific interface grounded in curated environmental monitoring and ecotoxicological data. It directly supports chemical risk assessment, environmental decision-making, and cross-disciplinary access to knowledge. Methodologically, ETF advances scientific software by grounding LLMs in structured knowledge graphs to improve reliability and interpretability. ETF implements a ReAct-style agent workflow for iterative reasoning and tool orchestration, supporting multimodal outputs and enabling complex analytical queries, such as co-occurrence analysis of chemicals, through automated query generation. The system is compatible with both external and internal LLM infrastructures and has been successfully integrated with the Helmholtz AI service Blablador, enabling deployment within secure institutional environments.
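The ReAct-style loop and the co-occurrence analysis mentioned above can be sketched together in a few lines. This is an illustration under stated assumptions, not the ETF implementation: the LLM is replaced by a scripted stand-in (`scripted_policy`), the only tool counts chemical pairs over an invented in-memory dataset instead of querying the knowledge graph, and every name in the block is hypothetical.

```python
from collections import Counter
from itertools import combinations

# Invented toy data: monitoring site -> chemicals detected there.
DETECTIONS = {
    "site_a": {"atrazine", "diclofenac", "carbamazepine"},
    "site_b": {"atrazine", "diclofenac"},
    "site_c": {"atrazine", "diclofenac", "ibuprofen"},
}


def co_occurrence(detections):
    """Count at how many sites each unordered chemical pair occurs together."""
    counts = Counter()
    for chemicals in detections.values():
        for pair in combinations(sorted(chemicals), 2):
            counts[pair] += 1
    return counts


def scripted_policy(question, observations):
    """Stand-in for the LLM: choose the next action from what is known so far."""
    if not observations:
        # Nothing observed yet, so request the tool.
        return ("call_tool", "co_occurrence")
    # A tool result is available, so formulate the final answer from it.
    (first, second), n = observations[-1].most_common(1)[0]
    return ("final_answer", f"{first} and {second} co-occur at {n} sites")


TOOLS = {"co_occurrence": lambda: co_occurrence(DETECTIONS)}


def run_agent(question, policy, tools, max_steps=5):
    """Alternate between deciding and acting until the policy returns an answer."""
    observations = []
    for _ in range(max_steps):
        action, payload = policy(question, observations)
        if action == "final_answer":
            return payload
        observations.append(tools[payload]())
    raise RuntimeError("no answer within the step budget")


answer = run_agent(
    "Which chemicals are most often found together?", scripted_policy, TOOLS
)
```

The step budget (`max_steps`) is a design choice of this sketch that bounds the loop if the policy never produces a final answer.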

ETF lowers barriers to accessing complex environmental data and supports evidence-based decision-making. It enables researchers from other disciplines to explore ecotoxicological datasets, supports regulators and stakeholders in interpreting monitoring results, and fosters broader engagement with scientific data through intuitive interfaces. The underlying architecture is transferable to other domains that rely on structured knowledge graphs, including bioinformatics, climate science, and health research.

The software is actively maintained, with ongoing updates and improvements reflected in recent commits. ETF is openly available and designed for reuse, reproducibility, and extension by the scientific community. A Neo4j container and documented workflows support reproducible analyses and facilitate independent validation. Users are encouraged to cite the software and the associated publication when using ETF in scientific work.

EcoToxFred has been developed jointly at the Max Planck Institute for Human Cognitive and Brain Sciences and the Helmholtz Centre for Environmental Research – UFZ by Patrick Scheibe and Jana Schor, with Jana Schor as the main contributor.

Further information, source code, and access points:

Preprint: https://www.biorxiv.org/content/10.1101/2025.07.04.663152v1
Project repository: https://github.com/yigbt/EcoToxFred
Web application: https://ecotoxfred.web.app.ufz.de/

Keywords
Programming languages
  • Python 83%
  • Cypher 15%
  • Dockerfile 2%
License
Source code
Packages

Participating organisations

Helmholtz Centre for Environmental Research (UFZ)
Max Planck Institute for Human Cognitive and Brain Sciences
Leipzig University

Reference papers

Contributors

JS
Jana Schor
Author/Developer/Maintainer
Helmholtz-Zentrum für Umweltforschung UFZ
PS
Patrick Scheibe
Co-Developer
Max Planck Institute for Human Cognitive and Brain Sciences
TS
Thilo Schmid
Developer, Software Architect
Leipzig University

Helmholtz Program-oriented Funding IV