DataLad

DataLad is a tool for the joint management of code, data, and their relationship, built on top of Git and git-annex. It adapts principles of open-source software development and distribution to address challenges of data management, data sharing, and digital provenance capture.

229 mentions, 61 contributors

What DataLad can do for you

DataLad is a Python-based tool for the joint management of code, data, and their relationship, built on top of a versatile system for data logistics (git-annex) and the most popular distributed version control system (Git). It adapts principles of open-source software development and distribution to address the technical challenges of data management, data sharing, and digital provenance collection across the life cycle of digital objects.

DataLad aims to make data management as easy as managing code. It streamlines the procedures for consuming, publishing, and updating data of any size or type, and for linking data as precisely versioned, lightweight dependencies. DataLad helps to make science more reproducible and FAIR. It can capture complete and actionable process provenance of data transformations to enable automatic re-computation. The DataLad project (datalad.org) delivers a completely open, pioneering platform for flexible, decentralized research data management (RDM). It offers a Python API, a command-line interface, and a dedicated graphical user interface, and has an extensible architecture. It does not depend on any centralized services, but facilitates interoperability with a wide range of existing tools and services. To maximize its utility and reach, DataLad is available for all major operating systems and can be integrated into established workflows and environments with minimal friction.
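
To make "actionable process provenance" more concrete, here is a toy sketch using only the Python standard library. It is not DataLad's implementation: in DataLad itself, the corresponding workflow is `datalad run`, which records a command together with its inputs and outputs in the dataset's Git history, followed by `datalad rerun`. The helper names `run_and_record` and `rerun`, the record layout, and the line-sorting command are hypothetical, chosen only to illustrate the idea of recording a transformation so that it can be re-executed and checked later.

```python
"""Toy sketch of run-record provenance (hypothetical, not DataLad's code)."""
import hashlib
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_and_record(cmd, inputs, outputs, cwd):
    """Run `cmd` in `cwd` and return a provenance record for it.

    The record keeps the command and content hashes of the declared
    inputs and outputs: enough information to re-execute and verify it.
    """
    cwd = Path(cwd)
    input_hashes = {p: sha256(cwd / p) for p in inputs}
    result = subprocess.run(cmd, cwd=cwd, check=True)
    return {
        "cmd": list(cmd),
        "exit": result.returncode,
        "inputs": input_hashes,
        "outputs": {p: sha256(cwd / p) for p in outputs},
    }


def rerun(record, cwd):
    """Re-execute a recorded command; return True if all outputs match.

    Raises ValueError if an input no longer matches its recorded hash,
    because then a re-run would not reproduce the recorded step.
    """
    cwd = Path(cwd)
    for p, digest in record["inputs"].items():
        if sha256(cwd / p) != digest:
            raise ValueError(f"input {p!r} changed since the recorded run")
    subprocess.run(record["cmd"], cwd=cwd, check=True)
    return all(sha256(cwd / p) == d for p, d in record["outputs"].items())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        (work / "raw.txt").write_text("3\n1\n2\n")
        # Hypothetical transformation: sort the lines of raw.txt.
        script = (
            "import pathlib; "
            "lines = pathlib.Path('raw.txt').read_text().splitlines(True); "
            "pathlib.Path('sorted.txt').write_text(''.join(sorted(lines)))"
        )
        record = run_and_record([sys.executable, "-c", script],
                                ["raw.txt"], ["sorted.txt"], work)
        print(json.dumps(record, indent=2))
        (work / "sorted.txt").unlink()  # lose the derived file...
        print("reproduced:", rerun(record, work))  # ...and recompute it
```

The sketch refuses to re-run when a recorded input has changed, so a matching output hash means the recorded step was actually reproduced rather than merely re-executed on different data.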

Participating organisations

Forschungszentrum Jülich
Dartmouth College
The University of Texas at Austin
University of California, Berkeley
Stanford University
Potsdam Institute for Climate Impact Research
Université Catholique de Louvain
University of Tübingen
Otto-von-Guericke University Magdeburg

Testimonials

Datalad. Tracks your data just like git tracks your code.
Ted Satterthwaite https://twitter.com/sattertt/status/1582696291636637696
Doing a PhD equals working your way through many tutorials that can be quite boring. The @datalad handbook, however, is really enjoyable! Keeps my motivation for reproducible research up.
Jasmin Stein https://twitter.com/jsmnStein/status/1523672127765118976
With the upcoming requirement of funding agencies for FAIR data management in Canada, we’ve started helping other neuroimaging centres in Montreal to transition to @datalad. Thank you, Datalad
The Courtois Project on Neuronal Modelling https://twitter.com/CNeuromod/status/1499310196459515906
Yesterday was the 1st time I used @datalad and I feel ashamed for not having looked into it earlier! Very beautiful, sophisticated and absolutely necessary piece of software!
Shreyas Fadnavis https://twitter.com/ShreyasSF/status/1466792779443408905
The @datalad folks are doing God's work!
Maurizio Sicorello https://twitter.com/MLSicorello/status/1434054501221208064
One of my favorite tools! Thank you, @datalad :)
Matteo Visconti di Oleggio Castello https://twitter.com/MatteoVdOC/status/1411417783795998721
@datalad is such a terrific tool for managing the evolution of your research project (code, data and beyond) in a transparent, reproducible and shareable way!
Lennart Wittkuhn https://twitter.com/lnnrtwttkhn/status/1411045710687055876
Likely one of the most impactful data sharing tools in the past few years. Go @datalad !
Tristan Glatard https://twitter.com/TristanGlatard/status/1410584481413672960
This is a fantastic tool for reproducible research which solves several issues and has IMO not enough attention so far.
Konrad Förstner https://twitter.com/konradfoerstner/status/1408053954856951810
Datalad can really help you simplify research data management and provides access to many data resources. like many tools it does have a learning curve to get comfortable. so spend the time using it and spend the time reporting issues.
Satrajit Ghosh https://twitter.com/satra_/status/1329039013672591361
Great tool to help you become the Marie Kondo of data and digital life!
Sofie Valk https://twitter.com/sofievalk/status/1215979216619130880
The @datalad project doesn't receive nearly enough kudos, so here is an official endorsement tweet from yours truly \o/ #fromtartodatalad #justuseit
https://twitter.com/esc___/status/930791615731589120
Datalad is awesome. The ability to easily maintain shared file trees with optional data-files across machines (though just the tip the iceberg of datalad's functionality) makes life SO much better :)
Eshin Jolly https://github.com/datalad/datalad/issues/2118
Creating @datalad datasets and analyzing them with "datalad run"... I'm going mad with power!!!
Samuel Nastase https://twitter.com/samnastase/status/1064635456804003840
I want to talk about one of the best tools we use here at TIES: @datalad [...] It has allowed us to centralize data management in ways that previously have been difficult in academia.
Patrick Anker https://sciences.social/@psanker/109365851470878157

Helmholtz Program-oriented Funding IV

  • Research Field: 5 Information
    • Research Program: 5.1 Engineering Digital Futures: Supercomputing, Data Management and Information Security for Knowledge and Action
      • PoF Topic: 5.1.1 Enabling Computational- & Data-Intensive Science and Engineering
      • PoF Topic: 5.1.2 Supercomputing & Big Data Infrastructures
    • Research Program: 5.2 Natural, Artificial and Cognitive Information Processing
      • PoF Topic: 5.2.5 Decoding Brain Organization and Dysfunction

Related software

DataLad Container extension

This DataLad extension package equips DataLad's run/rerun functionality with the ability to transparently execute commands in containerized computational environments. On re-run, DataLad automatically obtains any required container at the correct version prior to execution.

Updated 16 months ago

DataLad NEXT extension

This DataLad extension is a staging area for add-ons, performance upgrades, and user-experience improvements. Unlike topical extensions, its focus is on functionality with broad applicability.

Updated 15 months ago

JTrack

JTrack is an ecosystem of software and mobile applications for collecting digital biomarkers and for remote assessment. It is designed to collect health-related information from participants' smartphones, and it includes a dashboard for clinicians and administrators to manage studies, users, and data.

Updated 1 month ago