Publications
Paper “Mapping Hierarchical File Structures to Semantic Data Models for Efficient Data Integration into Research Data Management Systems”
tom Wörden, Henrik, Florian Spreckelsen, Stefan Luther, Ulrich Parlitz, and Alexander Schlemmer. 2024. “Mapping Hierarchical File Structures to Semantic Data Models for Efficient Data Integration into Research Data Management Systems.” Data 9, no. 2: 24. https://doi.org/10.3390/data9020024
Abstract:
Although other methods exist to store and manage data in modern information technology, the standard solution is file systems. Therefore, keeping well-organized file structures and file system layouts can be key to a sustainable research data management infrastructure. However, file structures alone lack several important capabilities for FAIR data management: the two most significant being insufficient visualization of data and inadequate possibilities for searching and obtaining an overview. Research data management systems (RDMSs) can fill this gap, but many do not support the simultaneous use of the file system and RDMS. This simultaneous use can have many benefits, but keeping data in RDMS in synchrony with the file structure is challenging. Here, we present concepts that allow for keeping file structures and semantic data models (in RDMS) synchronous. Furthermore, we propose a specification in yaml format that allows for a structured and extensible declaration and implementation of a mapping between the file system and data models used in semantic research data management. Implementing these concepts will facilitate the re-use of specifications for multiple use cases. Furthermore, the specification can serve as a machine-readable and, at the same time, human-readable documentation of specific file system structures. We demonstrate our work using the Open Source RDMS LinkAhead (previously named “CaosDB”).
Paper “Agile Research Data Management with Open Source: LinkAhead”
Hornung, D., Spreckelsen, F. & Weiß, T., (2024) “Agile Research Data Management with Open Source: LinkAhead”, ing.grid 1(2). doi: https://doi.org/10.48694/inggrid.3866
Abstract:
Research data management (RDM) in academic scientific environments increasingly enters the focus as an important part of good scientific practice and as a topic with big potentials for saving time and money. Nevertheless, there is a shortage of appropriate tools, which fulfill the specific requirements in scientific research. We identified where the requirements in science deviate from other fields and proposed a list of requirements which RDM software should answer to become a viable option. We analyzed a number of currently available technologies and tool categories for matching these requirements and identified areas where no tools can satisfy researchers’ needs. Finally we assessed the open-source RDMS (research data management system) LinkAhead for compatibility with the proposed features and found that it fulfills the requirements in the area of semantic, flexible data handling in which other tools show weaknesses.
Paper “Guidelines for a Standardized Filesystem Layout for Scientific Data”
Spreckelsen, F.; Rüchardt, B.; Lebert, J.; Luther, S.; Parlitz, U. & Schlemmer, A. 2020. “Guidelines for a Standardized Filesystem Layout for Scientific Data.” Data, 5(2). DOI: https://doi.org/10.3390/data5020043
Abstract:
Storing scientific data on the filesystem in a meaningful and transparent way is no trivial task. In particular, when the data have to be accessed after their originator has left the lab, the importance of a standardized filesystem layout cannot be underestimated. It is desirable to have a structure that allows for the unique categorization of all kinds of data from experimental results to publications. They have to be accessible to a broad variety of workflows, e.g., via graphical user interface as well as via command line, in order to find widespread acceptance. Furthermore, the inclusion of already existing data has to be as simple as possible. We propose a three-level layout to organize and store scientific data that incorporates the full chain of scientific data management from data acquisition to analysis to publications. Metadata are saved in a standardized way and connect original data to analyses and publications as well as to their originators. A simple software tool to check a file structure for compliance with the proposed structure is presented.
Paper “CaosDB—Research Data Management for Complex, Changing, and Automated Research Workflows”
Fitschen, Timm, Alexander Schlemmer, Daniel Hornung, Henrik tom Wörden, Ulrich Parlitz, and Stefan Luther. 2019. “CaosDB—Research Data Management for Complex, Changing, and Automated Research Workflows.” Data 4, no. 2: 83. https://doi.org/10.3390/data4020083
Abstract
We present CaosDB, a Research Data Management System (RDMS) designed to ensure seamless integration of inhomogeneous data sources and repositories of legacy data in a FAIR way. Its primary purpose is the management of data from biomedical sciences, both from simulations and experiments during the complete research data lifecycle. An RDMS for this domain faces particular challenges: research data arise in huge amounts, from a wide variety of sources, and traverse a highly branched path of further processing. To be accepted by its users, an RDMS must be built around workflows of the scientists and practices and thus support changes in workflow and data structure. Nevertheless, it should encourage and support the development and observation of standards and furthermore facilitate the automation of data acquisition and processing with specialized software. The storage data model of an RDMS must reflect these complexities with appropriate semantics and ontologies while offering simple methods for finding, retrieving, and understanding relevant data. We show how CaosDB responds to these challenges and give an overview of its data model, the CaosDB Server and its easy-to-learn CaosDB Query Language. We briefly discuss the status of the implementation, how we currently use CaosDB, and how we plan to use and extend it.
Note: CaosDB is the original name under which LinkAhead was developed at the Max Planck Institute. The name is no longer in use, but it still appears in some places.
Recommended Publications
Software “pyJSON Schema Loader and JSON Editor: A tool for file-based metadata management”
Nick Plathe, Markus M. Becker, Steffen Franke, pyJSON Schema Loader and JSON Editor: A tool for file-based metadata management, SoftwareX, Volume 28, 2024, 101945, ISSN 2352-7110, https://doi.org/10.1016/j.softx.2024.101945
Abstract
This work introduces the “pyJSON Schema Loader and JSON Editor”, a client-side desktop application for offline and local environments capable of creating, editing and tracking metadata-containing JSON documents. The newly developed tool aims to support the generation and re-use of structured metadata, required for the implementation of research data management and the FAIR data principles in research workflows. pyJSON is written in Python, a modern and flexible programming language. The interface consists of a table-like view tailored to present JSON documents, enriched with information from the corresponding JSON schema. It shall assist in creating and maintaining documents containing metadata by simplifying the process of generation and editing based on a given schema. It is used to document data sets, data collections and devices in a local file structure, intending to support the transition from paper-based documentation to more modern research data management workflows, while sustaining a uniform and standardised structure, without the need to bind users to an often complex and resource demanding database system.
Preprint “FAIR GPT: A virtual consultant for research data management in ChatGPT”
Renat Shigapov, Irene Schumm. 2024. “FAIR GPT: A virtual consultant for research data management in ChatGPT.” arXiv preprint (Digital Libraries). https://doi.org/10.48550/arXiv.2410.07108
Abstract
FAIR GPT is a first virtual consultant in ChatGPT designed to help researchers and organizations make their data and metadata compliant with the FAIR (Findable, Accessible, Interoperable, Reusable) principles. It provides guidance on metadata improvement, dataset organization, and repository selection. To ensure accuracy, FAIR GPT uses external APIs to assess dataset FAIRness, retrieve controlled vocabularies, and recommend repositories, minimizing hallucination and improving precision. It also assists in creating documentation (data and software management plans, README files, and codebooks), and selecting proper licenses. This paper describes its features, applications, and limitations.
Paper “The data must be accessible to all”
Gierasch, Lila M.; Davidson, Nicholas O.; Burlingame, Alma L.; et al. “The data must be accessible to all.” Journal of Lipid Research, Volume 61, Issue 4, April 2020. DOI: https://doi.org/10.1194/jlr.E120000699
Abstract
Science relies on data acquisition via well-described, rigorous, reproducible procedures, statistically defensible interpretation of these data, and transparent reporting of the interpretations and conclusions reached based on these data. Each of these steps must be communicated to the broader scientific community in such a way that others can critically evaluate the studies, draw conclusions based on the reported findings, design subsequent experiments, and replicate and extend those observations. The only way this is possible is through full disclosure of data so that they are readily accessible to readers. Journals must assume responsibility for ensuring that those data are made available and that the mechanism to access the data is cited in the same way that previous literature reports are cited. The publications of ASBMB (Journal of Biological Chemistry, Journal of Lipid Research, and Molecular and Cellular Proteomics) are joining a number of sister publications in adopting policy requirements that ensure these goals are met for all of our content. (…)
Paper “Making Research Data Accessible”
Kapiszewski D, Karcher S. Making Research Data Accessible. In: Elman C, Gerring J, Mahoney J, eds. The Production of Knowledge: Enhancing Progress in Social Science. Strategies for Social Inquiry. Cambridge University Press; 2020:197-220. DOI: https://doi.org/10.1017/9781108762519.008
Abstract
Social science research is increasingly moving toward a model of open and accessible data. Accessibility opens possibilities of allowing secondary analysis, enhancing pedagogy, and supporting research transparency. This chapter argues that these benefits will accrue more quickly, and will be more significant and more enduring, if researchers make their data “meaningfully accessible,” that is, when the data can be interpreted and analyzed by scholars far beyond those who generated them. Making data meaningfully accessible requires researchers to prepare data for sharing and to take advantage of a growing range of tools for publishing and preserving data.
Paper “The FAIR Guiding Principles for scientific data management and stewardship”
Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). DOI: https://doi.org/10.1038/sdata.2016.18
Abstract
There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders—representing academia, industry, funding agencies, and scholarly publishers—have come together to design and jointly endorse a concise and measureable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.
Paper “Going Beyond Availability: Truly Accessible Research Data”
Walker, W. & Keenan, T., (2015) “Going Beyond Availability: Truly Accessible Research Data”, Journal of Librarianship and Scholarly Communication 3(2), eP1223. doi: https://doi.org/10.7710/2162-3309.1223