Felix Bartusch et al.
Defining the future scientific data flow for multi-disciplinary research data
Digital data and computerized workflows are at the core of almost every domain in science. Data is not only the basis for scientific publications but can become equally important in its own right. The discovery of new insights from huge amounts of (unstructured) data, often for completely unrelated fields, has already made big data a valuable asset for scientific findings. The value of the ever-increasing amounts of data for subsequent use and the requirements of funding agencies generate the need for formalized Research Data Management (RDM). Modern digital workflows involve more than one system to generate, compute, or visualize ever-larger data sets. Thus, the operators of the large-scale federated research infrastructures at the involved HPC centers in Baden-Württemberg face the challenge of providing suitable storage services. Such a Storage-for-Science (SFS) service represents an essential building block for the anticipated state-wide data federation. In addition to the integration of the various pre-existing infrastructures, the long-term identification of data sets and their owners and the definition of the necessary metadata become a challenge. The implementation and provisioning of an RDM system need to be organized together with the scientific communities and have to fit well into the growing landscape of Research Data Repositories.