Björn Selent et al.
Management of Research Data in Computational Fluid Dynamics and Thermodynamics
The growing performance of HPC resources makes it possible to investigate fundamental questions of fluid mechanics numerically at high temporal and spatial resolution. In particular, turbulent flows, which account for the largest share of flow processes in nature and technology and have no closed-form analytical solution, can be investigated with increasing precision. Today, one trillion data points are stored and processed per simulation. It is not clear from the outset which data is relevant for understanding the physical processes. Therefore, hundreds to thousands of simulations are carried out in the course of a research project to investigate the influence of individual parameters. Similarly, molecular simulations have undergone a huge increase in algorithmic and technical performance, which now makes it simple to generate large amounts of data. The sheer volume of data can be reduced by suitable data compression algorithms. However, managing these simulations in a structured way that ensures reproducibility, retrievability and clarity remains an important and challenging task. Equally important is the subsequent question of how the methodology, process and results of the research project can be preserved in the long term and shared if necessary.
So far, publishing data is not anchored in the professional culture of the field, and the technical circumstances make it difficult. Archiving data for more than 3-4 years seems neither possible nor sensible. Starting from this situation, members of the IAG and ITT, in cooperation with the infrastructure facilities (UB, TIK, HLRS) of the University of Stuttgart, are developing and testing a workflow in which, immediately after data generation, metadata is automatically extracted from log and input files, supplemented by rarely changing information on authors and projects, and stored in the DaRUS data repository. The repository, based on the open-source software Dataverse, is used for the local administration of the data. Because the descriptive data is easy to search, existing data can be reused more readily and the research process is documented. Last but not least, data that has already been described can be published or meaningfully archived with little additional effort. For the future, we also aim to establish clear criteria for the selection, quality control and retention period of the data, as well as mechanisms for the automated linking of datasets.
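To illustrate the metadata extraction step, the following minimal Python sketch parses simple "key = value" lines from the text of an input file and a log file and merges them with rarely changing author and project information. It is not the project's actual tooling: the file format, the function names `parse_key_values` and `build_record`, and all example values (including the placeholder author) are assumptions made for illustration. Mapping the record to the repository's metadata schema and uploading it to DaRUS are deliberately left out.

```python
import json
import re

# Hypothetical "key = value" line format; real solver files will differ.
KEY_VALUE = re.compile(r"^\s*([A-Za-z_][\w ]*?)\s*=\s*(.+?)\s*$")


def parse_key_values(text):
    """Collect 'key = value' pairs from the text of a log or input file."""
    pairs = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = KEY_VALUE.match(line)
        if match:
            # Normalize keys: "Grid Points" becomes "grid_points".
            key = match.group(1).strip().lower().replace(" ", "_")
            pairs[key] = match.group(2)
    return pairs


def build_record(input_text, log_text, static_info):
    """Merge extracted simulation metadata with rarely changing information."""
    record = dict(static_info)  # copy so the caller's dict is not modified
    record["parameters"] = parse_key_values(input_text)
    record["run"] = parse_key_values(log_text)
    return record


if __name__ == "__main__":
    # Made-up file contents and placeholder author/project values.
    example_input = "# made-up input file\nreynolds_number = 5000\ngrid points = 1024\n"
    example_log = "code version = 1.2.3\nwall time = 3600 s\n"
    static = {"author": "Jane Doe", "project": "Example project"}
    print(json.dumps(build_record(example_input, example_log, static), indent=2))
```

Keeping the static author and project information separate from the per-run extraction mirrors the split described above: the static part is entered once, while the extraction can run unattended after every simulation.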