The topics of the Data Life Cycle Labs correspond to research fields of the Helmholtz Association. In the initial phase, the DLCL “Earth and Environment” will focus on the sub-topic “Atmosphere and Climate”, mainly for two reasons:
- Huge volumes of observational and modeling data are available and are growing rapidly, which requires a collaborative effort in data management and data curation.
- The atmospheric and climate community has already started to establish data infrastructures that support data life cycle management in national and international collaborations, which provides a good starting point.
Scientists in several research groups study the behavior of the stratosphere, troposphere, and biosphere and their interactions. Many remote sensing instruments on different satellites collect data on the order of a hundred terabytes per year. The raw data from these instruments are converted in multiple steps, resulting in more than a thousand small files per day.
Processing such a large number of small files is limited by access time rather than computing time. Because the data are irreplaceable and valuable, they must be archived for several decades.
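The claim that small-file processing is bound by access time can be illustrated with a rough cost model. The following is only a sketch: the function name `estimate_times` and every numeric value (file size, per-file access latency, bandwidth, compute cost per byte) are hypothetical assumptions chosen for illustration, not measurements from the DLCL. Only the file count of about a thousand per day is taken from the text.

```python
def estimate_times(n_files, file_size_bytes, access_latency_s,
                   bandwidth_bytes_per_s, compute_s_per_byte):
    """Return (access, transfer, compute) time in seconds for a batch of files.

    Simple additive model: every file pays a fixed access latency once,
    while transfer and compute scale with the number of bytes.
    """
    total_bytes = n_files * file_size_bytes
    access = n_files * access_latency_s
    transfer = total_bytes / bandwidth_bytes_per_s
    compute = total_bytes * compute_s_per_byte
    return access, transfer, compute


# All values below are assumptions for illustration only.
N_FILES = 1000              # roughly one day of small files (from the text)
FILE_SIZE = 100 * 1024      # assumed 100 KiB per file
LATENCY = 0.05              # assumed 50 ms to locate and open one file
BANDWIDTH = 100e6           # assumed 100 MB/s sustained read rate
COMPUTE_PER_BYTE = 1e-8     # assumed 10 ns of processing per byte

access, transfer, compute = estimate_times(
    N_FILES, FILE_SIZE, LATENCY, BANDWIDTH, COMPUTE_PER_BYTE)

# For comparison: the same data volume stored as one large file.
single_access, single_transfer, single_compute = estimate_times(
    1, N_FILES * FILE_SIZE, LATENCY, BANDWIDTH, COMPUTE_PER_BYTE)

print(f"many small files: access={access:.2f}s "
      f"transfer={transfer:.2f}s compute={compute:.2f}s")
print(f"one large file:   access={single_access:.2f}s "
      f"transfer={single_transfer:.2f}s compute={single_compute:.2f}s")
```

Under these assumed numbers, the per-file access latency dominates the total time for the small files, while transfer and compute time are identical in both cases because the data volume is the same. Real storage systems behave less simply (caching, parallel access, tape staging), so the model only indicates the direction of the effect.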
Climate model data from HPC simulations are analyzed worldwide by a growing number of scientists. New data from international modeling activities (e.g. CMIP5 and CORDEX) amount to up to 10 PB per year; these data are archived and disseminated in globally distributed data federations.
In recent years, data applications have shifted from community-specific to interdisciplinary use. Data federations have been founded, and scientific research increasingly addresses coupled and interdisciplinary questions such as adaptation and mitigation.
After annotation and quality control, about 25% of the climate model raw data are stored in long-term data archives. These model data are not reduced further, because detailed use cases are often unknown in advance and it is therefore not possible to predict which outputs will be needed for further investigations.
The data will be made publicly available, and subsequent access patterns are hard to predict. Globally accessible archives for climate data must therefore have ample capacity and must ensure long-term availability as well as related data curation services.
Data Management and Analysis for DLCL Earth and Environment
Data discovery methods, efficient data access, optimization of processing workflows, security, and quality management are the research topics to be addressed in optimizing the data life cycle in this DLCL. Fast and secure global data access and efficient data transfers are essential for interdisciplinary use of the archived data in international federations.