The challenges of research data management, data validation and analysis, and data governance—combined with the technical complexity of building and managing high-volume data infrastructure—make delivering robust, scalable, and high-performance data solutions a daunting task. Add the powerful resources that create mountains of scientific data from computer simulations, experiments, and observations at various facilities, and that daunting task becomes a grand challenge.
The Scalable Data Infrastructure for Science (SDIS) Initiative in the Computing and Computational Sciences Directorate at the Department of Energy’s Oak Ridge National Laboratory aims to tackle this grand challenge by defining and building a scalable data framework that addresses every stage of the data lifecycle to support scientists’ workflows and, by extension, accelerate scientific breakthroughs.
Modern science is driven by large-scale instruments and scientific user facilities that produce increasing amounts of highly heterogeneous data, which is then analyzed by teams of specialists from different domains. The growing variety and volume of data, along with the large teams formed when specialists from distinct domains collaborate across multiple facilities, create several challenges that must be addressed to plan and conduct scientific research efficiently and effectively. Most facilities are grappling independently with these challenges in data acquisition, pre-processing, collection, management, analysis, and publication, all of which must be resolved to create a data-rich scientific ecosystem.
SDIS has three focus areas that support researchers by building scalable data infrastructure and providing data management guidance across the entire data lifecycle: