Data reduction

This course will be taught in English.

Short Program

New technologies allow data to be generated at unprecedented rates and scales in fields ranging from high energy physics and nuclear physics to radio astronomy and photon sciences. The resulting data streams are scientifically rich, but it is becoming increasingly impractical to store them in their entirety. While great strides have been made in data reduction in fields such as digital media, scientific data presents unique challenges. One of the greatest is processing and annotating all the data produced by large scientific instruments: even when a large share of this data can be stored, it is impossible to analyze it fully. To realize the full power and capability of these new scientific facilities, new investments are needed to reduce the size and storage footprint of this data and the time spent querying and analyzing it. Without such investments, ad hoc decisions will be made to discard data, which will inevitably lead to the loss of important information.

New methods for reducing streaming and voluminous data sets while maintaining accurate representations of scientifically relevant derived quantities of interest (QoIs) are of critical and growing importance for today’s science. Such methods are of particular importance for the dozens of large scientific facilities, with aggregate costs in the billions, operated by science agencies around the world. Many of these facilities face critical decisions concerning the allocation of resources for moving, processing, and storing data. R&D is urgently needed, both to develop improved data reduction methods and to improve understanding of how these methods can be used effectively within facilities and in scientific campaigns. In this tutorial we will first cover the state of the art in lossy compression of scientific datasets, discussing in detail three lossy compressors (MGARD, SZ, ZFP) and the errors they introduce in both the primary data and the QoIs. We will then discuss the MGARD compression method in detail. Finally, we will enable these compression methods in ADIOS and visualize the data to understand the effect of lossy compression.
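To make the distinction between error in the primary data and error in a QoI concrete, here is a minimal sketch in plain Python. It is not the MGARD, SZ, or ZFP algorithm and uses none of their APIs: uniform scalar quantization stands in as the simplest error-bounded lossy step. The synthetic signal, the error bound of 1e-2, the choice of mean squared value as the QoI, and all function names are assumptions made for illustration only.

```python
import math


def quantize(values, error_bound):
    """Map each value to an integer bin index using bins of width 2 * error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) for v in values]


def reconstruct(codes, error_bound):
    """Recover approximate values from bin indices (the bin centers)."""
    step = 2.0 * error_bound
    return [c * step for c in codes]


def mean_square(values):
    """Hypothetical QoI: the mean of the squared values."""
    return sum(v * v for v in values) / len(values)


# Synthetic "primary data": a smooth signal with values in [-1, 1] (assumed).
data = [math.sin(2.0 * math.pi * i / 200.0) for i in range(1000)]
error_bound = 1e-2  # assumed absolute error bound on each value

codes = quantize(data, error_bound)
approx = reconstruct(codes, error_bound)

# Error in the primary data: largest pointwise deviation after reconstruction.
primary_error = max(abs(a - b) for a, b in zip(data, approx))

# Error in the QoI: difference between the QoI computed on the original
# data and the QoI computed on the reconstructed data.
qoi_error = abs(mean_square(data) - mean_square(approx))

# The integer codes take few distinct values, which is what a later
# lossless stage (not shown here) would exploit to shrink the data.
distinct_codes = len(set(codes))

print("max primary-data error:", primary_error)
print("QoI error:", qoi_error)
print("distinct codes:", distinct_codes)
```

In this sketch the pointwise error is bounded by construction, while the QoI error is a separate quantity that has to be measured or bounded on its own. That is why the tutorial treats the two kinds of error separately.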

Teachers

Scott Klasky
