In service to production users since the beginning of 2020, the Open Storage Network provides easy access to, and high-bandwidth sharing of, active scientific data between research institutions.
Growing volumes of scientific data from projects at all scales are spurring a search among research universities for high-capacity (multi-petabyte) storage systems. While the US research community and its funding agencies have made significant strategic investments in advanced computing resources and high-speed network connectivity, storage for research data remains highly balkanized and under-resourced. This calls for a new type of cyberinfrastructure geared towards simplifying data sharing and transfer.
The Open Storage Network (OSN) is a distributed data storage service that supports sharing of active scientific data between research institutions. Hosted in part at the MGHPCC, OSN provides high-bandwidth delivery of large data sets, giving researchers ready access to data they can use to train machine learning models, validate simulations, and perform statistical analysis of live data. Without such a service, reaching that data would mean navigating the administrative hurdles and low-bandwidth network pathways that are too often a barrier to data sharing. Among the data currently available on the OSN are high-quality infrared bioimaging data that is being used to train machine learning models; synthetic data from ocean models; the widely used Extracted Features Set from the HathiTrust Digital Library; open-access earth sciences data from Pangeo; geophysical data from BCO-DMO; and other scientifically valuable data sets.
In this talk delivered at Supercomputing 2020, John Goodhue, an OSN Co-PI and Executive Director of the MGHPCC (one of the nodes in the OSN), describes the current deployment, which supports five petabytes of usable storage distributed across the US at sites including MGHPCC, SDSC, NCSA, Johns Hopkins University, and RENCI. He also covers future plans and how to participate.