
For Bigger Data, More Storage

November 29, 2016

Research progress increasingly depends on having enough storage capacity to flexibly exploit vast volumes of digital information. This is a trend across all fields of research, from astrophysics to zoology. The Northeast Storage Exchange (NESE) project, supported by the National Science Foundation, will create a next-generation storage infrastructure specifically targeted at enabling new levels of collaborative research for projects that regularly involve petabytes or more of information. In a recent Harvard Gazette article, NESE PIs explained why scientists need the expanded storage.
Read this story at the Harvard Gazette
by Alvin Powell, Harvard Staff Writer
As big data becomes a common analytical tool in fields from the sciences to the humanities, Harvard’s computer infrastructure experts are turning their attention to an increasingly pressing question: How do you manage it all?
In recent years, Harvard invested in the Odyssey computing cluster, whose 60,000 CPUs provide the sheer computing horsepower needed to crunch big data.
But as large data sets multiply, the question of where to put the information and how to seamlessly retrieve it for analysis has become increasingly important. In August, the National Science Foundation announced a grant of nearly $4 million over the next five years to develop the North East Storage Exchange (NESE), a collaboration among five area universities, including Harvard, to provide not just space for massive data sets, but also the high-speed infrastructure that allows them to be quickly retrieved for analysis.
“People are downloading now 50 to 80 terabyte data sets from NCBI [the National Center for Biotechnology Information] and the National Library of Medicine over an evening. This is the new normal. People [are] pulling genomic data sets wider and deeper than they’ve ever been,” said James Cuff, Harvard’s assistant dean and distinguished engineer for research computing. “What used to be — in lab, in vivo, or in vitro practice — ‘cutting edge’ … are now standard old processes. PCR [polymerase chain reaction] was cutting edge at one point. Now it’s just a thing you do.”
The institutions involved include Harvard, the Massachusetts Institute of Technology, Northeastern University, Boston University, and the University of Massachusetts. They are taking on the project as an expansion of their existing high-performance computing collaboration. In 2012, the five institutions opened the Massachusetts Green High Performance Computing Center (MGHPCC). Located in Holyoke on a rehabilitated industrial site, MGHPCC provides state-of-the-art computing services and is home to part of Harvard’s Odyssey computer. The site was also designed to be energy-efficient and is largely run on hydropower and solar energy.
MGHPCC President Richard McCullough, Harvard’s vice provost for research and professor of materials science and engineering, said the capacity the project will provide is badly needed, but that the project is also seen as more than a one-off effort: lessons learned will help inform similar efforts elsewhere.
“You just need more and more of these kinds of resources to be at the forefront of data science,” McCullough said. “This grant will keep us at the forefront, and may allow us to take a quantum leap forward. This is a really important win for us.”
Cuff expects data retrieval from the North East Storage Exchange to be about 10 times faster than retrieval from equivalent storage on private cloud-based servers, and McCullough said it will be cheaper too, costing just a fifth of what commercial vendors charge.
Cuff, NESE’s principal investigator, said that officials hope to have more than 50 petabytes of storage capacity available at MGHPCC within the next five years, with the ability to expand it further. John Goodhue, MGHPCC’s executive director and a co-principal investigator of NESE, said he expects the speed of the connection to collaborating institutions to double or triple over the next few years.
“What we’re building is an extendable architecture,” Cuff said.
Though Cuff said NESE could be thought of as the collaborating institutions’ private cloud, he doesn’t expect NESE to compete with commercial cloud storage providers. Rather, he said, researchers have a range of data storage options, which should be matched to their purpose. NESE, for example, could potentially back up its data to the cloud.
“This isn’t a competitor to the cloud. It’s a complementary cloud storage system,” Cuff said.
Cuff compared the NESE collaboration to the early days of the internet, when the communications needs of groups of institutions prompted them to create computer networks that grew increasingly interconnected. Now, the problem facing institutions around the country is how to manage the tidal wave of data being generated by researchers and the larger wave likely to break over them in the years to come.
The collaboration depends on contributions from each institution, Cuff said, adding that the five-year effort is also an experiment in managing their needs in order to build the research computing infrastructure of the future.
Despite all the effort behind it, Goodhue and Cuff said, the ultimate goal is to make the system invisible to its users.
“There’s cost savings at every level, savings in the amount of time a researcher has to spend worrying about whether the data is OK and backed up properly,” Goodhue said. “Having something so easy to work with that you don’t even have to think about it is a goal too.”
Story image: James Cuff, assistant dean and distinguished engineer for research computing, is the principal investigator on a $4 million NSF grant to develop the North East Storage Exchange, a collaboration of five local universities to provide easier storage and faster retrieval of massive quantities of data. Image credit: Kris Snibbe/Harvard Staff Photographer

100 Bigelow Street, Holyoke, MA 01040