
Big-Data crunching hits the fast lane in Holyoke

June 26, 2013

A forward-looking, on-demand life-science computing system aims to boost regional industry-academia collaboration around data-driven biology and life-science innovation.
Read this story by Naila Moreira in the Boston Globe.


Long missing from the biotech and high-tech map of the region, Holyoke is finally finding an advantage in its location on the western end of the Massachusetts Turnpike: It’s much faster to reach than some of the most connected places on the Internet.
Beginning this summer, life-sciences companies in the Boston area will be able to send troves of data to a new state-affiliated computing facility in Holyoke in a fraction of the time it would take to ship it to a commercial data center. Located at the new Massachusetts Green High Performance Computing Center, the life-sciences facility could lead to breakthrough drugs and other products by making it easier, faster, and even cheaper for companies to investigate leads involving large amounts of data.
That should encourage smaller operations “to try things that might fail more,” said Paul Brown, chief architect of the start-up Paradigm4, which developed a key database used in one of the nation’s largest genome projects. “Firms are going to be prepared to do things with much higher potential payoff but lower chance of success.”
The $4.5 million life-sciences computing cluster was funded by a state grant and will be installed at the Holyoke center this summer. The initiative aims to give life-sciences efforts better access to cloud computing — or the use of multiple powerful computers simultaneously to crunch data too complex for a single computer. These very large datasets, including things like genome sequences and other life-science data, are known as Big Data.
Cloud computing is available through commercial vendors such as Google Inc. and Amazon.com Inc., but using them can be both costly and too slow. Just getting Big Data to computing clusters powerful enough to analyze it can be an elaborate ordeal that can take days. The size of such files can easily overwhelm the computer networks and Internet connections of most small companies, and there are bottlenecks and traffic jams along the way that further slow down delivery. Indeed, Internet transmission can be so slow that it is quicker for companies to ship or drive their data to a computing center.
“The fastest way to get a large puddle of data from New York to LA is called the sneakernet,” Brown said. “You get a graduate student, you buy him a bus ticket, and you send it that way.”
But the life-sciences cluster in Holyoke has access to dedicated 10-gigabit-per-second fiber-optic lines that can make shipping a huge amount of data from Boston a breeze. A one-terabyte store of data (about 1 million average-size photos) can take a mere 15 or 20 minutes to travel from one of the universities in Greater Boston participating in the project to Holyoke; using the typical connections available to small businesses, a similar shipment could take days or weeks.
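The transfer times quoted above can be checked with back-of-the-envelope arithmetic. The sketch below is only an illustration, not a measurement of the Holyoke link: it assumes a decimal terabyte (10^12 bytes), and both the 75 percent link efficiency and the 50-megabit-per-second small-business connection are hypothetical figures chosen for the example.

```python
def transfer_time_seconds(size_bytes, link_bits_per_second, efficiency=1.0):
    """Estimate the time to move size_bytes over a link.

    efficiency is the fraction of the nominal line rate actually achieved
    (an assumed value here, not one reported in the article).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    return size_bytes * 8 / (link_bits_per_second * efficiency)


TERABYTE = 1e12        # assumption: decimal terabyte, 10^12 bytes
TEN_GBPS = 10e9        # the 10-gigabit-per-second line described above
SMALL_BIZ_BPS = 50e6   # hypothetical 50 Mbit/s small-business connection

# Best case: the full line rate with no overhead.
ideal = transfer_time_seconds(TERABYTE, TEN_GBPS)

# Assumed 75% efficiency to allow for protocol overhead and disk speed.
realistic = transfer_time_seconds(TERABYTE, TEN_GBPS, efficiency=0.75)

# The same terabyte over the hypothetical small-business connection.
slow = transfer_time_seconds(TERABYTE, SMALL_BIZ_BPS)

print(f"10 Gbit/s, ideal:         {ideal / 60:.1f} minutes")
print(f"10 Gbit/s, 75% efficient: {realistic / 60:.1f} minutes")
print(f"50 Mbit/s, ideal:         {slow / 86400:.1f} days")
```

Under these assumptions, the dedicated line moves a terabyte in roughly a quarter of an hour, while the slower connection needs more than a day, which is in line with the figures in the story.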
“It’s like having a new Highway 90, but for data,” said Christopher Hill of Massachusetts Institute of Technology, the principal investigator for the project. “It’s a new virtual highway from all these universities to this facility in Holyoke.”
The one catch is that the super-fast link is between the Holyoke facility and the five schools involved in the project: MIT, the University of Massachusetts, Harvard University, Northeastern University, and Boston University. Companies that want to send data to Holyoke would still have to get it to the schools first for high-speed transit.
Another feature of the project is that it should be cheaper for companies to use the Holyoke computers over a commercial cloud center, because the life-sciences cluster is funded by the state and its maintenance costs are underwritten by the five-university consortium.
“The data is local. You don’t have to move it up into some storage cloud, buy the storage cloud from a provider, like Google or Amazon, then compute it. This is in our backyard. It’s easy to consume, and it’s cost-effective,” said chief health care strategist David Dimond of EMC Corp., the Hopkinton-based computer storage giant.
EMC is one of a number of heavyweight business partners backing the venture; others include IBM Corp., AstraZeneca PLC, Pfizer Inc., Merck & Co., and Merrimack Pharmaceuticals Inc. Officials from those companies are helping in the design of the center and will probably collaborate with experts from the universities to use the center’s resources. Through such collaborations, industry researchers will be able to “peek inside” the computers as they churn through problems, said Prashant Shenoy of the University of Massachusetts, an investigator on the project.
“Because we own the machines, we can get a much deeper idea of how researchers are using the machines than if they used the commercial cloud,” he said.
One enterprise that expects to use the Holyoke computer cluster is hack/reduce, a Cambridge nonprofit that helps companies with Big Data challenges and hosts hack-a-thons at its Kendall Square offices. Hack/reduce expects to use the Holyoke cluster to run frequent public computing challenges on life-sciences projects.
Its founder, Chris Lynch, said those sessions will help the Holyoke center build a bridge to the entrepreneurial community in Boston and Cambridge.
“It’s really providing a platform for engagement,” Lynch said.
The end goal of the life-sciences cluster, said the director of the Holyoke center, John Goodhue, is to promote collaborative research between companies and academics in genomics and related fields, with an eventual aim of developing diagnoses and treatments that are specific or unique to each patient or illness. For example, being able to distinguish the genetic fingerprint of one cancer tumor from another may allow researchers to develop singular treatments for each patient.
The Holyoke cluster is also expected to help small and big companies alike tackle problems not well suited to the commercial cloud, said John Reynders, the head of AstraZeneca’s informatics research and development.
Some problems can easily be split into parts that run on separate computers. Other challenges can’t be split easily, because the sets of computers performing different tasks need to constantly interact with one another as they puzzle through the data. That’s something the Holyoke facility can do, but a commercial cloud operator cannot, Reynders said.
For example, scientists might be searching for a common biomarker among Alzheimer’s patients who have responded well to a single medication, which requires sophisticated analysis across multiple data sets.
“You have to bring together imaging data, genetic data, clinical data, proteomic data,” Reynders said. “That’s the kind of puzzle we see the platform . . . being able to help us crack.”
AstraZeneca is among the big companies that expect to take advantage of their proximity to the Holyoke facility for experiments with Big Data.
“Certainly having an environment like Holyoke — that’s going to allow us to tackle very challenging computational and data-centric problems in life sciences — that’s something very exciting for us,” said Reynders.
