Data Sciences for the Life Sciences

April 1, 2014

From ResearchNext, the Research Digest of UMass Amherst
Statistical, life science, and social science researchers gathered for a workshop on “Data Sciences for the Life Sciences in a High Performance Computing Environment” in February: the first formal opportunity for the researchers to learn how to effectively utilize the MGHPCC facility.


A group of more than 40 statistical, life science, and social science researchers braved the cold and dark of an early February morning to be part of two firsts in regional high performance computing efforts in the life sciences. Their participation in the workshop “Data Sciences for the Life Sciences in a High Performance Computing Environment” was the first formal opportunity for life sciences researchers to learn how to effectively utilize the Massachusetts Green High Performance Computing Center (MGHPCC) facility in Holyoke for their research. The workshop was also the first offering of the new Biostatistics in Practice series, jointly sponsored by UMass Amherst and the MGHPCC.
These firsts reflect the increasing role of statistical, mathematical, and computational methods in life science research, as well as the growing need for researchers to work with large-scale data from diverse sources. Campus sponsors of the series are the Graduate Program in Biostatistics and the UMass Institute for Computational Biology, Biostatistics, and Bioinformatics (ICB3).
Workshop participants included faculty and graduate students from UMass Amherst as well as from other colleges, universities, and healthcare organizations, all interested in accessing the MGHPCC and learning state-of-the-art tools from a teaching team drawn from the UMass Amherst Biostatistics faculty: workshop director and assistant professor Nicholas Reich, ICB3 director and head of biostatistics Andrea Foulkes, and lecturer Gregory Matthews. Each delivered modules that together provided a foundational curriculum on statistical computing using R, an open-source and freely available statistical programming language, in a high performance computing environment, along with practical experience using it on the MGHPCC platform. In addition to the instructors, teaching assistants provided a high level of individual support for workshop participants.
R is rapidly being adopted as the programming language of choice for researchers at the intersection of life and statistical sciences. As an open-source language, it is affordable to researchers regardless of budget and makes it easy to share code for new techniques and tools. This flexibility seems to be especially valuable to researchers in biostatistics, bioinformatics, and computational biology, who are being challenged to invent new approaches and methods to meet the analytical demands of drawing insights from “big data,” which can be too voluminous or complex to be understood using conventional methods alone.
On a practical level, high-performance computing is another key to drawing insight from data that is measured in terabytes or petabytes and can far exceed both the computing and storage capacities of even the most powerful desktop computers. MGHPCC director John Goodhue explains, “Moving and processing big data with conventional methods is a bit like asking a single person to move and read every book in the Library of Congress with a hand truck and reading lamp. The MGHPCC is equipped for high-bandwidth communications to allow data to be moved in or out expeditiously and to connect many computing ‘cores’ so they can work ‘in parallel’ to handle large and/or complex datasets, thus making it possible for researchers to run programs that analyze the data in a reasonable period of time: hours or days instead of weeks.” Participants also toured the MGHPCC facility to gain a better appreciation of its thoughtful design as an environmentally responsible high-performance computing resource, as well as of its research capacity.
Speaking about the changing role of statistical and computational methodologies in life science research, Andrea Foulkes notes, “Biomedical researchers are able to generate large quantities of data providing in-depth coverage within and across individuals. The MGHPCC offers state-of-the-art computational resources for data management and processing which, coupled with powerful R tools, enable researchers to turn data into knowledge.”
Looking forward to future offerings, Reich sees many opportunities. “We were pleased that the workshop was fully subscribed well in advance of the meeting. Given the level of interest we will consider holding it again soon, perhaps as early as this summer. We have other topics in mind as well, and with the start-up of the UMass Institute for Applied Life Sciences we expect the list to grow.”
All images courtesy of the School of Public Health and Health Sciences, UMass Amherst.
Story by Karen Lauter-Utgoff
