
The Trickiness of Talking to Computers

July 13, 2017

by Helen Hill for MGHPCC
James Glass is a senior research scientist at the Massachusetts Institute of Technology, where he leads the Spoken Language Systems Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL). His research focuses on automatic speech recognition, unsupervised speech processing, and spoken language understanding. This past spring, assisted by graduate student David Harwath, Glass taught MIT's 6.345/HST.728 Automatic Speech Recognition class, and this year, for the first time, students had the option of using high-performance computing resources at the MGHPCC to facilitate their work.
6.345/HST.728 is a graduate-level course aimed at introducing students to the rapidly evolving field of speech recognition and spoken language processing. The first half of the class covers concepts and computational techniques through traditional lectures, labs, and problem sets, whose accompanying computation is readily accommodated on MIT's public Athena computing network; the second half comprises a typically much more computationally demanding final term project.
As part of the course curriculum, students chose a current research topic to explore. Some wrote programs capable of automatically recognizing the language a person was speaking, while others created systems able to infer the emotional state or personality traits of a speaker. Because most of the projects relied on data-hungry, computationally intensive statistical machine learning algorithms, the MGHPCC was key to enabling the students to complete their term projects.
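For readers curious what such a project might involve under the hood, here is a minimal, hypothetical sketch of a spoken language identification pipeline in Python: MFCC features averaged over time and fed to a small neural network classifier. The feature settings, model size, and synthetic "audio" below are illustrative assumptions, not the students' actual systems.

```python
# Illustrative sketch only: MFCC features + a tiny classifier for
# language identification. All sizes and data here are assumptions.
import numpy as np
import librosa
import torch
import torch.nn as nn

N_MFCC = 13        # assumed feature dimensionality
N_LANGUAGES = 4    # assumed number of target languages
SR = 16000         # a common speech sampling rate

def utterance_features(waveform: np.ndarray, sr: int = SR) -> torch.Tensor:
    """Extract MFCCs and average over time into one fixed-size vector."""
    mfcc = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=N_MFCC)  # (n_mfcc, frames)
    return torch.from_numpy(mfcc.mean(axis=1)).float()             # (n_mfcc,)

classifier = nn.Sequential(  # deliberately tiny stand-in model
    nn.Linear(N_MFCC, 64),
    nn.ReLU(),
    nn.Linear(64, N_LANGUAGES),
)

# One training step on a synthetic utterance (random noise, not real speech).
waveform = np.random.randn(SR * 3).astype(np.float32)  # 3 seconds of "audio"
label = torch.tensor([0])                               # pretend language id 0
logits = classifier(utterance_features(waveform).unsqueeze(0))
loss = nn.functional.cross_entropy(logits, label)
loss.backward()  # gradients ready for an optimizer step
print(f"loss on synthetic example: {loss.item():.3f}")
```

Real systems of this kind train on many hours of labeled speech, which is exactly where the MGHPCC's compute capacity comes in.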
"Having access to the MGHPCC allowed the students to use more sophisticated models involving more complicated elements. Many of the projects draw on machine learning techniques reliant on leveraging large quantities of data to train the models. In the past we let students use our group's facilities but having recently redesigned the course to accommodate a curriculum shift more towards deep neural network models we realized, going forward, students really needed something bigger," says Glass.
Fortunately, a timely comment from one of his students, mentioning her great experience using the MGHPCC on a different project, led Glass to contact Christopher Hill, the director of MIT's Research Computing Project, whose team then worked with Harwath to give class members access to the resources they needed.
"Some of the students in the class had access to their own computing resources, but for those who didn't the availability of a facility internal to MIT, with lots of pre-installed libraries they could leverage was terrific. Of course we had other options. For example, some other classes have used Amazon Cloud, but for our purposes this seemed like a much more natural set-up and one we are eager to repeat."
"Siri. Alexa. Voice recognition software has reached a tipping point." Glass tells me. "Nonetheless there is still plenty more room for improvement."
"The ability to speak and use language is a critical skill for machines to master," he says, "and it's a very hard problem because speech is a signal that gets contaminated by noise. The physics of everybody’s vocal tract is different, your linguistic background, your dialect. The sound of your voice changes with the situation you are in, your emotional state, whether you are inside or outside: The speech signal when you say the exact same thing, its never ever the same. Interpreting context, deconstructing dialogue: Giving the students serious HPC tools to work with takes what we can teach them to a new level."
About the Researchers

James Glass is a senior research scientist at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) where he heads the Spoken Language Systems Group. He is also a lecturer in the Harvard-MIT Division of Health Sciences and Technology. His primary research interests are in the area of speech communication and human-computer interaction centered on automatic speech recognition and spoken language understanding.
David Harwath is a graduate student in Glass's group whose research combines speech and visual perception, based on data collected from people talking about pictures.

Links

James Glass
Spoken Language Systems Group, MIT
