The TELLAR Model for Speech Recognition

Research Mentor(s)

Hutchinson, Brian

Description

The "TEnsor Log-LineAR" (TELLAR) model probabilistically maps acoustics to speech sounds (phones), a key step in speech recognition systems. The model uses a low n-rank tensor to perform this mapping and, in doing so, finds linear transforms of acoustics and phones into low-dimensional spaces. By embedding phones into a low-dimensional space, the model can pool information about related speech sounds and make better predictions with less data. The embedding also aids interpretability: similar phones cluster near each other in this space. Training the model involves solving a non-smooth convex optimization problem, for which we have an efficient algorithm and a guarantee of finding a globally optimal solution. Initial results in phone classification are promising, but this work is ongoing. Next, we plan to incorporate the TELLAR model into state-of-the-art speech recognition systems to improve their performance.
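The idea of scoring phones through low-dimensional linear embeddings can be illustrated with a minimal sketch. It is a simplified two-way (matrix) analogue written for illustration, not the TELLAR formulation itself, which this abstract does not spell out. The names `phone_posteriors`, `singular_value_threshold`, `U`, `V`, and `tau` are hypothetical. The thresholding step is the standard proximal operator of the nuclear norm, a common building block for non-smooth convex low-rank problems; it is an assumption that anything similar is used here.

```python
import numpy as np


def phone_posteriors(x, U, V):
    """Return p(phone | x) for a rank-r bilinear log-linear model.

    Illustrative simplification (assumption), not the TELLAR model:
      x: (d,) acoustic feature vector
      U: (r, d) linear map from acoustics into an r-dimensional space
      V: (r, k) one r-dimensional embedding per phone, for k phones
    """
    z = U @ x                # embed the acoustics, shape (r,)
    scores = V.T @ z         # inner product with each phone embedding, shape (k,)
    scores = scores - scores.max()  # shift for a numerically stable softmax
    p = np.exp(scores)
    return p / p.sum()


def singular_value_threshold(W, tau):
    """Soft-threshold the singular values of W by tau.

    This is the proximal operator of tau * (nuclear norm); it shrinks W
    toward low rank. Its use here is an assumption for illustration.
    """
    left, s, right_t = np.linalg.svd(W, full_matrices=False)
    return (left * np.maximum(s - tau, 0.0)) @ right_t
```

In this sketch, phones whose columns of `V` lie close together receive similar scores for any input, which is one way to picture how a shared low-dimensional space lets related speech sounds pool information.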

Document Type

Event

Start Date

14-5-2015 10:00 AM

End Date

14-5-2015 2:00 PM

Department

Computer Science

Genre/Form

student projects; posters

Subjects – Topical (LCSH)

Speech processing systems; Sound--Recording and reproducing--Digital techniques

Type

Image

Rights

Copying of this document in whole or in part is allowable only for scholarly purposes. It is understood, however, that any copying or publication of this documentation for commercial purposes, or for financial gain, shall not be allowed without the author's written permission.

Language

English

Format

application/pdf
