The TELLAR Model for Speech Recognition
Research Mentor(s)
Hutchinson, Brian
Description
The "TEnsor Log-LineAR" (TELLAR) model probabilistically maps acoustics to speech sounds (phones), a key step in speech recognition systems. The model uses a low n-rank tensor to perform this mapping and, in doing so, finds linear transforms of acoustics and phones into low-dimensional spaces. By embedding phones in a low-dimensional space, the model can pool information about related speech sounds and make better predictions with less data. The embedding also aids interpretability: similar phones are clustered near each other in this space. Training the model involves solving a non-smooth convex optimization problem, for which we have an efficient algorithm that is guaranteed to find a globally optimal solution. Initial results in phone classification are promising, but this work is ongoing. Next, we plan to incorporate the TELLAR model into state-of-the-art speech recognition systems to improve their performance.
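The idea of scoring phones through shared low-dimensional embeddings can be illustrated with a minimal sketch. It is a simplified two-mode (matrix) analogue, not the TELLAR model itself: it assumes an explicit rank-r factorization in place of the low n-rank tensor, and it omits training entirely. The names `phone_posteriors`, `phone_embed`, and `acoustic_proj`, and the sizes `d`, `K`, and `r`, are hypothetical and chosen only for illustration.

```python
import numpy as np

def phone_posteriors(x, phone_embed, acoustic_proj):
    """Return p(phone | x) under a low-rank log-linear model (illustrative only).

    x             : (d,) acoustic feature vector
    phone_embed   : (r, K) matrix whose columns embed each phone in r dimensions
    acoustic_proj : (r, d) linear map taking acoustics into the same r-dim space
    """
    z = acoustic_proj @ x           # project acoustics into the shared space, shape (r,)
    scores = phone_embed.T @ z      # inner product with every phone embedding, shape (K,)
    scores = scores - scores.max()  # shift for numerical stability before exponentiating
    p = np.exp(scores)
    return p / p.sum()              # normalize into a probability distribution

# Hypothetical sizes: 39-dim acoustic features, 48 phones, 5-dim embedding space.
rng = np.random.default_rng(0)
d, K, r = 39, 48, 5
phone_embed = rng.normal(size=(r, K))    # random stand-in for learned parameters
acoustic_proj = rng.normal(size=(r, d))  # random stand-in for learned parameters
x = rng.normal(size=d)                   # random stand-in for one acoustic frame

post = phone_posteriors(x, phone_embed, acoustic_proj)
implied_rank = np.linalg.matrix_rank(phone_embed.T @ acoustic_proj)
```

In this sketch the implied K-by-d weight matrix `phone_embed.T @ acoustic_proj` has rank at most `r`, so phones with nearby embeddings receive similar scores and effectively share parameters. Optimizing over explicit factors like this is generally non-convex; the convex training problem mentioned in the description is a different formulation that this example does not attempt to reproduce.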
Document Type
Event
Start Date
14-5-2015 10:00 AM
End Date
14-5-2015 2:00 PM
Department
Computer Science
Genre/Form
student projects; posters
Subjects – Topical (LCSH)
Speech processing systems; Sound--Recording and reproducing--Digital techniques
Type
Image
Rights
Copying of this document in whole or in part is allowable only for scholarly purposes. It is understood, however, that any copying or publication of this documentation for commercial purposes, or for financial gain, shall not be allowed without the author's written permission.
Language
English
Format
application/pdf