Name: Discriminative Learning for Speech Recognition
ISBN: 978-3-031-02557-0

Overview

Authors: Xiaodong He, Li Deng (both Microsoft Research, USA)

Part of the book series: Synthesis Lectures on Speech and Audio Processing (SLSAP)



Table of contents (8 chapters)

- Front Matter (pages i-vii)
- Introduction and Background (pages 1-23)
- Statistical Speech Recognition: A Tutorial (pages 25-29)
- Discriminative Learning: A Unified Objective Function (pages 31-45)
- Discriminative Learning Algorithm for Exponential-Family Distributions (pages 47-57)
- Discriminative Learning Algorithm for Hidden Markov Model (pages 59-74)
- Practical Implementation of Discriminative Learning (pages 75-89)
- Selected Experimental Results (pages 91-95)
- Epilogue (pages 97-101)
- Back Matter (pages 103-112)

All chapters are by Xiaodong He and Li Deng.

About this book

In this book, we introduce the background and mainstream methods of probabilistic modeling and discriminative parameter optimization for speech recognition. The specific models treated in depth include the widely used exponential-family distributions and the hidden Markov model. A detailed study is presented on unifying the common objective functions for discriminative learning in speech recognition, namely maximum mutual information (MMI), minimum classification error, and minimum phone/word error. The unification is presented, with rigorous mathematical analysis, in a common rational-function form. This common form enables the use of the growth transformation (or extended Baum–Welch) optimization framework in discriminative learning of model parameters. In addition to the necessary background and tutorial material on the subject, we include technical details on the derivation of the parameter optimization formulas for exponential-family distributions, discrete hidden Markov models (HMMs), and continuous-density HMMs in discriminative learning. Selected experimental results obtained firsthand by the authors are presented to show that discriminative learning can lead to superior speech recognition performance over conventional parameter learning. Details on major algorithmic implementation issues of practical significance are provided to enable practitioners to translate the theory in the earlier part of the book directly into engineering practice.

Table of Contents: Introduction and Background / Statistical Speech Recognition: A Tutorial / Discriminative Learning: A Unified Objective Function / Discriminative Learning Algorithm for Exponential-Family Distributions / Discriminative Learning Algorithm for Hidden Markov Model / Practical Implementation of Discriminative Learning / Selected Experimental Results / Epilogue / Major Symbols Used in the Book and Their Descriptions / Mathematical Notation / Bibliography
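To give a rough sense of what the extended Baum–Welch optimization named in the description looks like in practice, the following is a minimal sketch of the widely used textbook-style update for a single one-dimensional Gaussian. It is not taken from the book: the function name `ebw_update`, its parameters, and the heuristic smoothing constant `D = E * sum(gamma_den)` are all assumptions made for illustration, and the book's own derivation and notation may differ.

```python
import numpy as np


def ebw_update(x, gamma_num, gamma_den, mu_old, var_old, E=2.0):
    """One extended Baum-Welch step for a 1-D Gaussian (illustrative only).

    x          : observations, shape (T,)
    gamma_num  : occupancies from the reference (numerator) alignment
    gamma_den  : occupancies from the competing (denominator) hypotheses
    mu_old     : current mean
    var_old    : current variance
    E          : hypothetical scale for the smoothing constant D
    """
    x = np.asarray(x, dtype=float)
    gamma_num = np.asarray(gamma_num, dtype=float)
    gamma_den = np.asarray(gamma_den, dtype=float)

    # Assumed heuristic: D proportional to the total denominator occupancy.
    D = E * gamma_den.sum()

    # Difference of numerator and denominator statistics.
    dgamma = gamma_num - gamma_den
    denom = dgamma.sum() + D
    if denom <= 0.0:
        raise ValueError("smoothing constant D is too small for a valid update")

    # Smoothed first- and second-order statistics.
    mu_new = (np.dot(dgamma, x) + D * mu_old) / denom
    second_moment = (np.dot(dgamma, x ** 2) + D * (var_old + mu_old ** 2)) / denom
    var_new = second_moment - mu_new ** 2
    return mu_new, var_new


if __name__ == "__main__":
    mu, var = ebw_update(
        x=[0.0, 1.0, 2.0, 3.0],
        gamma_num=[0.0, 0.0, 1.0, 1.0],
        gamma_den=[0.5, 0.5, 0.5, 0.5],
        mu_old=1.5,
        var_old=1.25,
    )
    print(mu, var)
```

In this sketch, a larger `D` keeps the new parameters closer to the old ones, and when the numerator and denominator occupancies coincide the parameters are left unchanged. Practical systems typically choose `D` per Gaussian so that the updated variance stays positive.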

Authors and Affiliations

Microsoft Research, USA

Xiaodong He, Li Deng

About the authors

Xiaodong He received his bachelor's degree from Tsinghua University, Beijing, China, in 1996, his master's degree from the Chinese Academy of Sciences in 1999, and his doctoral degree from the University of Missouri-Columbia in 2003. He joined the Speech and Natural Language group of Microsoft in 2003, and the Natural Language Processing group of Microsoft Research, Redmond, WA, in 2006, where he currently serves as a researcher. His research areas include statistical machine learning, automatic speech recognition, natural language processing, machine translation, signal processing, nonnative speech processing, and human-computer interaction. In these areas, he has authored or coauthored more than 30 refereed papers in leading international conferences and journals. He has filed more than 10 U.S. or international patents in the areas of speech recognition, language processing, and machine translation. He has served as a reviewer for major conferences and journals in the areas of speech recognition, natural language processing, signal processing, and pattern recognition, and on the program committees of various conferences in these areas. He is a member of ACL, IEEE, ISCA, and Sigma Xi.

Li Deng received his bachelor's degree from the University of Science and Technology of China and his Ph.D. degree from the University of Wisconsin-Madison. In 1989, he joined the Department of Electrical and Computer Engineering, University of Waterloo, Ontario, Canada, as an assistant professor; he became a tenured full professor in 1996. From 1992 to 1993, he conducted sabbatical research at the Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA, and from 1997 to 1998, at the ATR Interpreting Telecommunications Research Laboratories, Kyoto, Japan. From 1989 to 1999, he taught a wide range of electrical and computer engineering courses at both undergraduate and graduate levels. In 1999, he joined Microsoft Research, Redmond, WA, as a senior researcher; he currently serves as a principal researcher at the same institution. He has also been an affiliate professor in the Department of Electrical Engineering at the University of Washington since 2000, after moving to Seattle. His past and current research areas include automatic speech and speaker recognition, statistical methods and machine learning, neural information processing, machine intelligence, audio and acoustic signal processing, statistical signal processing and digital communication, human speech production and perception, acoustic phonetics, auditory speech processing, noise robust speech processing, speech synthesis and enhancement, spoken language understanding systems, multimedia signal processing, and multimodal human-computer interaction. In these areas, he has published more than 300 refereed papers in leading international conferences and journals, as well as 14 book chapters, and has given keynotes, tutorials, and lectures worldwide. He has been granted more than 20 U.S. or international patents in acoustics, speech/language technology, and signal processing. He has also authored two recent books on speech processing.

Bibliographic Information

Book Title: Discriminative Learning for Speech Recognition
Book Subtitle: Theory and Practice
Authors: Xiaodong He, Li Deng
Series Title: Synthesis Lectures on Speech and Audio Processing
DOI: https://doi.org/10.1007/978-3-031-02557-0
Publisher: Springer Cham
eBook Packages: Synthesis Collection of Technology (R0), eBColl Synthesis Collection 2
Copyright Information: Springer Nature Switzerland AG 2008
Softcover ISBN: 978-3-031-01429-1
Softcover Published: 01 August 2008
eBook ISBN: 978-3-031-02557-0
eBook Published: 01 June 2022
Series ISSN: 1932-121X
Series E-ISSN: 1932-1678
Edition Number: 1
Number of Pages: VII, 112
Topics: Electrical Engineering, Signal, Image and Speech Processing, Engineering Acoustics
