Name: Neural Text-to-Speech Synthesis
ISBN: 978-981-99-0827-1

Overview

Authors:

Xu Tan
Microsoft Research Asia (China), Beijing, China

The first book to comprehensively introduce neural text-to-speech synthesis
Illustrates the complete process of text-to-speech synthesis technology
Equips readers to implement text-to-speech synthesis for either research or products

Part of the book series: Artificial Intelligence: Foundations, Theory, and Algorithms (AIFTA)



Access this book

eBook: USD 129.00 (price excludes VAT, USA)
Hardcover Book: USD 169.99 (price excludes VAT, USA)


Table of contents (13 chapters)

- Front Matter: pages i-xxv
- Introduction (Xu Tan): pages 1-14

Preliminary
- Front Matter: page 15
- Basics of Spoken Language Processing (Xu Tan): pages 17-36
- Basics of Deep Learning (Xu Tan): pages 37-61

Key Components in TTS
- Front Matter: pages 63-65
- Text Analyses (Xu Tan): pages 67-80
- Acoustic Models (Xu Tan): pages 81-100
- Vocoders (Xu Tan): pages 101-114
- Fully End-to-End TTS (Xu Tan): pages 115-122

Advanced Topics in TTS
- Front Matter: pages 123-124
- Expressive and Controllable TTS (Xu Tan): pages 125-140
- Robust TTS (Xu Tan): pages 141-151
- Model-Efficient TTS (Xu Tan): pages 153-161
- Data-Efficient TTS (Xu Tan): pages 163-173
- Beyond Text-to-Speech Synthesis (Xu Tan): pages 175-179

Summary and Outlook
- Front Matter: page 181
- Summary and Outlook (Xu Tan): pages 183-185

- Back Matter: pages 187-201


About this book

Text-to-speech (TTS) aims to synthesize intelligible and natural speech from given text. It is a hot topic in language, speech, and machine learning research and has broad applications in industry. This book introduces neural network-based TTS in the era of deep learning, aiming to provide a good understanding of neural TTS, current research and applications, and future research trends.

This book first introduces the history of TTS technologies, gives an overview of neural TTS, and provides preliminary knowledge on language and speech processing, neural networks and deep learning, and deep generative models. It then introduces neural TTS from the perspective of key components (text analyses, acoustic models, vocoders, and end-to-end models) and advanced topics (expressive and controllable, robust, model-efficient, and data-efficient TTS). It also points out some future research directions and collects resources related to TTS.

This book is the first to introduce neural TTS in a comprehensive and easy-to-understand way, and it can serve both academic researchers and industry practitioners working on TTS.

Authors and Affiliations

Microsoft Research Asia (China), Beijing, China

Xu Tan

About the author

Xu Tan is a Principal Researcher and Research Manager at Microsoft Research Asia. His research interests cover deep learning and its applications in language, speech, and music processing and in digital human creation. He has extensive research experience in text-to-speech synthesis. He has developed high-quality TTS systems such as FastSpeech 1/2 (widely used in the TTS community), DelightfulTTS (winner of the Blizzard Challenge), and NaturalSpeech (which achieves human-level quality on a TTS benchmark dataset), and has transferred many research results into Microsoft Azure TTS services to improve their quality. He has given a series of TTS tutorials at top conferences such as IJCAI, ICASSP, and INTERSPEECH, and has written a comprehensive survey paper on TTS.

Besides speech synthesis, he has designed several popular language models (e.g., MASS) and AI music systems (e.g., Muzic), and has developed machine translation systems that achieved human parity in Chinese-English translation and won several WMT machine translation competitions. He has published over 100 papers at prestigious conferences such as ICML, NeurIPS, ICLR, AAAI, IJCAI, ACL, EMNLP, NAACL, ICASSP, INTERSPEECH, and KDD, as well as in IEEE/ACM Transactions, and has served as an area chair or action editor for AI conferences and journals (e.g., NeurIPS, AAAI, ICASSP, TMLR).

Bibliographic Information

Book Title: Neural Text-to-Speech Synthesis
Authors: Xu Tan
Series Title: Artificial Intelligence: Foundations, Theory, and Algorithms
DOI: https://doi.org/10.1007/978-981-99-0827-1
Publisher: Springer Singapore
eBook Packages: Computer Science, Computer Science (R0)
Copyright Information: The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
Hardcover ISBN: 978-981-99-0826-4 (published 30 May 2023)
Softcover ISBN: 978-981-99-0829-5 (due 01 August 2023)
eBook ISBN: 978-981-99-0827-1 (published 29 May 2023)
Series ISSN: 2365-3051
Series E-ISSN: 2365-306X
Edition Number: 1
Number of Pages: XXV, 201
Number of Illustrations: 24 illustrations in colour
Topics: Natural Language Processing (NLP), Signal, Image and Speech Processing, Machine Learning, Artificial Intelligence
