Name: Speech Prosody in Speech Synthesis: Modeling and generation of prosody for high quality and flexible speech synthesis
ISBN: 978-3-662-45258-5

Speech Prosody in Speech Synthesis: Modeling and generation of prosody for high quality and flexible speech synthesis

Overview

Editors:

Keikichi Hirose, University of Tokyo, Tokyo, Japan
Jianhua Tao, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Selects recent works on speech prosody written by eminent scholars from around the world; a “must read” book for the speech “prosodist”
Gives a clear and complete view of how prosody conveys linguistic and para-/non-linguistic information
Offers guidelines toward the ultimate goal of speech synthesis: realizing human-like quality and flexibility
Includes supplementary material: sn.pub/extras

Part of the book series: Prosody, Phonology and Phonetics (PRPHPH)

12k Accesses
17 Citations

Table of contents (14 chapters)

Front Matter
  Pages i-viii

Modeling of Prosody
1. ProZed: A Speech Prosody Editor for Linguists, Using Analysis-by-Synthesis
  Daniel J. Hirst
  Pages 3-17
2. Degrees of Freedom in Prosody Modeling
  Yi Xu, Santitham Prom-on
  Pages 19-34
3. Extraction, Analysis and Synthesis of Fujisaki model Parameters
  Hansjörg Mixdorff
  Pages 35-47
4. Probabilistic Modeling of Pitch Contours Toward Prosody Synthesis and Conversion
  Hirokazu Kameoka
  Pages 49-69

Para- and Non-Linguistic Issues of Prosody
5. Communicative Speech Synthesis as Pan-Linguistic Prosody Control
  Yoshinori Sagisaka, Yoko Greenberg
  Pages 73-82
6. Mandarin Stress Analysis and Prediction for Speech Synthesis
  Ya Li, Jianhua Tao
  Pages 83-95
7. Expressivity in Interactive Speech Synthesis; Some Paralinguistic and Nonlinguistic Issues of Speech Prosody for Conversational Dialogue Systems
  Nick Campbell, Ya Li
  Pages 97-107
8. Temporally Variable Multi attribute Morphing of Arbitrarily Many Voices for Exploratory Research of Speech Prosody
  Hideki Kawahara
  Pages 109-120

Control of Prosody in Speech Synthesis
9. Statistical Models for Dealing with Discontinuity of Fundamental Frequency
  Kai Yu
  Pages 123-144
10. Use of Generation Process Model for Improved Control of Fundamental Frequency Contours in HMM-Based Speech Synthesis
  Keikichi Hirose
  Pages 145-159
11. Tone Nucleus Model for Emotional Mandarin Speech Synthesis
  Miaomiao Wang
  Pages 161-171
12. Emphasis, Word Prominence, and Continuous Wavelet Transform in the Control of HMM-Based Synthesis
  Martti Vainio, Antti Suni, Daniel Aalto
  Pages 173-188
13. Exploiting Alternatives for Text-To-Speech Synthesis: From Machine to Human
  Nicolas Obin, Christophe Veaux, Pierre Lanchantin
  Pages 189-202
14. Prosody Control and Variation Enhancement Techniques for HMM-Based Expressive Speech Synthesis
  Takao Kobayashi
  Pages 203-213

About this book

The volume addresses issues concerning prosody generation in speech synthesis, including prosody modeling, how we can convey para- and non-linguistic information in speech synthesis, and prosody control in speech synthesis (including prosody conversions). A high level of quality has already been achieved in speech synthesis by using selection-based methods with segments of human speech. Although the method enables synthetic speech with various voice qualities and speaking styles, it requires large speech corpora with targeted quality and style.

Accordingly, speech conversion techniques are now of growing interest among researchers. HMM/GMM-based methods are widely used, but they entail several major problems when viewed from the prosody perspective: prosodic features cover a wider time span than segmental features, so processing them frame by frame is not always appropriate. The book offers a good overview of state-of-the-art studies on prosody in speech synthesis.

About the editors

Professor Keikichi Hirose received the B.E. degree in electrical engineering in 1972, and the M.E. and Ph.D. degrees in electronic engineering in 1974 and 1977, respectively, from the University of Tokyo. He has been a faculty member at the University of Tokyo since 1977 and became a Professor of the Department of Electronic Engineering in 1994. He is currently a professor at the Department of Information and Communication Engineering, Graduate School of Information Science and Technology, University of Tokyo. From March 1987 to January 1988, he was a Visiting Scientist at the Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, U.S.A. He has been engaged in a wide range of research on spoken language processing, including analysis, synthesis, recognition, dialogue systems, and computer-assisted language learning. From 2000 to 2004, he was Principal Investigator of the national project “Realization of advanced spoken language information processing utilizing prosodic features,” supported by the Japanese Government. He served as Chair of the Speech Committee, Institute of Electronics, Information and Communication Engineers (IEICE)/Acoustical Society of Japan (ASJ), from 2003 to 2005. He has been Chair of the Speech Prosody Special Interest Group (SPro-SIG), ISCA, since October 2010. He has been on the editorial board of the Speech Communication journal since 2004 and on the editorial board of the ETRI Journal since 2009. He is a Fellow of the Institute of Information and Communication Engineering and a member of a number of academic societies, including IEEE, the International Speech Communication Association (Board member), the Acoustical Society of America, the Acoustical Society of Japan, the Information Processing Society of Japan, the Japanese Society for Artificial Intelligence, and the Research Institute of Signal Processing Japan (Board member).

Jianhua Tao received the M.S. degree from Nanjing University in 1996 and the Ph.D. in Computer Science from Tsinghua University in 2001. He is currently a professor at the National Laboratory of Pattern Recognition (NLPR) of the Chinese Academy of Sciences, where he chairs the human-computer speech interaction group. He developed several of the earliest speech systems and multimodal interaction systems in China, and has published more than 90 papers in IEEE Trans. on ASLP, ICASSP, Interspeech, ICME, ICPR, ICCV, ICIP, etc. He has been the main researcher on and contributor to several national scientific projects supported by the National Natural Science Foundation of China (NSFC), the National High-Tech Program (863), and International Cooperation Projects. Currently, he is an editorial board member of the “International Journal on Computational Linguistics and Chinese Language Processing”, the “Journal on Multimodal User Interfaces (JMUI)”, and the “International Journal of Synthetic Emotions (IJSE)”, and a Steering Committee Member for the IEEE Transactions on Affective Computing. He has been vice-chair of the ISCA Special Interest Group on Chinese Spoken Language Processing since 2006, an executive committee member of the HUMAINE Association since 2007, and a board member of COCOSDA since 2007. He is also a Council member of the Chinese Speech Information Processing Society and the Acoustical Society of China.

Bibliographic Information

Book Title: Speech Prosody in Speech Synthesis: Modeling and generation of prosody for high quality and flexible speech synthesis
Editors: Keikichi Hirose, Jianhua Tao
Series Title: Prosody, Phonology and Phonetics
DOI: https://doi.org/10.1007/978-3-662-45258-5
Publisher: Springer Berlin, Heidelberg
eBook Packages: Humanities, Social Sciences and Law, Social Sciences (R0)
Copyright Information: Springer-Verlag Berlin Heidelberg 2015
Hardcover ISBN: 978-3-662-45257-8 (Published: 23 March 2015)
Softcover ISBN: 978-3-662-52501-2 (Published: 13 October 2016)
eBook ISBN: 978-3-662-45258-5 (Published: 25 February 2015)
Series ISSN: 2197-8700
Series E-ISSN: 2197-8719
Edition Number: 1
Number of Pages: VIII, 213
Number of Illustrations: 60 illustrations in colour
Topics: Phonology and Phonetics, Syntax, Signal, Image and Speech Processing, Communication Studies
