Editors:

Tuomas Virtanen
1. Laboratory of Signal Processing, Tampere University of Technology, Tampere, Finland

Mark D. Plumbley
1. Centre for Vision, Speech and Signal Processing, University of Surrey, Surrey, United Kingdom

Dan Ellis
1. Google Inc., New York, USA

Gives an overview of methods for computational analysis of sound scenes and events, allowing those new to the field to become fully informed
Covers all aspects of the machine learning approach to computational analysis of sound scenes and events, from data capture and labeling to the development of algorithms
Includes descriptions of algorithms accompanied by a website from which software implementations can be downloaded, facilitating practical interaction with the techniques
Includes supplementary material: sn.pub/extras

39k Accesses
266 Citations
20 Altmetric

Table of contents (14 chapters)

Front Matter

Pages i-x

Foundations
1. Front Matter
  
  Pages 1-1
  
2. Introduction to Sound Scene and Event Analysis
  
  Tuomas Virtanen, Mark D. Plumbley, Dan Ellis
  
  Pages 3-12
3. The Machine Learning Approach for Analysis of Sound Scenes and Events
  
  Toni Heittola, Emre Çakır, Tuomas Virtanen
  
  Pages 13-40
4. Acoustics and Psychoacoustics of Sound Scenes and Events
  
  Guillaume Lemaitre, Nicolas Grimault, Clara Suied
  
  Pages 41-67
Core Methods
1. Front Matter
  
  Pages 69-69
  
2. Acoustic Features for Environmental Sound Analysis
  
  Romain Serizel, Victor Bisot, Slim Essid, Gaël Richard
  
  Pages 71-101
3. Statistical Methods for Scene and Event Classification
  
  Brian McFee
  
  Pages 103-146
4. Datasets and Evaluation
  
  Annamaria Mesaros, Toni Heittola, Dan Ellis
  
  Pages 147-179
Advanced Methods
1. Front Matter
  
  Pages 181-181
  
2. Everyday Sound Categorization
  
  Catherine Guastavino
  
  Pages 183-213
3. Approaches to Complex Sound Scene Analysis
  
  Emmanouil Benetos, Dan Stowell, Mark D. Plumbley
  
  Pages 215-242
4. Multiview Approaches to Event Detection and Scene Analysis
  
  Slim Essid, Sanjeel Parekh, Ngoc Q. K. Duong, Romain Serizel, Alexey Ozerov, Fabio Antonacci et al.
  
  Pages 243-276
Applications
1. Front Matter
  
  Pages 277-277
  
2. Sound Sharing and Retrieval
  
  Frederic Font, Gerard Roma, Xavier Serra
  
  Pages 279-301
3. Computational Bioacoustic Scene Analysis
  
  Dan Stowell
  
  Pages 303-333
4. Audio Event Recognition in the Smart Home
  
  Sacha Krstulović
  
  Pages 335-371
5. Sound Analysis in Smart Cities
  
  Juan Pablo Bello, Charlie Mydlarz, Justin Salamon
  
  Pages 373-397
Perspectives
1. Front Matter
  
  Pages 399-399
  
2. Future Perspective
  
  Dan Ellis, Tuomas Virtanen, Mark D. Plumbley, Bhiksha Raj
  
  Pages 401-415

About this book

This book presents computational methods for extracting useful information from audio signals, collecting the state of the art in the field of sound event and scene analysis. The authors cover the entire procedure for developing such methods, ranging from data acquisition and labeling, through the design of taxonomies used in the systems, to signal processing methods for feature extraction and machine learning methods for sound recognition. The book also covers advanced techniques for dealing with environmental variation and multiple overlapping sound sources, and for taking advantage of multiple microphones or other modalities. The book gives examples of usage scenarios in large media databases, acoustic monitoring, bioacoustics, and context-aware devices. Graphical illustrations of sound signals and their spectrographic representations are presented, as well as block diagrams and pseudocode of algorithms.

About the editors

Tuomas Virtanen is Professor at the Laboratory of Signal Processing, Tampere University of Technology (TUT), Finland, where he leads the Audio Research Group. He received the M.Sc. and Doctor of Science degrees in information technology from TUT in 2001 and 2006, respectively. He has also worked as a research associate at Cambridge University Engineering Department, UK. He is known for his pioneering work on single-channel sound source separation using techniques based on non-negative matrix factorization, and their application to noise-robust speech recognition, music content analysis and audio event detection. In addition to the above topics, his research interests include content analysis of audio signals in general and machine learning. He has authored more than 100 scientific publications on the above topics, which have been cited more than 5000 times. He has received the IEEE Signal Processing Society 2012 best paper award for his article "Monaural Sound Source Separation by Nonnegative Matrix Factorization with Temporal Continuity and Sparseness Criteria" as well as three other best paper awards. He is an IEEE Senior Member, a member of the Audio and Acoustic Signal Processing Technical Committee of the IEEE Signal Processing Society, Associate Editor of IEEE/ACM Transactions on Audio, Speech, and Language Processing and recipient of the ERC 2014 Starting Grant.

Mark Plumbley is Professor of Signal Processing at the Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey, in Guildford, UK. After receiving his Ph.D. degree in neural networks in 1991, he became a Lecturer at King's College London, before moving to Queen Mary University of London in 2002. He subsequently became Professor and Director of the Centre for Digital Music, before joining the University of Surrey in 2015. He is known for his work on analysis and processing of audio and music, using a wide range of signal processing techniques, including independent component analysis, sparse representations, and deep learning. He is also keen to promote the importance of research software and data in audio and music research, including training researchers to follow the principles of reproducible research, and he led the 2013 D-CASE data challenge on Detection and Classification of Acoustic Scenes and Events. He currently leads two EU-funded research training networks in sparse representations, compressed sensing and machine sensing, and leads two major UK-funded projects on audio source separation and making sense of everyday sounds. He is a Fellow of the IET and IEEE.

Dan Ellis joined Google Inc. in 2015 as a Research Scientist after spending 15 years as a tenured professor in the Electrical Engineering department of Columbia University, where he founded and led the Laboratory for Recognition and Organization of Speech and Audio (LabROSA), which conducted research into all aspects of extracting information from sound. He is also an External Fellow of the International Computer Science Institute in Berkeley, CA, where he researched approaches to robust speech recognition. He is known for his contributions to Computational Auditory Scene Analysis, and for developing and transferring techniques between different kinds of audio processing, including speech, music, and environmental sounds. He has a long track record of supporting the community through public releases of code and data, including the Million Song Dataset of features and metadata for one million pop music tracks, which has become the standard large-scale research set in the Music Information Retrieval field.

Bibliographic Information

Book Title: Computational Analysis of Sound Scenes and Events
Editors: Tuomas Virtanen, Mark D. Plumbley, Dan Ellis
DOI: https://doi.org/10.1007/978-3-319-63450-0
Publisher: Springer Cham
eBook Packages: Engineering, Engineering (R0)
Copyright Information: Springer International Publishing AG 2018
Hardcover ISBN: 978-3-319-63449-4 (Published: 02 October 2017)
Softcover ISBN: 978-3-319-87559-0 (Published: 18 August 2018)
eBook ISBN: 978-3-319-63450-0 (Published: 21 September 2017)
Edition Number: 1
Number of Pages: X, 422
Number of Illustrations: 27 b/w illustrations, 54 illustrations in colour
Topics: Signal, Image and Speech Processing, Engineering Acoustics, Computer Appl. in Social and Behavioral Sciences, User Interfaces and Human Computer Interaction
