
Theory and Applications of Natural Language Processing

Using Comparable Corpora for Under-Resourced Areas of Machine Translation

Editors: Skadina, I., Gaizauskas, R., Babych, B., Ljubesic, N., Tufis, D., Vasiljevs, A. (Eds.)

  • Describes a step-by-step method for collecting comparable corpora and processing them for use in machine translation
  • Demonstrates how data from comparable corpora can improve the quality of machine translation
  • Proposes novel methods for measuring the comparability of multilingual corpora
  • Describes algorithms and techniques for alignment and extraction of lexical and terminological data from comparable corpora in order to provide training and customization data for machine translation

Buy this book

eBook $119.00
price for USA in USD
  • ISBN 978-3-319-99004-0
  • Digitally watermarked, DRM-free
  • Included format: EPUB, PDF
  • ebooks can be used on all reading devices
  • Immediate eBook download after purchase
Hardcover $159.99
price for USA in USD
  • ISBN 978-3-319-99003-3
  • Free shipping for individuals worldwide
  • Immediate ebook access, if available*, with your print order
  • Usually ready to be dispatched within 3 to 5 business days.
About this book

This book provides an overview of how comparable corpora can be used to overcome the lack of parallel resources when building machine translation systems for under-resourced languages and domains. It presents a wealth of methods and open tools for building comparable corpora from the Web, evaluating comparability and extracting parallel data that can be used for the machine translation task. It is divided into several sections, each covering a specific task such as building, processing, and using comparable corpora, focusing particularly on under-resourced language pairs and domains.

The book is intended for anyone interested in data-driven machine translation for under-resourced languages and domains, especially for developers of machine translation systems, computational linguists and language workers. It offers a valuable resource for specialists and students in natural language processing, machine translation, corpus linguistics and computer-assisted translation, and promotes the broader use of comparable corpora in natural language processing and computational linguistics.



About the authors

Prof. Inguna Skadiņa has been working on language technologies for over 25 years. Her research interests are in machine translation, human-computer interaction, and language resources and tools for under-resourced languages. She has coordinated and participated in many national and international projects related to human language technologies, and has authored or co-authored more than 60 peer-reviewed research papers.

Bogdan Babych is an Associate Professor of Translation Studies at the University of Leeds, UK. He holds a PhD in machine translation and in Ukrainian linguistics. Dr. Babych was a coordinator of the EU FP7 Marie Curie project HyghTra, and received a Leverhulme Early Career Fellowship for his project Translation Strategies in Comparable Corpora. He previously worked as a computational linguist at L&H Speech Products, Belgium.

Robert Gaizauskas is a Professor of Computer Science and head of the Natural Language Processing group, Department of Computer Science, University of Sheffield, UK. His research interests are in computational semantics, information extraction, text summarization and machine translation. He holds a DPhil from the University of Sussex, UK (1992), and has published more than 150 papers in peer-reviewed journals and conference proceedings.

Nikola Ljubešić is an Assistant Professor at the Department of Information Science, University of Zagreb, Croatia, and researcher at the "Jožef Stefan" Institute in Ljubljana, Slovenia. His main research interests are in language technologies for South Slavic languages, linguistic processing of non-standard texts, author profiling and social media analytics.

Prof. Dan Tufiș, director of RACAI and full member of the Romanian Academy, has been active in computational and corpus linguistics for more than 30 years. His expertise is in tagging, word alignment, multilingual WSD, SMT, QA in open domains, lexical ontologies, language resource annotation and encoding. He has authored or co-authored more than 250 peer-reviewed papers, book chapters and books.

Andrejs Vasiļjevs is a co-founder and chairman of the board of Tilde, a leading European language technology and localization company. His expertise is in terminology management, machine translation and human-computer interaction. He initiated and coordinated the ACCURAT project as well as several other international research and innovation projects. He holds a PhD in computer sciences from the University of Latvia and a Dr.h. from the Latvian Academy of Sciences.

Table of contents (8 chapters)

Bibliographic Information

Book Title
Using Comparable Corpora for Under-Resourced Areas of Machine Translation
Editors
  • Inguna Skadina
  • Robert Gaizauskas
  • Bogdan Babych
  • Nikola Ljubesic
  • Dan Tufis
  • Andrejs Vasiljevs
Series Title
Theory and Applications of Natural Language Processing
Copyright
2019
Publisher
Springer International Publishing
Copyright Holder
Springer Nature Switzerland AG
eBook ISBN
978-3-319-99004-0
DOI
10.1007/978-3-319-99004-0
Hardcover ISBN
978-3-319-99003-3
Series ISSN
2192-032X
Edition Number
1
Number of Pages
VI, 323
Number of Illustrations
24 b/w illustrations, 39 illustrations in colour

*Immediately available upon purchase, as print book shipments may be delayed due to the COVID-19 crisis. Ebook access is temporary and does not include ownership of the ebook. Only valid for books with an ebook version. Springer Reference Works and instructor copies are not included.