  • Book
  • © 2013

Building and Using Comparable Corpora

  • A reference source for researchers and students coming to the field of comparable corpora
  • Identifies the state of the art in the field as well as future trends
  • Written by experts in the field
  • Includes supplementary material: sn.pub/extras

Buying options

eBook USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book USD 109.00
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book USD 109.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Table of contents (17 chapters)

  1. Front Matter

    Pages i-xii
  2. Overviewing Important Aspects of the Last Twenty Years of Research in Comparable Corpora

    • Serge Sharoff, Reinhard Rapp, Pierre Zweigenbaum
    Pages 1-17
  3. Compiling and Measuring Comparable Corpora

    1. Front Matter

      Pages 19-19
    2. Automatic Comparable Web Corpora Collection and Bilingual Terminology Extraction for Specialized Dictionary Making

      • Antton Gurrutxaga, Igor Leturia, Xabier Saralegi, Iñaki San Vicente
      Pages 51-75
    3. Methods for Collection and Evaluation of Comparable Documents

      • Monica Lestari Paramita, David Guthrie, Evangelos Kanoulas, Rob Gaizauskas, Paul Clough, Mark Sanderson
      Pages 93-112
    4. Statistical Corpus and Language Comparison on Comparable Corpora

      • Thomas Eckart, Uwe Quasthoff
      Pages 151-165
    5. Comparable Multilingual Patents as Large-Scale Parallel Corpora

      • Bin Lu, Ka Po Chow, Benjamin K. Tsou
      Pages 167-187
  4. Using Comparable Corpora

    1. Front Matter

      Pages 189-189
    2. Extracting Parallel Phrases from Comparable Data

      • Sanjika Hewavitharana, Stephan Vogel
      Pages 191-204
    3. Exploiting Comparable Corpora

      • Dragos Stefan Munteanu, Daniel Marcu
      Pages 205-222
    4. Paraphrase Detection in Monolingual Specialized/Lay Comparable Corpora

      • Louise Deléger, Bruno Cartoni, Pierre Zweigenbaum
      Pages 223-241
    5. Bilingual Terminology Mining from Language for Special Purposes Comparable Corpora

      • Emmanuel Morin, Béatrice Daille, Emmanuel Prochasson
      Pages 265-284
    6. Old Needs, New Solutions: Comparable Corpora for Language Professionals

      • Silvia Bernardini, Adriano Ferraresi
      Pages 303-319

About this book

The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In multilingual NLP (for example, machine translation and terminology mining) this implied the use of parallel corpora. However, parallel resources are relatively scarce: far more text is produced daily by native speakers of any given language than is translated. This situation resulted in a natural drive towards the use of comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction has not yet produced a single authoritative source suitable for researchers and students coming to the field.

The proposed volume provides a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.

Reviews

“I would like to recommend ‘Building and Using Comparable … to those who are working with or are interested in multilingual and monolingual comparable corpora. … it is easy to say that the notion of comparable corpora was not only visionary, long-sighted, and productive. It is also easy to say that this volume remains the optimal starting point for any research or for any applications in Language Technology leveraging on comparable corpora.” (Marina Santini, forum.santini.se, February, 2017)

Editors and Affiliations

  • Centre for Translation Studies, University of Leeds, Leeds, United Kingdom

    Serge Sharoff

  • University of Mainz, Mainz, Germany

    Reinhard Rapp

  • Université de Paris-Sud LIMSI-CNRS, Orsay, France

    Pierre Zweigenbaum

  • Electronic & Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, People's Republic of China

    Pascale Fung

Bibliographic Information

  • Book Title: Building and Using Comparable Corpora

  • Editors: Serge Sharoff, Reinhard Rapp, Pierre Zweigenbaum, Pascale Fung

  • DOI: https://doi.org/10.1007/978-3-642-20128-8

  • Publisher: Springer Berlin, Heidelberg

  • eBook Packages: Computer Science, Computer Science (R0)

  • Copyright Information: Springer-Verlag Berlin Heidelberg 2013

  • Hardcover ISBN: 978-3-642-20127-1 (Published: 07 January 2014)

  • Softcover ISBN: 978-3-662-52006-2 (Published: 23 August 2016)

  • eBook ISBN: 978-3-642-20128-8 (Published: 13 December 2013)

  • Edition Number: 1

  • Number of Pages: XII, 335

  • Number of Illustrations: 56 b/w illustrations, 14 illustrations in colour

  • Topics: Natural Language Processing (NLP), Computational Linguistics, Information Systems Applications (incl. Internet)
