  • Textbook
  • © 2019

Utility and Application of Language Corpora

  • Trains readers in the application of corpora in various domains of language technology and processing
  • Includes numerous diagrams and flowcharts for easy comprehension of technical discussions
  • Presents discussions in simple English to appeal to non-native English readers

Table of contents (15 chapters)

  1. Front Matter

    Pages i-xxx
  2. Issues in Text Corpus Generation

    • Niladri Sekhar Dash, L. Ramamoorthy
    Pages 1-16
  3. Process of Text Corpus Generation

    • Niladri Sekhar Dash, L. Ramamoorthy
    Pages 17-34
  4. Corpus Editing and Text Normalization

    • Niladri Sekhar Dash, L. Ramamoorthy
    Pages 35-56
  5. Statistical Studies on Language Corpus

    • Niladri Sekhar Dash, L. Ramamoorthy
    Pages 57-71
  6. Processing Texts in a Corpus

    • Niladri Sekhar Dash, L. Ramamoorthy
    Pages 73-90
  7. Corpus as a Primary Resource for ELT

    • Niladri Sekhar Dash, L. Ramamoorthy
    Pages 91-103
  8. Corpus as a Secondary Resource for ELT

    • Niladri Sekhar Dash, L. Ramamoorthy
    Pages 105-119
  9. Corpus and Dictionary Making

    • Niladri Sekhar Dash, L. Ramamoorthy
    Pages 121-138
  10. Corpus and Dialect Study

    • Niladri Sekhar Dash, L. Ramamoorthy
    Pages 139-153
  11. Corpus and Word Sense Disambiguation

    • Niladri Sekhar Dash, L. Ramamoorthy
    Pages 155-172
  12. Corpus and Technical TermBank

    • Niladri Sekhar Dash, L. Ramamoorthy
    Pages 173-191
  13. Corpus and Machine Translation

    • Niladri Sekhar Dash, L. Ramamoorthy
    Pages 193-217
  14. Corpus and Some Other Domains

    • Niladri Sekhar Dash, L. Ramamoorthy
    Pages 219-236
  15. Language Corpora: The Indian Scenario

    • Niladri Sekhar Dash, L. Ramamoorthy
    Pages 237-249
  16. Corpus and Future Indian Needs

    • Niladri Sekhar Dash, L. Ramamoorthy
    Pages 251-266
  17. Back Matter

    Pages 267-290

About this book

This book discusses some of the basic issues relating to corpus generation and the methods normally used to generate a corpus. Since corpus-related research goes beyond corpus generation, the book also addresses other major topics connected with the use and application of language corpora, namely, corpus readiness in the context of corpus sanitation and pre-editing of corpus texts; the application of statistical methods; and various text processing techniques. Importantly, it explores how corpora can be used as a primary or secondary resource in English language teaching, in creating dictionaries, in word sense disambiguation, in various language technologies, and in other branches of linguistics. Lastly, the book sheds light on the status quo of corpus generation in Indian languages and identifies current and future needs.

The book discusses technical issues in the field in a lucid manner, provides extensive new diagrams and charts for easy comprehension, and uses simplified English, making it an ideal resource for non-native English readers. It is written by academics with many years of experience teaching and researching corpus linguistics, and its focus on Indian languages and on English corpora makes it relevant to graduate and postgraduate students of applied linguistics, computational linguistics and language processing in South Asia and in countries where English is spoken as a first or second language.



Authors and Affiliations

  • Indian Statistical Institute, Linguistic Research Unit, Kolkata, India

    Niladri Sekhar Dash

  • Linguistic Data Consortium-Indian Languages, Central Institute of Indian Languages, Mysore, India

    L. Ramamoorthy

About the authors

Niladri Sekhar Dash, PhD, is an associate professor at the Linguistic Research Unit of the Indian Statistical Institute, Kolkata, where for over two decades he has worked on corpus linguistics, language technology, natural language processing, language documentation and digitization, computational lexicography, computer-assisted language teaching, and manual and machine translation. He has published 15 research monographs and 160 research papers in peer-reviewed national and international journals, anthologies, and conference proceedings. He has delivered lectures and taught courses as an invited scholar at more than 30 universities and institutes in India and abroad, and has acted as a consultant for several organizations working in the field of Language Technology and Natural Language Processing. Dr. Dash is the principal investigator for 5 language technology projects funded by the Government of India and the Indian Statistical Institute, Kolkata. He is the editor-in-chief of the Journal of Advanced Linguistic Studies, an international peer-reviewed journal, and an editorial board member of 5 international journals. He is a member of several linguistics associations across the globe and a regular PhD thesis adjudicator for several Indian universities. Dr. Dash is currently working on a digital pronunciation dictionary for Bangla, Hindi-Bangla parallel translation corpus generation, endangered language documentation and digitization, POS tagging and chunking, word sense disambiguation, manual and machine translation, and computer-assisted language teaching.

L. Ramamoorthy, PhD, is the head of the Linguistic Data Consortium for Indian Languages (LDC-IL) at the Central Institute of Indian Languages (CIIL), Mysuru, Ministry of Human Resource Development, Government of India. He is one of the leading corpus linguists in India, and is in charge of the Corpus Development Project for Indian Languages at the CIIL. Under his leadership, more than 30 scholars have been working on this mega project. He is a member of, or active participant in, several corpus-oriented projects in India. He has conducted numerous workshops on computational and corpus linguistics in various universities and colleges in India. He has published 7 research monographs, edited 8 volumes, and published or presented 140 research papers at national and international seminars and conferences. He has guided more than 15 PhD scholars and trained school, college, and university teachers in and outside India in language technology, linguistics, and teaching methods. He was also the director of the Pondicherry Institute of Linguistics and Culture (for over four years), editor-in-chief of the PILC Journal of Dravidic Studies, and co-editor for Languages in India (e-Journal).

Bibliographic Information

  • Book Title: Utility and Application of Language Corpora

  • Authors: Niladri Sekhar Dash, L. Ramamoorthy

  • DOI: https://doi.org/10.1007/978-981-13-1801-6

  • Publisher: Springer Singapore

  • eBook Packages: Social Sciences, Social Sciences (R0)

  • Copyright Information: Springer Nature Singapore Pte Ltd. 2019

  • Hardcover ISBN: 978-981-13-1800-9, published 22 August 2018

  • Softcover ISBN: 978-981-13-4688-0, published 23 December 2018

  • eBook ISBN: 978-981-13-1801-6, published 13 August 2018

  • Edition Number: 1

  • Number of Pages: XXX, 290

  • Number of Illustrations: 39 b/w illustrations, 1 illustration in colour

  • Topics: Corpus Linguistics, Natural Language Processing (NLP), Learning & Instruction
