  • Book
  • © 2010

Guide to OCR for Indic Scripts

Document Recognition and Retrieval

  • First comprehensive book on the topic of Indic Script OCRs

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Buying options

eBook USD 129.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book USD 169.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Table of contents (16 chapters)

  1. Front Matter

    Pages i-xxi
  2. Section: Recognition of Indic scripts

    1. Front Matter

      Pages 1-1
    2. Building Data Sets for Indian Language OCR Research

      • C.V. Jawahar, Anand Kumar, A. Phaneendra, K.J. Jinesh
      Pages 3-25
    3. Progress in Gujarati Document Processing and Character Recognition

      • Jignesh Dholakia, Atul Negi, S. Rama Mohan
      Pages 73-95
    4. Design of a Bilingual Kannada–English OCR

      • R.S. Umesh, Peeta Basa Pati, A.G. Ramakrishnan
      Pages 97-124
    5. Recognition of Malayalam Documents

      • N.V. Neeba, Anoop Namboodiri, C.V. Jawahar, P.J. Narayanan
      Pages 125-146
    6. A Complete OCR System for Tamil Magazine Documents

      • Aparna Kokku, Srinivasa Chakravarthy
      Pages 147-162
    7. Experiments on Urdu Text Recognition

      • Omar Mukhtar, Srirangaraj Setlur, Venu Govindaraju
      Pages 163-171
    8. The BBN Byblos Hindi OCR System

      • Prem Natarajan, Ehry MacRostie, Michael Decerbo
      Pages 173-180
    9. Generalization of Hindi OCR Using Adaptive Segmentation and Font Files

      • Mudit Agrawal, Huanfeng Ma, David Doermann
      Pages 181-207
    10. Online Handwriting Recognition for Indic Scripts

      • A. Bharath, Sriganesh Madhvanath
      Pages 209-234
  3. Section: Retrieval of Indic documents

    1. Front Matter

      Pages 235-235
    2. Enhancing Access to Primary Cultural Heritage Materials of India

      • Peter M. Scharf, Malcolm Hyman
      Pages 237-247
    3. Digital Image Enhancement of Indic Historical Manuscripts

      • Zhixin Shi, Srirangaraj Setlur, Venu Govindaraju
      Pages 249-267
    4. GFG-Based Compression and Retrieval of Document Images in Indian Scripts

      • Gaurav Harit, Santanu Chaudhury, Ritu Garg
      Pages 269-284
    5. Word Spotting for Indic Documents to Facilitate Retrieval

      • Anurag Bhardwaj, Srirangaraj Setlur, Venu Govindaraju
      Pages 285-299
    6. Indian Language Information Retrieval

      • Prasenjit Majumder, Mandar Mitra
      Pages 301-314
  4. Back Matter

    Pages 315-325

About this book

Optical Character Recognition (OCR) is a key enabling technology critical to creating indexed, digital library content, and it is especially valuable for Indic scripts, for which there has been very little digital access.

Indic scripts, the ancient Brahmi scripts prevalent in the Indian subcontinent, present some challenges for OCR that are different from those faced with Latin and Oriental scripts. But properly utilized, OCR will help to make Indic digital archives practically accessible to researchers and lay users alike by creating searchable indexes and machine-readable text repositories.

This unique guide/reference is the very first comprehensive book on the subject of OCR for Indic scripts, providing an overview of the state-of-the-art research in this field as well as other issues related to facilitating query and retrieval of Indic documents from digital libraries. All major research groups working in this area are represented in this book, which is divided into sections on recognition of Indic scripts and retrieval of Indic documents.

Topics and features:

  • Contains contributions from the leading researchers in the field
  • Discusses data set creation for OCR development
  • Describes OCR systems that cover eight different scripts: Bangla, Devanagari, Gurmukhi, Gujarati, Kannada, Malayalam, Tamil, and Urdu (Perso-Arabic)
  • Explores the challenges of Indic script handwriting recognition in the online domain
  • Examines the development of handwriting-based text input systems
  • Describes ongoing work to increase access to Indian cultural heritage materials
  • Provides a section on the enhancement of text and images obtained from historical Indic palm leaf manuscripts
  • Investigates different techniques for word spotting in Indic scripts
  • Reviews mono-lingual and cross-lingual information retrieval in Indic languages

This is an excellent reference for researchers and graduate students studying OCR technology and methodologies. This volume will contribute to opening up the rich Indian cultural heritage embodied in millions of ancient and contemporary documents spanning topics such as science, literature, medicine, astronomy, mathematics and philosophy.

Venu Govindaraju, FIEEE, FIAPR, is a Distinguished Professor of Computer Science and Engineering at the University at Buffalo. He has over 20 years of research experience in pattern recognition, information retrieval and biometrics. His seminal work on handwriting recognition was at the core of the first handwritten address interpretation system used by the U.S. Postal Service.

Srirangaraj Setlur, SMIEEE, is a Principal Research Scientist at the University at Buffalo. He has over 15 years of research experience in pattern recognition, including NSF-sponsored work on multilingual OCR technologies for digital libraries and other applications. His work on postal automation has led to technology adopted by the U.S. Postal Service and by Royal Mail in the U.K.

Editors and Affiliations

  • Center of Excellence for Document Analysis & Recognition (CEDAR), Amherst, U.S.A.

    Venu Govindaraju, Srirangaraj (Ranga) Setlur

Bibliographic Information

  • Book Title: Guide to OCR for Indic Scripts

  • Book Subtitle: Document Recognition and Retrieval

  • Editors: Venu Govindaraju, Srirangaraj (Ranga) Setlur

  • Series Title: Advances in Computer Vision and Pattern Recognition

  • DOI: https://doi.org/10.1007/978-1-84800-330-9

  • Publisher: Springer London

  • eBook Packages: Computer Science, Computer Science (R0)

  • Copyright Information: Springer-Verlag London 2010

  • Hardcover ISBN: 978-1-84800-329-3, Published: 09 October 2009

  • Softcover ISBN: 978-1-4471-2518-1, Published: 14 March 2012

  • eBook ISBN: 978-1-84800-330-9, Published: 25 September 2009

  • Series ISSN: 2191-6586

  • Series E-ISSN: 2191-6594

  • Edition Number: 1

  • Number of Pages: XXI, 325

  • Number of Illustrations: 150 b/w illustrations, 11 illustrations in colour

  • Topics: Natural Language Processing (NLP)
