Skip to main content
Book cover

Text Mining

Predictive Methods for Analyzing Unstructured Information

  • Book
  • © 2005

Overview

  • Provides an authoritative, comprehensive survey of the concepts, principles, and methods of text mining (the search and retrieval of nonnumeric data), which is becoming increasingly critical at companies and organizations as they attempt to fully utilize their document/textual databases
  • Includes supplementary material: sn.pub/extras

This is a preview of subscription content, log in via an institution to check access.

Access this book

eBook USD 129.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book USD 169.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Other ways to access

Licence this eBook for your library

Institutional subscriptions

Table of contents (8 chapters)

Keywords

About this book

Data mining is a mature technology. The prediction problem, looking for predictive patterns in data, has been widely studied. Strong me- ods are available to the practitioner. These methods process structured numerical information, where uniform measurements are taken over a sample of data. Text is often described as unstructured information. So, it would seem, text and numerical data are different, requiring different methods. Or are they? In our view, a prediction problem can be solved by the same methods, whether the data are structured - merical measurements or unstructured text. Text and documents can be transformed into measured values, such as the presence or absence of words, and the same methods that have proven successful for pred- tive data mining can be applied to text. Yet, there are key differences. Evaluation techniques must be adapted to the chronological order of publication and to alternative measures of error. Because the data are documents, more specialized analytical methods may be preferred for text. Moreover, the methods must be modi?ed to accommodate very high dimensions: tens of thousands of words and documents. Still, the central themes are similar.

Authors and Affiliations

  • TJ Watson Labs, IBM Research, Yorktown Heights, USA

    Sholom M. Weiss, Tong Zhang, Fred J. Damerau

  • School of Computer Science and Engineering, University of New South Wales, Sydney, Australia

    Nitin Indurkhya

Bibliographic Information

Publish with us