  • Book
  • © 2008

Survey of Text Mining II

Clustering, Classification, and Retrieval

  • Overview of current methods and software for text mining

  • Experts from academia and industry share their experiences in solving large-scale retrieval and classification problems

  • Highlights open research questions in document categorization and clustering, and trend detection

  • Describes new application problems in areas such as email surveillance and anomaly detection

  • Includes supplementary material: sn.pub/extras

Buying options

eBook USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book USD 54.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Table of contents (12 chapters)

  1. Front Matter

    Pages i-xv
  2. Clustering

    1. Automatic Discovery of Similar Words

      • Pierre Senellart, Vincent D. Blondel
      Pages 25-44
    2. Principal Direction Divisive Partitioning with Kernels and k-Means Steering

      • Dimitrios Zeimpekis, Efstratios Gallopoulos
      Pages 45-64
    3. Hybrid Clustering with Divergences

      • Jacob Kogan, Charles Nicholas, Mike Wiacek
      Pages 65-85
    4. Text Clustering with Local Semantic Kernels

      • Loulwah AlSumait, Carlotta Domeniconi
      Pages 87-105
  3. Document Retrieval and Representation

    1. Vector Space Models for Search and Cluster Mining

      • Mei Kobayashi, Masaki Aono
      Pages 109-127
    2. Applications of Semidefinite Programming in XML Document Classification

      • Zhonghang Xia, Guangming Xing, Houduo Qi, Qi Li
      Pages 129-144
  4. Email Surveillance and Filtering

    1. Discussion Tracking in Enron Email Using PARAFAC

      • Brett W. Bader, Michael W. Berry, Murray Browne
      Pages 147-163
    2. Spam Filtering Based on Latent Semantic Indexing

      • Wilfried N. Gansterer, Andreas G. K. Janecek, Robert Neumayer
      Pages 165-183
  5. Anomaly Detection

    1. Anomaly Detection Using Nonnegative Matrix Factorization

      • Edward G. Allan, Michael R. Horvath, Christopher V. Kopek, Brian T. Lamb, Thomas S. Whaples, Michael W. Berry
      Pages 203-217
    2. Document Representation and Quality of Text: An Analysis

      • Mostafa Keikha, Narjes Sharif Razavian, Farhad Oroumchian, Hassan Seyed Razi
      Pages 219-232
  6. Back Matter

    Pages 233-240

About this book

As we enter the third decade of the World Wide Web (WWW), the textual revolution has seen a tremendous change in the availability of online information. Finding information for just about any need has never been more automatic, just a keystroke or mouseclick away. While the digitalization and creation of textual materials continues at light speed, the ability to navigate, mine, or casually browse through documents too numerous to read (or print) lags far behind. What approaches to text mining are available to efficiently organize, classify, label, and extract relevant information for today’s information-centric users? What algorithms and software should be used to detect emerging trends from both text streams and archives? These are just a few of the important questions addressed at the Text Mining Workshop held on April 28, 2007, in Minneapolis, MN. This workshop, the fifth in a series of annual workshops on text mining, was held on the final day of the Seventh SIAM International Conference on Data Mining (April 26–28, 2007). With close to 60 applied mathematicians and computer scientists representing universities, industrial corporations, and government laboratories, the workshop featured both invited and contributed talks on important topics such as the application of techniques of machine learning in conjunction with natural language processing, information extraction, and algebraic/mathematical approaches to computational information retrieval. The workshop’s program also included an Anomaly Detection/Text Mining competition. NASA Ames Research Center of Moffett Field, CA, and SAS Institute Inc. of Cary, NC, sponsored the workshop.

Editors and Affiliations

  • Department of Computer Science, University of Tennessee, USA

    Michael W. Berry

  • Hewlett-Packard Laboratories, Palo Alto, USA

    Malu Castellanos
