Editors:

Michael W. Berry,
Malu Castellanos

Michael W. Berry
1. Department of Computer Science, University of Tennessee, USA
Malu Castellanos
1. Hewlett-Packard Laboratories, Palo Alto, USA

Overview of current methods and software for text mining
Experts from academia and industry share their experiences in solving large-scale retrieval and classification problems
Highlights open research questions in document categorization and clustering, and trend detection
Describes new application problems in areas such as email surveillance and anomaly detection
Includes supplementary material: sn.pub/extras

26k Accesses
147 Citations


Table of contents (12 chapters)

Front Matter

Pages i-xv

Clustering
1. Cluster-Preserving Dimension Reduction Methods for Document Classification
  
  Peg Howland, Haesun Park
  
  Pages 3-23
2. Automatic Discovery of Similar Words
  
  Pierre Senellart, Vincent D. Blondel
  
  Pages 25-44
3. Principal Direction Divisive Partitioning with Kernels and k-Means Steering
  
  Dimitrios Zeimpekis, Efstratios Gallopoulos
  
  Pages 45-64
4. Hybrid Clustering with Divergences
  
  Jacob Kogan, Charles Nicholas, Mike Wiacek
  
  Pages 65-85
5. Text Clustering with Local Semantic Kernels
  
  Loulwah AlSumait, Carlotta Domeniconi
  
  Pages 87-105
Document Retrieval and Representation
1. Vector Space Models for Search and Cluster Mining
  
  Mei Kobayashi, Masaki Aono
  
  Pages 109-127
2. Applications of Semidefinite Programming in XML Document Classification
  
  Zhonghang Xia, Guangming Xing, Houduo Qi, Qi Li
  
  Pages 129-144
Email Surveillance and Filtering
1. Discussion Tracking in Enron Email Using PARAFAC
  
  Brett W. Bader, Michael W. Berry, Murray Browne
  
  Pages 147-163
2. Spam Filtering Based on Latent Semantic Indexing
  
  Wilfried N. Gansterer, Andreas G. K. Janecek, Robert Neumayer
  
  Pages 165-183
Anomaly Detection
1. A Probabilistic Model for Fast and Confident Categorization of Textual Documents
  
  Cyril Goutte
  
  Pages 187-202
2. Anomaly Detection Using Nonnegative Matrix Factorization
  
  Edward G. Allan, Michael R. Horvath, Christopher V. Kopek, Brian T. Lamb, Thomas S. Whaples, Michael W. Berry
  
  Pages 203-217
3. Document Representation and Quality of Text: An Analysis
  
  Mostafa Keikha, Narjes Sharif Razavian, Farhad Oroumchian, Hassan Seyed Razi
  
  Pages 219-232
Back Matter

Pages 233-240


About this book

As we enter the third decade of the World Wide Web (WWW), the textual revolution has seen a tremendous change in the availability of online information. Finding information for just about any need has never been more automatic, just a keystroke or mouseclick away. While the digitalization and creation of textual materials continues at light speed, the ability to navigate, mine, or casually browse through documents too numerous to read (or print) lags far behind. What approaches to text mining are available to efficiently organize, classify, label, and extract relevant information for today’s information-centric users? What algorithms and software should be used to detect emerging trends from both text streams and archives? These are just a few of the important questions addressed at the Text Mining Workshop held on April 28, 2007, in Minneapolis, MN. This workshop, the fifth in a series of annual workshops on text mining, was held on the final day of the Seventh SIAM International Conference on Data Mining (April 26–28, 2007). With close to 60 applied mathematicians and computer scientists representing universities, industrial corporations, and government laboratories, the workshop featured both invited and contributed talks on important topics such as the application of techniques of machine learning in conjunction with natural language processing, information extraction, and algebraic/mathematical approaches to computational information retrieval. The workshop’s program also included an Anomaly Detection/Text Mining competition. NASA Ames Research Center of Moffett Field, CA, and SAS Institute Inc. of Cary, NC, sponsored the workshop.



Bibliographic Information

Book Title: Survey of Text Mining II
Book Subtitle: Clustering, Classification, and Retrieval
Editors: Michael W. Berry, Malu Castellanos
DOI: https://doi.org/10.1007/978-1-84800-046-9
Publisher: Springer London
eBook Packages: Computer Science, Computer Science (R0)
Hardcover ISBN: 978-1-84800-045-2, Published: 11 March 2008
Softcover ISBN: 978-1-84996-713-6, Published: 13 October 2010
eBook ISBN: 978-1-84800-046-9, Published: 10 December 2007
Edition Number: 1
Number of Pages: XVI, 240
Topics: Data Structures and Information Theory, Natural Language Processing (NLP), Information Storage and Retrieval, Information Systems Applications (incl. Internet), Multimedia Information Systems, Applications of Mathematics
