
Core Concepts in Data Analysis: Summarization, Correlation and Visualization

Mirkin, Boris

2011, XX, 390 p. 129 illus.

Available Formats:

ISBN 978-0-85729-287-2

digitally watermarked, no DRM

Included Format: PDF and EPUB

Softcover (also known as softback) version.

ISBN 978-0-85729-286-5

  • Enhances theoretical knowledge of data analysis
  • Gives readers a structure for learning materials
  • Explores methodical innovations

Core Concepts in Data Analysis: Summarization, Correlation and Visualization provides in-depth descriptions of those data analysis approaches that either summarize data (principal component analysis and clustering, including hierarchical and network clustering) or correlate different aspects of data (decision trees, linear rules, neuron networks, and Bayes rule).

Boris Mirkin takes an unconventional approach and introduces the concept of multivariate data summarization as a counterpart to conventional machine learning prediction schemes, utilizing techniques from statistics, data analysis, data mining, machine learning, computational intelligence, and information retrieval.

Innovations that follow from his in-depth analysis of the models underlying summarization techniques are introduced and applied to challenging issues such as choosing the number of clusters, standardizing mixed-scale data, and interpreting solutions. The analysis also brings out relations between seemingly unrelated concepts: goodness-of-fit functions for classification trees and data standardization, spectral clustering and additive clustering, and correlation and visualization of contingency data.

The mathematical detail is encapsulated in the so-called “formulation” parts, whereas most material is delivered through “presentation” parts that explain the methods by applying them to small real-world data sets; concise “computation” parts cover the algorithmic and coding issues.

Four layers of active learning and self-study exercises are provided: worked examples, case studies, projects and questions.     



Content Level » Upper undergraduate

Keywords » Clustering - Data Analysis - K-means - Principal component analysis - Visualization

Related subjects » Artificial Intelligence - Image Processing - Theoretical Computer Science

Table of contents 

  • Introduction
      • Summarization and Correlation: Two Main Goals of Data Analysis
      • Case Study Problems
      • An Account of Data Visualization
      • Summary
  • 1D Analysis: Summarization and Visualisation of a Single Feature
      • Quantitative Feature: Distribution and Histogram
      • Further Summarization: Centers and Spreads
      • Binary and Categorical Features
      • Modeling Uncertainty: Intervals and Fuzzy Sets
      • Summary
  • 2D Analysis: Correlation and Visualization of Two Features
      • General
      • Two Quantitative Features Case
      • Linear Regression: Formulation
      • Linear Regression: Computation
      • Mixed Scale Case: Nominal Feature Versus a Quantitative One
      • Two Nominal Features Case
      • Summary
  • Learning Multivariate Correlations in Data
      • General: Decision Rules, Fitting Criteria and Learning Protocols
      • Naive Bayes Approach
      • Linear Regression
      • Linear Discrimination and SVM
      • Decision Trees
      • Learning Correlation with Neuron Networks
      • Summary
  • Principal Component Analysis and SVD
      • Decoder Based Data Summarization
      • Principal Component Analysis: Model, Method, Usage
      • Application: Latent Semantic Analysis
      • Application: Correspondence Analysis
      • Summary
  • K-Means and Related Clustering Methods
      • General
      • K-Means Clustering
      • Cluster Interpretation Aids
      • Extensions of K-Means to Different Cluster Structures
      • Summary
  • Hierarchical Clustering
      • General
      • Agglomerative Clustering and Ward's Criterion
      • Divisive and Conceptual Clustering
      • Single Linkage Clustering, Connected Components and Maximum Spanning Tree
      • Summary
  • Approximate and Spectral Clustering for Network and Affinity Data
      • One Cluster Summary Similarity with Background Subtracted
      • Two Cluster Case: Cut, Normalized Cut and Spectral Clustering
      • Additive Clusters
      • Summary
  • Appendix
