Introduction to HPC with MPI for Data Science

  • Textbook
  • © 2016

Overview

  • Contains numerous exercises and a test exam
  • Features material that has been used and tested with students
  • Provides additional material, including source C++/MPI codes and slides for each chapter, on an accompanying website
  • Includes supplementary material: sn.pub/extras

Part of the book series: Undergraduate Topics in Computer Science (UTICS)

Table of contents (11 chapters)

  1. High Performance Computing (HPC) with the Message Passing Interface (MPI)

  2. High Performance Computing (HPC) for Data Science (DS)

About this book

This gentle introduction to High Performance Computing (HPC) for Data Science using the Message Passing Interface (MPI) standard has been designed as a first course for undergraduates on parallel programming on distributed memory models, and requires only basic programming notions.

The book is divided into two parts: the first covers high performance computing using C++ with the Message Passing Interface (MPI) standard, and the second covers high-performance data analytics on computer clusters.

The first part describes the fundamental notions of blocking versus non-blocking point-to-point communications, global communications (such as broadcast or scatter), and collaborative computations (reduce), together with Amdahl's and Gustafson's speed-up laws, before addressing parallel sorting and parallel linear algebra on computer clusters. The common ring, torus, and hypercube topologies of clusters are then explained, and global communication procedures on these topologies are studied. This first part closes with the MapReduce (MR) model of computation, which is well suited to processing big data, using the MPI framework.
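As a rough illustration of the two speed-up laws named above, here is a minimal C++ sketch. It is not taken from the book: the function names `amdahl_speedup` and `gustafson_speedup` and the parameter names `alpha` (the parallelizable fraction) and `p` (the number of processes) are hypothetical, chosen only for this example. It assumes the standard textbook statements of the laws: Amdahl's law gives a speed-up of 1 / ((1 - alpha) + alpha / p) for a fixed problem size, while Gustafson's law gives a scaled speed-up of (1 - alpha) + alpha * p when the problem size grows with p.

```cpp
#include <cassert>

// Hypothetical helper (not from the book): Amdahl's law for a fixed workload.
// alpha = fraction of the sequential run time that can be parallelized (0..1)
// p     = number of processes (>= 1)
double amdahl_speedup(double alpha, int p) {
    assert(alpha >= 0.0 && alpha <= 1.0 && p >= 1);
    // The serial part (1 - alpha) is unchanged; the parallel part is divided by p.
    return 1.0 / ((1.0 - alpha) + alpha / p);
}

// Hypothetical helper (not from the book): Gustafson's law for a scaled workload.
// Here alpha is the fraction of the parallel run time spent in parallel code.
double gustafson_speedup(double alpha, int p) {
    assert(alpha >= 0.0 && alpha <= 1.0 && p >= 1);
    // The serial part stays constant while the parallel work grows linearly with p.
    return (1.0 - alpha) + alpha * p;
}
```

For instance, with alpha = 0.9 and p = 10, the first function returns about 5.26 and the second about 9.1, which shows why a fixed-size problem saturates (Amdahl's bound here is 1 / (1 - alpha) = 10) while a scaled problem keeps benefiting from more processes.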

In the second part, the book focuses on high-performance data analytics. Flat and hierarchical clustering algorithms are introduced for data exploration, along with how to program these algorithms on computer clusters, followed by machine learning classification and an introduction to graph analytics. This part closes with a concise introduction to data core-sets, which make it possible to reduce big data problems to tiny data problems.

Exercises are included at the end of each chapter in order for students to practice the concepts learned, and a final section contains an overall exam which allows them to evaluate how well they have assimilated the material covered in the book.

Authors and Affiliations

  • Bâtiment Alan Turing, CS35003, École Polytechnique, Palaiseau, France

    Frank Nielsen

About the author

Frank Nielsen is a Professor at École Polytechnique in France, where he teaches graduate (vision/graphics) and undergraduate (Java/algorithms) courses, and a senior researcher at Sony Computer Science Laboratories Inc. His research includes computational information geometry for imaging and learning, and he is the author of 3 textbooks and 3 edited books. He is also on the Editorial Board of the Springer Journal of Mathematical Imaging and Vision.

Bibliographic Information

  • Book Title: Introduction to HPC with MPI for Data Science

  • Authors: Frank Nielsen

  • Series Title: Undergraduate Topics in Computer Science

  • DOI: https://doi.org/10.1007/978-3-319-21903-5

  • Publisher: Springer Cham

  • eBook Packages: Computer Science, Computer Science (R0)

  • Copyright Information: Springer International Publishing Switzerland 2016

  • Softcover ISBN: 978-3-319-21902-8 (Published: 11 February 2016)

  • eBook ISBN: 978-3-319-21903-5 (Published: 03 February 2016)

  • Series ISSN: 1863-7310

  • Series E-ISSN: 2197-1781

  • Edition Number: 1

  • Number of Pages: XXXIII, 282

  • Number of Illustrations: 101 illustrations in colour

  • Topics: Programming Techniques
