Guide to High Performance Distributed Computing

Case Studies with Hadoop, Scalding and Spark

  • Textbook
  • © 2015

Overview

  • Provides a guide to the distributed computing technologies of Hadoop and Spark, from the perspective of industry practitioners
  • Supports the theory with case studies taken from a range of disciplines, including data mining, machine learning, graph processing and image processing
  • Supplies working source code to aid understanding through step-by-step implementation
  • Includes supplementary material: sn.pub/extras

Part of the book series: Computer Communications and Networks (CCN)

Table of contents (8 chapters)

  1. Programming Fundamentals of High Performance Distributed Computing

  2. Case Studies Using Hadoop, Scalding and Spark

About this book

This timely text/reference describes the development and implementation of large-scale distributed processing systems using open source tools and technologies. Comprehensive in scope, the book presents state-of-the-art material on building high performance distributed computing systems, providing practical guidance and best practices as well as describing theoretical software frameworks.

Features:

  • Describes the fundamentals of building scalable software systems for large-scale data processing in the new paradigm of high performance distributed computing
  • Presents an overview of the Hadoop ecosystem, followed by step-by-step instruction on its installation, programming and execution
  • Reviews the basics of Spark, including resilient distributed datasets, and examines Hadoop streaming and working with Scalding
  • Provides detailed case studies on approaches to clustering, data classification and regression analysis
  • Explains the process of creating a working recommender system using Scalding and Spark

Authors and Affiliations

  • M.S. Ramaiah Institute of Technology, Bangalore, India

    K.G. Srinivasa, Anil Kumar Muppalla
