Name: Data Science Solutions with Python
ISBN: 978-1-4842-7762-1

Overview

Authors:

Tshepo Chris Nokeri (Pretoria, South Africa)

Explains techniques for integrating frameworks for high model performance
Presents a hybrid approach for rapid prototyping models, deploying and scaling them
Bridges the gap between machine and deep learning frameworks

36k Accesses
8 Citations


Table of contents (10 chapters)

Exploring Machine Learning
Big Data, Machine Learning, and Deep Learning Frameworks
Linear Modeling with Scikit-Learn, PySpark, and H2O
Survival Analysis with PySpark and Lifelines
Nonlinear Modeling with Scikit-Learn, PySpark, and H2O
Tree Modeling and Gradient Boosting with Scikit-Learn, XGBoost, PySpark, and H2O
Neural Networks with Scikit-Learn, Keras, and H2O
Cluster Analysis with Scikit-Learn, PySpark, and H2O
Principal Component Analysis with Scikit-Learn, PySpark, and H2O
Automating the Machine Learning Process with H2O

About this book

Apply supervised and unsupervised learning to solve practical, real-world big data problems. This book teaches you how to engineer features, optimize hyperparameters, train and test models, develop pipelines, and automate the machine learning (ML) process.

The book covers PySpark, an in-memory, distributed cluster computing framework; the machine learning frameworks scikit-learn, PySpark MLlib, H2O, and XGBoost; and Keras, a deep learning (DL) framework.

The book starts by presenting supervised and unsupervised ML and DL models, and then examines big data frameworks along with ML and DL frameworks. Author Tshepo Chris Nokeri considers a parametric model known as the Generalized Linear Model and a survival regression model known as the Cox Proportional Hazards model, along with the Accelerated Failure Time (AFT) model. Also presented are a binary classification model (logistic regression) and an ensemble model (Gradient Boosted Trees). The book introduces DL and an artificial neural network known as the Multilayer Perceptron (MLP) classifier. It covers a way of performing cluster analysis using the K-Means model and explores dimension reduction techniques such as Principal Component Analysis and Linear Discriminant Analysis. Finally, automated machine learning is unpacked.
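As an illustration of the cluster analysis mentioned above, here is a minimal K-Means sketch in plain Python (Lloyd's algorithm). It is not taken from the book, which works with scikit-learn, PySpark, and H2O. The function name `kmeans`, the sample points, and the choice to seed the centroids with the first `k` points are all assumptions made to keep the example small and reproducible.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Cluster points with Lloyd's algorithm.

    Assumption for reproducibility: the first k points seed the centroids.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated: List[Point] = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid in place
        if updated == centroids:  # centroids stopped moving: converged
            break
        centroids = updated
    return centroids, labels


# Hypothetical sample data: two well-separated groups in 2-D.
sample = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.8, 1.1), (7.9, 8.3)]
centroids, labels = kmeans(sample, k=2)
print(labels)
print(centroids)
```

Seeding with the first `k` points keeps the run deterministic; library implementations typically use randomized or smarter initialization instead.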

This book is for intermediate-level data scientists and machine learning engineers who want to learn how to apply key big data frameworks and ML and DL frameworks. You will need prior knowledge of the basics of statistics, Python programming, probability theory, and predictive analytics.

What You Will Learn

Understand widespread supervised and unsupervised learning, including key dimension reduction techniques
Know the big data analytics layers such as data visualization, advanced statistics, predictive analytics, machine learning, and deep learning
Integrate big data frameworks with a hybrid of machine learning frameworks and deep learning frameworks
Design, build, test, and validate skilled machine learning models and deep learning models
Optimize model performance using data transformation, regularization, outlier remedying, hyperparameter optimization, and data split ratio alteration
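To make the last item concrete, the sketch below combines three of the listed ideas (regularization, hyperparameter optimization, and a configurable data split ratio) in plain Python. It is an illustrative assumption rather than the book's code: the helper names `ridge_fit`, `mse`, and `grid_search`, the one-feature ridge model without an intercept, and the toy data are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def ridge_fit(xs: Sequence[float], ys: Sequence[float], lam: float) -> float:
    """Closed-form ridge weight for one feature, no intercept: w = sum(xy) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean squared error of the predictions w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(
    xs: Sequence[float],
    ys: Sequence[float],
    lambdas: Sequence[float],
    train_ratio: float = 0.8,
) -> Tuple[float, float, float]:
    """Return (best_lambda, weight, validation_mse) using a simple holdout split."""
    split = int(len(xs) * train_ratio)  # changing train_ratio alters the data split
    x_train, y_train = xs[:split], ys[:split]
    x_val, y_val = xs[split:], ys[split:]
    best: Optional[Tuple[float, float, float]] = None
    for lam in lambdas:
        w = ridge_fit(x_train, y_train, lam)   # fit on the training part only
        score = mse(w, x_val, y_val)           # score on the held-out part
        if best is None or score < best[2]:
            best = (lam, w, score)
    if best is None:
        raise ValueError("lambdas must not be empty")
    return best


# Hypothetical noise-free data following y = 2x.
xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x for x in xs]
best_lam, best_w, best_score = grid_search(xs, ys, lambdas=[0.0, 1.0, 10.0])
print(best_lam, best_w, best_score)
```

Because the toy data has no noise, the search favors the weakest penalty; on noisy data a nonzero penalty can win, which is the point of searching over the grid.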

Who This Book Is For

Data scientists and machine learning engineers with basic knowledge and understanding of Python programming, probability theory, and predictive analytics

Reviews

“The book has a reader-centric style. Topics are covered briefly. … The book can be considered as an introduction to various topics. Code listings and graphical results for different models are added benefits, which could enhance learning and exposure.” (Jawwad Shamsi, Computing Reviews, June 29, 2022)

Authors and Affiliations

Pretoria, South Africa

Tshepo Chris Nokeri

About the author

Tshepo Chris Nokeri harnesses advanced analytics and artificial intelligence to foster innovation and optimize business performance. In his functional work, he has delivered complex solutions to companies in the mining, petroleum, and manufacturing industries. He initially completed a bachelor’s degree in information management. Afterward, he graduated with an Honours degree in business science at the University of the Witwatersrand on a TATA Prestigious Scholarship and a Wits Postgraduate Merit Award. He was unanimously awarded the Oxford University Press Prize.

Bibliographic Information

Book Title: Data Science Solutions with Python
Book Subtitle: Fast and Scalable Models Using Keras, PySpark MLlib, H2O, XGBoost, and Scikit-Learn
Authors: Tshepo Chris Nokeri
DOI: https://doi.org/10.1007/978-1-4842-7762-1
Publisher: Apress, Berkeley, CA
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)
Softcover ISBN: 978-1-4842-7761-4 (published 26 October 2021)
eBook ISBN: 978-1-4842-7762-1 (published 25 October 2021)
Edition Number: 1
Number of Pages: XVI, 119
Number of Illustrations: 35 b/w illustrations
Topics: Statistics, general, Machine Learning, Artificial Intelligence, Python
