Name: SQL for Data Science
ISBN: 978-3-030-57592-2

Overview

Authors:

Antonio Badia
Computer Engineering & Computer Science, University of Louisville, Louisville, USA

Explains SQL within the context of data science and introduces its different parts as they are needed for data analysis
Focuses on steps that traditional textbooks very often give short shrift, such as data loading, cleaning and pre-processing
Contains many examples and exercises that can be worked through using the open-source database systems MySQL and Postgres

Part of the book series: Data-Centric Systems and Applications (DCSA)


Table of contents (6 chapters)

Front Matter: pages i-xi
The Data Life Cycle (Antonio Badia): pages 1-29
Relational Data (Antonio Badia): pages 31-76
Data Cleaning and Pre-processing (Antonio Badia): pages 77-169
Introduction to Data Analysis (Antonio Badia): pages 171-220
More SQL (Antonio Badia): pages 221-242
Databases and Other Tools (Antonio Badia): pages 243-259
Back Matter: pages 261-285

About this book

This textbook explains SQL within the context of data science and introduces the different parts of SQL as they are needed for the tasks usually carried out during data analysis. Using the framework of the data life cycle, it focuses on steps that traditional textbooks very often give short shrift, such as data loading, cleaning and pre-processing.

The book is organized as follows. Chapter 1 describes the data life cycle, i.e. the sequence of stages, from data acquisition to archiving, that data goes through as it is prepared and then analyzed, together with the activities that take place at each stage. Chapter 2 turns to databases proper, explaining how relational databases organize data; non-traditional data such as XML and text are also covered. Chapter 3 introduces SQL queries but, unlike traditional textbooks, describes queries and their parts around typical data analysis tasks such as data exploration, cleaning and transformation. Chapter 4 introduces some basic data analysis techniques and shows how SQL can carry out simple analyses without much complication. Chapter 5 introduces additional SQL constructs that are important in a variety of situations, completing the coverage of SQL queries. Lastly, Chapter 6 briefly explains how to use SQL from within R and Python programs. It focuses on how these languages can interact with a database, and how what has been learned about SQL can be leveraged to make life easier when using R or Python.

All chapters contain many examples and exercises, and readers are encouraged to install the two open-source database systems used throughout the book (MySQL and Postgres) to practice and work on the exercises, because simply reading the book is much less useful than actually using it.

This book is for anyone interested in data science and/or databases. It requires only basic computer fluency, with no specific background in databases or data analysis. All concepts are introduced intuitively and with a minimum of specialized jargon. After working through this book, readers should be able to profitably learn more about data mining, machine learning, and database management from more advanced textbooks and courses.
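Chapter 6 is described as showing how Python programs can interact with a database, and Chapter 3 as organizing SQL around tasks like cleaning and transformation. The following is a rough sketch of how those two ideas combine. It is not the book's own code. It assumes Python's standard-library sqlite3 module as an in-memory stand-in for the MySQL and Postgres servers the book actually uses, and the sales table and its values are hypothetical. The snippet pushes a typical cleaning step into one SQL query (trimming and lowercasing inconsistent labels, treating missing amounts as zero), then reads the aggregated rows back into Python.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL/Postgres.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table with messy labels and a missing (NULL) amount.
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
rows = [(" North", 10.0), ("north ", 5.0), ("South", None), ("south", 7.5)]
cur.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)

# Clean and aggregate in SQL: normalize region names with TRIM/LOWER,
# replace NULL amounts with 0 via COALESCE, then total per cleaned region.
query = """
    SELECT LOWER(TRIM(region)) AS region_clean,
           SUM(COALESCE(amount, 0)) AS total
    FROM sales
    GROUP BY LOWER(TRIM(region))
    ORDER BY region_clean
"""
result = cur.execute(query).fetchall()

for region, total in result:
    print(region, total)

conn.close()
```

Python drivers for MySQL and Postgres that follow the DB-API (PEP 249) expose a similar connect/cursor/execute/fetch pattern, although connection details and placeholder syntax differ (for example, psycopg2 uses `%s` rather than `?`).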

About the author

Antonio Badia is Associate Professor in the Department of Computer Science and Engineering at the University of Louisville, KY, USA. He has taught both introductory and advanced college database courses for more than 20 years, and created and taught a course on data management and analysis for non-computer science students. His research on database systems has been funded by NSF and others, and has produced more than 50 publications in conferences and technical journals.

Bibliographic Information

Book Title: SQL for Data Science
Book Subtitle: Data Cleaning, Wrangling and Analytics with Relational Databases
Authors: Antonio Badia
Series Title: Data-Centric Systems and Applications
DOI: https://doi.org/10.1007/978-3-030-57592-2
Publisher: Springer Cham
eBook Packages: Computer Science, Computer Science (R0)
Copyright Information: Springer Nature Switzerland AG 2020
Softcover ISBN: 978-3-030-57591-5; published 10 November 2020
eBook ISBN: 978-3-030-57592-2; published 09 November 2020
Series ISSN: 2197-9723
Series E-ISSN: 2197-974X
Edition Number: 1
Number of Pages: XI, 285
Number of Illustrations: 16 b/w illustrations
Topics: Database Management, Big Data/Analytics
