Name: Scalable Big Data Analytics for Protein Bioinformatics
ISBN: 978-3-319-98839-9

Authors:

Dariusz Mrozek (ORCID: http://orcid.org/0000-0001-6764-6656)
Silesian University of Technology, Gliwice, Poland

Highlights the potential held by new computational techniques, such as cloud computing and big data technologies, in connection with protein bioinformatics
Chiefly focuses on protein structure, which remains poorly understood and is not effectively used in medicine
Describes methods for applying structural bioinformatics in medical diagnostics

Part of the book series: Computational Biology (COBO, volume 28)

9049 Accesses
18 Citations

Table of contents (11 chapters)

Front Matter

Pages i-xxvi

Background
1. Front Matter
  
  Pages 1-1
  
2. Formal Model of 3D Protein Structures for Functional Genomics, Comparative Bioinformatics, and Molecular Modeling
  
  Dariusz Mrozek
  
  Pages 3-27
3. Technological Roadmap
  
  Dariusz Mrozek
  
  Pages 29-48
Cloud Services for Scalable Computations
1. Front Matter
  
  Pages 49-49
  
2. Azure Cloud Services
  
  Dariusz Mrozek
  
  Pages 51-67
3. Scaling 3D Protein Structure Similarity Searching with Azure Cloud Services
  
  Dariusz Mrozek
  
  Pages 69-102
4. Cloud Services for Efficient Ab Initio Predictions of 3D Protein Structures
  
  Dariusz Mrozek
  
  Pages 103-134
Big Data Analytics in Protein Bioinformatics
1. Front Matter
  
  Pages 135-135
  
2. Foundations of the Hadoop Ecosystem
  
  Dariusz Mrozek
  
  Pages 137-150
3. Hadoop and the MapReduce Processing Model in Massive Structural Alignments Supporting Protein Function Identification
  
  Dariusz Mrozek
  
  Pages 151-182
4. Scaling 3D Protein Structure Similarity Searching on Large Hadoop Clusters Located in a Public Cloud
  
  Dariusz Mrozek
  
  Pages 183-214
5. Scalable Prediction of Intrinsically Disordered Protein Regions with Spark Clusters on Microsoft Azure Cloud
  
  Dariusz Mrozek
  
  Pages 215-247
Multi-threaded Solutions for Protein Bioinformatics
1. Front Matter
  
  Pages 249-249
  
2. Massively Parallel Searching of 3D Protein Structure Similarities on CUDA-Enabled GPU Devices
  
  Dariusz Mrozek
  
  Pages 251-282
3. Exploration of Protein Secondary Structures in Relational Databases with Multi-threaded PSS-SQL
  
  Dariusz Mrozek
  
  Pages 283-309
Back Matter

Pages 311-315


About this book

This book focuses on proteins and their structures. It describes scalable solutions for protein structure similarity searching, carried out at the main representation levels, and for the prediction of 3D protein structures. Emphasis is placed on techniques that can accelerate similarity searches and protein structure modeling.

The book is divided into four parts. The first part provides background information on proteins and their representation levels, including a formal model of a 3D protein structure used in computational processes, and a brief overview of the technologies used in the solutions presented in the book. The second part discusses cloud services used to develop scalable and reliable cloud applications for 3D protein structure similarity searching and protein structure prediction. The third part shows how scalable Big Data computational frameworks, such as Hadoop and Spark, are used in massive 3D protein structure alignments and in the identification of intrinsically disordered regions in protein structures. The fourth part focuses on finding 3D protein structure similarities, accelerated with GPUs, and on the use of multithreading and relational databases for efficient approximate searching on protein secondary structures.

The book introduces advanced techniques and computational architectures that benefit from recent achievements in the field of computing and parallelism. Recent developments in computer science have allowed algorithms previously considered too time-consuming to now be efficiently used for applications in bioinformatics and the life sciences. Given its depth of coverage, the book will be of interest to researchers and software developers working in the fields of structural bioinformatics and biomedical databases.

Reviews

“In this book, the author deals with various techniques that can be used for data handling and efficient analysis related to computational processes that require a great deal of time and effort, for example, structure similarity searching, protein structure modeling, protein structure alignment, and superposition.” (Jasbir Kaur, zbMath 1411.92002, 2019)

“This excellent and practically oriented text can benefit researchers seeking to establish a cloud-based bioinformatics HPC facility. Note that most of the solutions are implemented as embarrassingly parallel processes and not as distributed parallel processes. The book will be of interest to researchers and scientific software developers of bioinformatics and biomedical databases.” (Alexander Tzanov, Computing Reviews, June 06, 2019)

Authors and Affiliations

Silesian University of Technology, Gliwice, Poland

Dariusz Mrozek

About the author

Dariusz Mrozek is currently an Associate Professor and Head of the Division of Theory of Informatics in the Institute of Informatics at the Silesian University of Technology (SUT) in Gliwice, Poland. He received his PhD degree from SUT in 2006. His research interests cover bioinformatics, information systems, parallel and cloud computing, databases, and big data. He is now focused on the analysis of protein structures, functions and activities, and on the use of novel computation techniques to gain insights from biological data, including NGS and proteomics data. He is the author of 90+ papers published in conference proceedings and international journals, co-editor of thirteen books devoted to databases and data processing, and editor of two special issues in reputable scientific journals. He is a member of the IEEE Engineering in Medicine and Biology Society (EMBS), the IEEE Systems, Man, and Cybernetics Society (SMCS), and the IEEE Cloud Computing Community. Working on various research projects, he has cooperated with institutions including Imperial College of London (on the Chernobyl Tissue Bank); V P Komisarenko Institute of Endocrinology and Metabolism - Academy of Medical Sciences of the Ukraine; Medical Radiological Research Centre - Russian Academy of Medical Sciences; Helmholtz Zentrum Muenchen Deutsches Forschungszentrum Fuer Gesundheit und Umwelt Gmbh; Microsoft Research in the USA; Institute of Oncology in Gliwice, Poland; and Medical University of Silesia, Katowice, Poland.

Bibliographic Information

Book Title: Scalable Big Data Analytics for Protein Bioinformatics
Book Subtitle: Efficient Computational Solutions for Protein Structures
Authors: Dariusz Mrozek
Series Title: Computational Biology
DOI: https://doi.org/10.1007/978-3-319-98839-9
Publisher: Springer Cham
eBook Packages: Computer Science, Computer Science (R0)
Copyright Information: Springer Nature Switzerland AG 2018
Hardcover ISBN: 978-3-319-98838-2 (Published: 09 October 2018)
Softcover ISBN: 978-3-030-07538-5 (Published: 26 December 2018)
eBook ISBN: 978-3-319-98839-9 (Published: 25 September 2018)
Series ISSN: 1568-2684
Series E-ISSN: 2662-2432
Edition Number: 1
Number of Pages: XXVI, 315
Number of Illustrations: 41 b/w illustrations, 110 illustrations in colour
Topics: Computational Biology/Bioinformatics, Computer Communication Networks, Protein Structure, Bioinformatics, Mathematical and Computational Biology
