  • Book
  • © 2021

Provenance in Data Science

From Data Models to Context-Aware Knowledge Graphs

  • Presents a collection of provenance techniques and state-of-the-art metadata-enhanced, provenance-aware, knowledge graph-based representations to be used for information processing, management, aggregation, fusion, and visualization
  • Illustrates how to use context-aware knowledge graphs in a variety of domains, from cybersecurity to biomedicine
  • Discusses several Semantic Web standards in the context of the emergence of data science

Part of the book series: Advanced Information and Knowledge Processing (AI&KP)

Buy it now

Buying options

eBook USD 119.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book USD 159.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide
Hardcover Book USD 159.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide

Tax calculation will be finalised at checkout


Table of contents (6 chapters)

  1. Front Matter

    Pages i-xi
  2. Data Provenance and Accountability on the Web

    • Oshani W. Seneviratne
    Pages 11-24
  3. The Right (Provenance) Hammer for the Job: A Comparison of Data Provenance Instrumentation

    • Adriane Chapman, Abhirami Sasikant, Giulia Simonelli, Paolo Missier, Riccardo Torlone
    Pages 25-45
  4. ProvCaRe: A Large-Scale Semantic Provenance Resource for Scientific Reproducibility

    • Chang Liu, Matthew Kim, Michael Rueschman, Satya S. Sahoo
    Pages 59-73
  5. Graph-Based Natural Language Processing for the Pharmaceutical Industry

    • Alexandra Dumitriu, Cliona Molony, Chathuri Daluwatte
    Pages 75-110

About this book

RDF-based knowledge graphs require additional formalisms to be fully context-aware, and this book presents such formalisms. It also provides a collection of provenance techniques and state-of-the-art metadata-enhanced, provenance-aware, knowledge graph-based representations across multiple application domains, in order to demonstrate how to combine graph-based data models and provenance representations. This combination is important for making statements authoritative, verifiable, and reproducible in applications such as biomedicine, pharmaceuticals, and cybersecurity, where the data source and generator can be just as important as the data itself.
             
Capturing provenance is critical for ensuring sound experimental results and rigorously designed research studies for patient and drug safety, pathology reports, and medical evidence generation. Similarly, as this book demonstrates, provenance is needed for cyberthreat intelligence dashboards and attack maps that aggregate and/or fuse heterogeneous data from disparate sources to differentiate between unimportant online events and dangerous cyberattacks. Without provenance, data reliability and trustworthiness can be limited, leading to problems with data reuse, trust, reproducibility, and accountability.


This book primarily targets researchers who use knowledge graphs in their methods and approaches, in domains such as cybersecurity, eHealth, data science, and the Semantic Web. It collects core facts on the state of the art in provenance approaches and techniques, complemented by a critical review of existing approaches. It also outlines new research directions that combine data science and knowledge graphs, an increasingly important research topic.

Editors and Affiliations

  • Edith Cowan University, Perth, Australia

    Leslie F. Sikos

  • Health Data Research, Rensselaer Polytechnic Institute, Troy, USA

    Oshani W. Seneviratne

  • Rensselaer Polytechnic Institute, Troy, USA

    Deborah L. McGuinness

About the editors

Dr. Leslie F. Sikos is a computer scientist specializing in artificial intelligence and data science, with a focus on cybersecurity applications. He holds two Ph.D. degrees and more than 20 industry certificates. He is an active member of the research community as an author, editor, reviewer, conference organizer, and speaker, and a member of industry-leading organizations such as the ACM and the IEEE. He has contributed to international standards and developed state-of-the-art AI systems. Dr. Sikos has published more than 20 books, including textbooks, monographs, and edited volumes.

Dr. Oshani W. Seneviratne is the Director of Health Data Research at the Institute for Data Exploration and Applications at the Rensselaer Polytechnic Institute (Rensselaer IDEA). She obtained her Ph.D. in Computer Science from the Massachusetts Institute of Technology in 2014 under the supervision of Sir Tim Berners-Lee, the inventor of the World Wide Web. During her Ph.D., Oshani researched Accountable Systems for the Web. She invented a novel web protocol called HTTPA (HyperText Transfer Protocol with Accountability) and a novel provenance tracking mechanism called the Provenance Tracking Network. This work was shown to be effective in several domains, including electronic health care record transfer and intellectual property protection in Web-based decentralized systems. At Rensselaer IDEA, Oshani leads the Smart Contracts Augmented with Analytics Learning and Semantics (SCALeS) project, which aims to predict, detect, and fix initially unforeseen situations in smart contracts using novel combinations of machine learning, program analysis, and semantic technologies. Oshani is also involved in the Health Empowerment by Analytics, Learning, and Semantics (HEALS) project, where she oversees the research operations targeted at the characterization and analysis of computational medical guidelines for chronic diseases such as diabetes, and the modeling of guideline provenance. Before joining Rensselaer, Oshani worked at Oracle, specializing in distributed systems, provenance, and healthcare-related research. She is the co-inventor of two enterprise provenance patents.


Prof. Deborah L. McGuinness is the Tetherless World Senior Constellation Chair and Professor of Computer, Cognitive, and Web Sciences at RPI. She is also the founding director of the Web Science Research Center and the CEO of McGuinness Associates Consulting. Deborah has been recognized as a fellow of the American Association for the Advancement of Science (AAAS) for contributions to the Semantic Web, knowledge representation, and reasoning environments, and as the recipient of the Robert Engelmore Award from the Association for the Advancement of Artificial Intelligence (AAAI) for leadership in Semantic Web research and in bridging Artificial Intelligence (AI) and eScience, significant contributions to deployed AI applications, and extensive service to the AI community. Deborah leads a number of large, diverse, data-intensive resource efforts, and her team is creating next-generation ontology-enabled research infrastructure for work in large interdisciplinary settings. Prior to joining RPI, Deborah was the acting director of the Knowledge Systems, Artificial Intelligence Laboratory and a Senior Research Scientist in the Computer Science Department of Stanford University; before that, she was at AT&T Bell Laboratories. Deborah consults with numerous large corporations as well as emerging startup companies wishing to plan, develop, deploy, and maintain Semantic Web and/or AI applications. Some areas of recent work include data science, next-generation health advisors, ontology design and evolution environments, semantically enabled virtual observatories, semantic integration of scientific data, context-aware mobile applications, search, eCommerce, configuration, and supply chain management. Deborah holds a Bachelor of Math and Computer Science from Duke University, a Master of Computer Science from the University of California at Berkeley, and a Ph.D. in Computer Science from Rutgers University.

