  • Book
  • © 2017

Big Data Factories

Collaborative Approaches

  • Provides basic researchers and practitioners with direct guidelines and best-case scenarios for developing activities related to data factoring
  • Presents methods for teaching data factoring
  • Proposes a set of principles for developing data factoring

Part of the book series: Computational Social Sciences (CSS)


Table of contents (9 chapters)

  1. Front Matter

    Pages i-vi
  2. Introduction

    • Nicolas Jullien, Sorin Adam Matei, Sean P. Goggins
    Pages 1-6
  3. Theoretical Principles and Approaches to Data Factories

    1. Front Matter

      Page 7
    2. The Open Community Data Exchange: Advancing Data Sharing and Discovery in Open Online Community Science

      • Sean P. Goggins, A. J. Million, Georg J. P. Link, Matt Germonprez, Kristen Schuster
      Pages 23-35
  4. Theoretical Principles and Ideas for Designing and Deploying Data Factory Approaches

    1. Front Matter

      Page 37
  5. Approaches in Action Through Case Studies of Data Based Research, Best Practice Scenarios, or Educational Briefs

    1. Front Matter

      Page 77
    2. Lessons Learned from a Decade of FLOSS Data Collection

      • Kevin Crowston, Megan Squire
      Pages 79-100
    3. Teaching Students How (Not) to Lie, Manipulate, and Mislead with Information Visualization

      • Athir Mahmud, Mél Hogan, Andrea Zeffiro, Libby Hemphill
      Pages 101-114
    4. Democratizing Data Science: The Community Data Science Workshops and Classes

      • Benjamin Mako Hill, Dharma Dailey, Richard T. Guy, Ben Lewis, Mika Matsuzaki, Jonathan T. Morgan
      Pages 115-135 (Open Access)
  6. Back Matter

    Pages 137-141

About this book

The book proposes a systematic approach to big data collection, documentation, and the development of analytic procedures that foster collaboration on a large scale. This approach, designated “data factoring,” emphasizes the need to treat each dataset developed by an individual project as part of a broader data ecosystem, easily accessible and exploitable by parties not directly involved in data collection and documentation. Furthermore, data factoring uses and encourages pre-analytic operations that add value to big data sets, especially recombining and repurposing.


The book proposes a research and development agenda that can undergird an ideal data factory approach. Several programmatic chapters discuss specialized issues involved in data factoring (documentation, metadata specification, building flexible yet comprehensive data ontologies, usability issues in collaborative tools, etc.). The book also presents case studies of data factoring and processing that can lead to better scientific collaboration and data sharing strategies and tools. Finally, it discusses the teaching utility of data factoring and the ethical and privacy concerns related to it.


Chapter 9 of this book is available open access under a CC BY 4.0 license at link.springer.com

Editors and Affiliations

  • Purdue University, West Lafayette, USA

    Sorin Adam Matei

  • Technopôle Brest-Iroise, IMT Atlantique (Telecom Bretagne), Brest Cedex 3, France

    Nicolas Jullien

  • Computer Science, University of Missouri, Columbia, USA

    Sean P. Goggins

About the editors

Sorin Matei is a Professor at the Brian Lamb School of Communication at Purdue University. His focus areas are computational social science, collaborative content production, and data storytelling.

Nicolas Jullien is an Associate Professor in the LUSSI Department of Telecom Bretagne. His research interests are in open and online communities.


Sean Patrick Goggins is an Associate Professor at Missouri's iSchool, with courtesy appointments as core faculty in the University of Missouri's Informatics Institute and Department of Computer Science.




