  • Textbook
  • © 2020

Text Analysis with R

For Students of Literature

  • Guides students and scholars with no programming experience who wish to learn R for text analysis
  • Integrates two new chapters that introduce dplyr, tidyr, and the syuzhet package
  • Flows from simple single text analysis to corpora level analysis, so that readers gain an immediate and fundamental understanding of computational text analysis
  • Includes supplementary material: sn.pub/extras

Buy it now

Buying options

eBook USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book USD 89.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Table of contents (18 chapters)

  1. Front Matter
    Pages i-xxiii
  2. Microanalysis
    1. Front Matter
      Pages 1-1
    2. R Basics
      • Matthew L. Jockers, Rosamond Thalken
      Pages 3-13
    3. First Foray into Text Analysis with R
      • Matthew L. Jockers, Rosamond Thalken
      Pages 15-30
    4. Accessing and Comparing Word Frequency Data
      • Matthew L. Jockers, Rosamond Thalken
      Pages 31-36
    5. Token Distribution and Regular Expressions
      • Matthew L. Jockers, Rosamond Thalken
      Pages 37-47
    6. Token Distribution Analysis
      • Matthew L. Jockers, Rosamond Thalken
      Pages 49-67
    7. Correlation
      • Matthew L. Jockers, Rosamond Thalken
      Pages 69-79
    8. Measures of Lexical Variety
      • Matthew L. Jockers, Rosamond Thalken
      Pages 81-91
    9. Hapax Richness
      • Matthew L. Jockers, Rosamond Thalken
      Pages 93-97
    10. Do It KWIC
      • Matthew L. Jockers, Rosamond Thalken
      Pages 99-108
    11. Do It KWIC(er) (and Better)
      • Matthew L. Jockers, Rosamond Thalken
      Pages 109-118
  3. Metadata
    1. Front Matter
      Pages 119-119
    2. Introduction to dplyr
      • Matthew L. Jockers, Rosamond Thalken
      Pages 121-132
    3. Parsing TEI XML
      • Matthew L. Jockers, Rosamond Thalken
      Pages 133-144
    4. Parsing and Analyzing Hamlet
      • Matthew L. Jockers, Rosamond Thalken
      Pages 145-157
    5. Sentiment Analysis
      • Matthew L. Jockers, Rosamond Thalken
      Pages 159-174
  4. Macroanalysis
    1. Front Matter
      Pages 175-175
    2. Clustering
      • Matthew L. Jockers, Rosamond Thalken
      Pages 177-194
    3. Classification
      • Matthew L. Jockers, Rosamond Thalken
      Pages 195-210

About this book

Now in its second edition, Text Analysis with R provides a practical introduction to computational text analysis using the open source programming language R. R is an extremely popular programming language, used throughout the sciences; due to its accessibility, R is now used increasingly in other research areas. In this volume, readers immediately begin working with text, and each chapter examines a new technique or process, allowing readers to obtain a broad exposure to core R procedures and a fundamental understanding of the possibilities of computational text analysis at both the micro and the macro scale. Each chapter builds on its predecessor as readers move from small scale “microanalysis” of single texts to large scale “macroanalysis” of text corpora, and each concludes with a set of practice exercises that reinforce and expand upon the chapter lessons. The book’s focus is on making the technical palatable, useful, and immediately gratifying.

Text Analysis with R is written with students and scholars of literature in mind but will be applicable to other humanists and social scientists wishing to extend their methodological toolkit to include quantitative and computational approaches to the study of text. Computation provides access to information in text that readers simply cannot gather using traditional qualitative methods of close reading and human synthesis. This new edition features two new chapters: one that introduces dplyr and tidyr in the context of parsing and analyzing dramatic texts to extract speaker and receiver data, and one on sentiment analysis using the syuzhet package. It is also filled with updated material in every chapter to integrate new developments in the field, current practices in R style, and the use of more efficient algorithms.

Reviews

On the First Edition:

"I can't think of a more qualified person to guide readers through powerful R techniques for text analysis. While extremely useful for people studying literature, these techniques can be also used by anybody working with texts. Even if you simply want to understand how companies and data scientists are analyzing all kinds of texts, go through this book." (Lev Manovich, Department of Computer Science, The Graduate Center, City University of New York & author of The Language of New Media)

"The open source programming language R has become one of the most central statistical and analytical tools in many sciences. While it has already been used in linguistic applications, this book is the first to discuss the application of (corpus-linguistic and other) methods with R in the context of literary studies. The author covers a wide range of descriptive, analytical, and exploratory methods beautifully and in detail in a book that will appeal to a wide and diverse audience of both students and seasoned researchers from literary studies, linguistic computing, and the digital humanities more generally." (Stefan Th. Gries, Department of Linguistics, University of California, Santa Barbara & author of Quantitative corpus linguistics with R: A Practical Introduction)

"This book does a great service for literary scholars interested in computational approaches to text analysis, giving them ready access to powerful methods for exploring patterns and relationships across large quantities of text. Its clear and lucid explanations will also make it an easy textbook to teach from, especially for instructors with prior background who can then use it as a stepping stone to introducing more complex methods. Amateurs and those with little programming background will find it eminently accessible." (Hoyt Long, Department of East Asian Languages and Civilizations, University of Chicago)

"Through my work as an epidemiologist, I encounter electronic health records in an unstructured form (i.e. text), and Text Analysis with R covers many of the initial steps for studying these records. The book is very accessible; it provides a straightforward introduction to manipulating text information without presuming a background in programming or a familiarity with the jargon used in this field. I also appreciated Jockers' thoughtful inclusion of supplemental explanations and information in footnotes throughout the book. For example, text analysis often involves the use of "regular expressions"; a footnote concisely explains wildcard and escape characters and this explanation spared me a fair bit of confusion in my own work. Although I am not a "student of literature", I thought the book contained many generalizable and expertly-taught lessons that make it a valuable introduction to manipulating and analyzing text." (Matthew Maenner, Ph.D.)

"This book is a worthy introduction to computational text analysis, and it fills an important gap in the literature. It’s very accessible and contains plenty of interesting examples and real applications, which have been collected and crafted over the many years the author taught text analysis to undergraduate and graduate students. Although it focuses on the study of literature, I would highly recommend this book to students in business administration and related fields." (Joao Quariguasi Frota Neto, School of Management, University of Bath)

Authors and Affiliations

  • College of Arts and Sciences, Washington State University, Pullman, USA

    Matthew L. Jockers

  • Digital Technology and Culture Program, Washington State University, Pullman, USA

    Rosamond Thalken

About the authors

Matthew L. Jockers is Professor of English and Data Analytics as well as Dean of the College of Arts and Sciences at Washington State University. He leverages computers and statistical learning methods to extract information from large collections of books. Using tools and techniques from linguistics, natural language processing, and machine learning, Jockers crunches the numbers (and the words) looking for patterns and connections. This computational approach to the study of literature facilitates a type of literary “macroanalysis” or “distant reading” that goes beyond what a traditional literary scholar could hope to study. Dr. Jockers’s most recent book, The Bestseller Code (2016, with Jodie Archer), has earned critical praise, and the algorithms at the heart of its research won the University of Nebraska’s Breakthrough Innovation of the Year in 2018. In addition to his academic research, Jockers has worked in industry, first as Director of Research at a data-driven book industry startup company and then as Principal Research Scientist and Software Development Engineer in iBooks at Apple, Inc. In 2017, he and Jodie Archer founded “Archer Jockers, LLC,” a text mining and consulting company that helps authors develop more successful novels through data analytics. In late 2019, Jockers and others founded a new text mining startup focused on helping independent authors (“indies”).

Rosamond Thalken is an Instructor of English and Digital Technology and Culture at Washington State University. Her research engages questions about the intersections and impacts among digital technology, language, and gender. She currently teaches College Composition and Digital Diversity, a course which analyzes the cultural contexts within digital spaces, including intersections of race, gender, class, and sexuality. In 2019, Thalken finished her Master’s degree in English Literature at Washington State University. Her thesis combined text analysis and close reading to explore the female Supreme Court Justices’ rhetorical strategies for reinforcing ethos in court opinions.
