19. Mining the Past – Data-Intensive Knowledge Discovery in the Study of Historical Textual Traditions
Kristoffer Laigaard Nielbo
Interacting Minds Centre, Aarhus University
Ryan Nichols
California State University, Fullerton
Edward Slingerland
University of British Columbia
Text-heavy and unstructured data constitute the primary source materials for many historical reconstructions. In history and the history of religion, text analysis has typically been conducted by systematically selecting a small sample of texts and subjecting it to highly detailed reading and mental synthesis. But two interrelated technological developments have made possible a new data-intensive paradigm in the study of historical textual traditions, one that can usefully supplement qualitative analysis. First, the availability of significant computing power has made it possible to run algorithms for automated text analysis on most personal computers. Second, the rapid increase in full-text digital databases relevant to the study of religion has considerably reduced the costs of data acquisition and digitization. However, a limited understanding of the scope, advantages, and limitations of data-intensive methods has created real obstacles to the implementation of this paradigm in historical research. This is unfortunate, because history offers a rich and uncharted field for data-intensive knowledge discovery, and historians already have the much sought-after and necessary domain expertise. In this article we seek to remove obstacles to the data-intensive paradigm by presenting its methods and models for handling text-heavy data.