WAHSP/BILAND: web application for (bilingual) historical sentiment mining in public media
WAHSP/BILAND is a research tool for historians that takes as input textual data from news media of the period 1863-1940, from the Koninklijke Bibliotheek and the Staatsbibliothek zu Berlin. Users can search with single query terms or with combinations of terms. Besides listing the articles that match the query, the tool can visualize the results as word clouds of single articles with sentiment words highlighted, or as a word cloud of the whole result set together with newspaper statistics derived from the article metadata. WAHSP/BILAND enables historians to collect and process large bilingual (Dutch and German) sets of opinionated text data from news media and to extract patterns of discourse identity and intensity in two different countries with different scripts (e.g. Latin and Gothic). The tool offers non-technical humanities researchers an opportunity to perform a new kind of historical e-research: studying changing opinions, notions and perceptions regarding public health and policy issues.
The text mining tools for opinion/sentiment extraction that form the technological base for WAHSP/BILAND were developed within the NTU/STEVIN DuOMAn project. The technology includes algorithms and tools for identifying polarity (positive/support or negative/criticism), sources (opinion holders), frequency of items and specific targets of discourses. The tools and subjectivity lexicons are implemented as modules of 'Fietstas', a web service for text analysis. Fietstas also provides other essential text processing modules (morphological normalization, format and encoding reconciliation, named entity recognition and normalization, etc.) and visualization modules (interactive word clouds and timelines). Fietstas has been developed for, and is being used in, the processing of large-scale datasets in several projects, such as DuOMAn.
A text translation service based on machine learning can be used to translate existing lexicons and documents between Dutch and German (in both directions). The web application uses this functionality of Fietstas to support the interactive creation, expansion and refinement of lexicons specific to the user's research questions and needs. For BILAND, new bilingual and biscriptural lexicons have been developed. The application uses the visualization features of Fietstas to allow users to examine the research domain along the dimensions of time, context, and the identity and frequency of the discourse. WAHSP/BILAND is meant to be generic and testable in all domains where analysis of topics, contexts and attitudes in large volumes of text is needed.
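The lexicon-based highlighting described above can be illustrated with a minimal Python sketch. It is not the Fietstas API or the WAHSP/BILAND implementation: the toy lexicon, the tokenizer and the function names `cloud_entries` and `polarity_totals` are all assumptions made for illustration. It counts term frequencies over a result set and tags each term with a polarity, which is roughly the information a word cloud with highlighted sentiment words would need.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (assumption); the actual WAHSP/BILAND
# subjectivity lexicons are not reproduced here.
LEXICON = {
    "gevaarlijk": "negative",  # Dutch: dangerous
    "schadelijk": "negative",  # Dutch: harmful
    "nuttig": "positive",      # Dutch: useful
    "gefährlich": "negative",  # German: dangerous
    "nützlich": "positive",    # German: useful
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def cloud_entries(articles, lexicon=LEXICON):
    """Return (term, frequency, polarity) tuples for a set of articles,
    most frequent first. Terms missing from the lexicon are 'neutral'."""
    counts = Counter()
    for article in articles:
        counts.update(tokenize(article))
    return [(term, freq, lexicon.get(term, "neutral"))
            for term, freq in counts.most_common()]


def polarity_totals(entries):
    """Sum term frequencies per polarity label."""
    totals = Counter()
    for _term, freq, polarity in entries:
        totals[polarity] += freq
    return totals


if __name__ == "__main__":
    articles = [
        "Opium is gevaarlijk en schadelijk.",
        "Opium ist gefährlich, aber nützlich.",
    ]
    entries = cloud_entries(articles)
    for term, freq, polarity in entries:
        print(term, freq, polarity)
    print(dict(polarity_totals(entries)))
```

A real pipeline would add the steps the description mentions, such as morphological normalization and encoding reconciliation, before the lexicon lookup; this sketch matches surface forms only.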
CLARIN National Project
Snelders, S, Huijnen, P, Verheul, J, de Rijke, M and Pieters, T. 2017. A Digital Humanities Approach to the History of Culture and Science: Drugs and Eugenics Revisited in Early 20th-Century Dutch Newspapers, Using Semantic Text Mining. In: Odijk, J and van Hessen, A (eds.) CLARIN in the Low Countries, pp. 325–336. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.27. License: CC-BY 4.0