Communication & Media Studies




The OpenConvert tools convert a number of input formats (ALTO, plain text, Word, HTML, ePub) to TEI or FoLiA. They are available as a Java command-line tool, a web service and a web application.


VU-DNC: VU Diachronic Newspaper Corpus


VU-DNC is a diachronic corpus of articles from five major Dutch newspapers, covering 1950/1951 and 2002 (2 million words). The corpus has been annotated for quotations, which enables the researcher to distinguish the words directly under the responsibility of the journalist from quoted material.


VALID - vulnerability in language acquisition: language impairments in Dutch


An open access multimedia archive of language pathology data collected in the Netherlands, primarily on Dutch, consisting of audio files and transcripts. The corpus currently contains 5 data sets. The VALID data archive is designed to bring together old, current and future data.


TTNWW integrates and makes available existing Language Technology (LT) software components for the Dutch language that were developed in the STEVIN and CGN projects. The LT components are offered as web services in a simplified workflow system that enables researchers without much technical background to use standard LT workflow recipes. The web services are available in two separate domains: "Text" and "Speech" processing. The TTNWW services were created in a Dutch and Flemish collaboration project that builds on the results of earlier Dutch and Flemish projects. The web services are deployed partly in the SURF-SARA BiG-Grid cloud and partly at CLARIN centres in the Netherlands and at CLARIN VL University partners.


WAHSP/BILAND is a research tool for historians that uses textual data from news media of the period 1863-1940, held by the Koninklijke Bibliotheek and the Staatsbibliothek zu Berlin, as input material. Users can search with single query terms or with combinations of terms. Besides listing the articles that match the query, the tool can visualize the results as word clouds of single articles with sentiment words highlighted, or as a word cloud of the whole result set together with newspaper statistics derived from the metadata. The WAHSP and BILAND applications have been succeeded by the TexCavator application. Links below are to TexCavator.


An advanced search engine for the OCR-ed collection of scanned proceedings of the Dutch Hansard (Handelingen der Staten-Generaal 1930-1995). These proceedings are available as a fully annotated, semi-structured dataset for historical and social science research. Search results can be restricted by speaker name, party, date range, and other criteria.