Media data


LAISEANG: Language Archive of Insular South East Asia and West New Guinea


The LAISEANG corpus contains an unrivaled collection of multimedia materials and written documents from over 50 languages in Insular South East Asia and West New Guinea.


VU-DNC: VU Diachronic Newspaper Corpus


VU-DNC is a unique diachronic corpus of articles from five major Dutch newspapers, published in 1950/1951 and 2002 (2 million words). The VU-DNC has been annotated for quotations, which enables the researcher to differentiate between the words directly under the responsibility of the journalist and quoted material.


VALID - vulnerability in language acquisition: language impairments in Dutch


An open-access multimedia archive of language pathology data collected in the Netherlands, primarily on Dutch, consisting of audio files and transcripts. The corpus currently contains 5 different data sets. The VALID data archive makes it possible to bring together old, current and future data.


IPNV - Interview Project Dutch Veterans


The IPNV data set contains the public part of a collection of interviews collected by the Dutch Veteran Institute. The interviews contain stories covering almost all conflicts and military missions in which the Netherlands was involved. The public part of this collection of about 500 interviews was made available via the internet. For CLARIN, the data was curated by the DCS in May 2013.


DBD - Dutch Bilingualism Database, TCULT - Talen en Culturen in Utrechtse Lombok en Transvaal


The data sets supported by CLARIN NL are part of an existing collection, the Dutch Bilingualism Database housed at the MPI for Psycholinguistics; both are also CLARIN compatible. The additional DBD / TCULT data were curated by the CLARIN DCS and delivered in February 2014.