INPOLDER: Integrated Parser and Lemmatizer of Dutch in Retrospect

Summary

INPOLDER (Integrated Parser and Lemmatizer of Dutch in Retrospect) is a tool that assigns morphological tags, lemmas, and syntactic parses to historical Dutch texts. It is built on the Adelheid tool (for tagging and lemmatization) and the Collins-Bikel statistical parser.

Background

The Dutch historical record is an essential part of the Dutch cultural heritage, and it is vitally important that it be made accessible for research on a wide range of historical and linguistic questions. In the transition from the Middle Ages to the modern era, the Netherlands developed from a region where a diverse group of dialects was spoken (Hollandic, Brabantic, Flemish, North-eastern, Limburgian) into a country with a standard language, and there is good reason to believe that this process was an extremely dynamic one. Systematic research into how these processes affected syntax, phonology, morphology, and spelling cannot be done without access to lemmatized, tagged, and parsed corpora of historical Dutch. In recent years, Hans van Halteren has developed a tagger-lemmatizer, Adelheid, which is also available in the CLARIN infrastructure. INPOLDER complements this enrichment tool with a parser for historical Dutch.

The INPOLDER parser is trained on a subset of the Corpus van Reenen/Mulder (CRM; van Reenen and Mulder, 1993; Rem, 2003), a corpus of fourteenth-century texts, and on a subset of the Drenthe corpus (DC). CRM consists of 2,700 charters from 345 places of origin. It was designed to be representative of local Middle Dutch language use and suitable for all types of linguistic research.

Contacts
  • Project leader: Prof. Dr. Ans van Kemenade (Radboud University)
  • CLARIN center: Meertens Institute
  • Help contact: gertjan.postmaATmeertens.knaw.nl (linguistic issues), marc.kemps.snijdersATmeertens.knaw.nl (tech issues)

Country

Netherlands