TTNWW - TST Tools for the Dutch Language as Web services in a Workflow
TTNWW integrates existing Language Technology (LT) software components for Dutch that were developed in the STEVIN and CGN projects, and makes them available. The LT components (for text and speech) are offered as web services in a simplified workflow system that enables researchers without much technical background to use standard LT workflow recipes.
The web services are available in two separate domains: "Text" and "Speech" processing. For "Text", TTNWW offers workflows for the following functionality:
- Orthographic Normalisation using TICCLops (version CLARIN-NL 1.0);
- Part-of-Speech Tagging, Lemmatisation, Chunking, limited Multiword Unit Recognition, and Grammatical Relation Assignment by Frog (version 012.012);
- Syntactic Parsing (including grammatical relation assignment, limited named entity recognition, and limited multiword unit recognition) by the Alpino Parser (version 1.3);
- Semantic Annotation;
- Named Entity Recognition;
- Co-reference Assignment.
For "Speech", the following workflows are offered:
- Automatic Transcription of speech files using a Netherlands Dutch acoustic model;
- Automatic Transcription of speech files using a Flemish Dutch acoustic model;
- Conversion of the input speech file to the required sampling rate, followed by automatic transcription.
The TTNWW services were created in a Dutch-Flemish collaboration project that builds on the results of earlier Dutch and Flemish projects. The web services are deployed partly in the SURF-SARA BiG-Grid cloud and partly at CLARIN centres in the Netherlands and at CLARIN VL university partners.
The architecture of the TTNWW portal consists of several components and follows the principles of Service Oriented Architecture (SOA). The TTNWW GUI front-end is a Flex module that communicates with the TTNWW web application, which keeps track of the different sessions and knows which LT recipes are available.
TTNWW communicates assignments (workflow specifications) to the WorkflowService, which evaluates the requested workflow and asks the DeploymentService to start the required LT web services. Once the LT web services have been initialised, the workflow specification is sent to the Taverna Server, which takes further care of the workflow.
To make it easier to wrap applications originally designed as standalone programs into web services, the CLAM (Computational Linguistics Application Mediator) wrapper software transforms such applications easily and transparently into RESTful web services. CLAM was used extensively in the TTNWW project for both text and speech processing tools: with the exception of Alpino and MBSRL, all web services run on CLAM wrappers.
Given the number of web services involved in the TTNWW project and the possibilities offered by the cloud environment, the preferred delivery method was for the LT providers to supply complete virtual machine images. These could be uploaded directly into the cloud environment, relieving the CLARIN centres and LT providers of the originally foreseen task of running the web services themselves. A potential advantage of this method, not yet exploited in the project, is that the images could also be delivered directly to end users, who could then run them locally with virtualization software such as VMware or VirtualBox.
The workflow engine used in the project was Taverna. On top of it, a number of selectable task recipes were built, following a task-oriented approach in line with the premise that users with little or no technical expertise should be able to use the system.
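The path an assignment takes, from the web application via the WorkflowService and the DeploymentService to the Taverna Server, can be modelled in a few lines of Python. This is a minimal in-process sketch, not TTNWW's implementation: only the component names WorkflowService and DeploymentService come from the text; their methods, the Workflow and WorkflowEngine types, and the service identifiers are hypothetical, and "starting" a service merely records it instead of booting a virtual machine image.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Workflow:
    """A workflow specification (an "assignment"); fields are hypothetical."""
    name: str
    services: List[str]  # identifiers of the LT web services the workflow needs


@dataclass
class DeploymentService:
    """Starts LT web services on request (this sketch only records them)."""
    running: Set[str] = field(default_factory=set)

    def start(self, service: str) -> None:
        # In a real deployment this would boot a VM image in the cloud.
        self.running.add(service)


@dataclass
class WorkflowEngine:
    """Hypothetical stand-in for the Taverna Server: records submitted workflows."""
    submitted: List[str] = field(default_factory=list)

    def submit(self, workflow: Workflow) -> None:
        self.submitted.append(workflow.name)


@dataclass
class WorkflowService:
    """Evaluates assignments and coordinates deployment and execution."""
    deployer: DeploymentService
    engine: WorkflowEngine
    log: List[str] = field(default_factory=list)

    def handle(self, workflow: Workflow) -> None:
        # 1. Evaluate the assignment: which required services are not running yet?
        missing = [s for s in workflow.services if s not in self.deployer.running]
        # 2. Ask the DeploymentService to start each missing service.
        for service in missing:
            self.deployer.start(service)
            self.log.append(f"started {service}")
        # 3. Hand the workflow to the engine only once every service is up.
        if not all(s in self.deployer.running for s in workflow.services):
            raise RuntimeError(f"services for {workflow.name!r} failed to start")
        self.engine.submit(workflow)
        self.log.append(f"submitted {workflow.name}")


ws = WorkflowService(DeploymentService(), WorkflowEngine())
ws.handle(Workflow("parse", ["frog", "alpino"]))
ws.handle(Workflow("tag", ["frog"]))  # "frog" is already running, so it is reused
print(ws.log)
# ['started frog', 'started alpino', 'submitted parse', 'submitted tag']
```

The ordering constraint is the point of the sketch: a workflow reaches the engine only after every service it depends on is running, and services started for an earlier assignment are reused rather than started again.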
In TTNWW's task-oriented approach, tasks are understood in terms of the end results of processes such as semantic role labelling, PoS tagging or syntactic analysis, and ready-made workflows are constructed that end users can use directly.
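A task-oriented recipe catalogue of this kind can be sketched as a mapping from the end result a user asks for to a fixed chain of services. In this hypothetical sketch the services are local stub functions (`normalise`, `tag`) rather than remote web services, and the recipe contents are illustrative assumptions, not TTNWW's actual workflows.

```python
from typing import Callable, Dict, List

# A service maps text to text. In TTNWW these would be remote web services;
# local stub functions keep this sketch self-contained.
Service = Callable[[str], str]


def normalise(text: str) -> str:
    # Hypothetical stand-in for orthographic normalisation: collapse whitespace.
    return " ".join(text.split())


def tag(text: str) -> str:
    # Hypothetical stand-in for PoS tagging: attach a placeholder tag to each token.
    return " ".join(f"{token}/TAG" for token in text.split())


# Ready-made recipes, keyed by the end result the user asks for (illustrative only).
RECIPES: Dict[str, List[Service]] = {
    "normalisation": [normalise],
    "pos tagging": [normalise, tag],
}


def run_task(task: str, text: str) -> str:
    """Run the ready-made recipe for `task`, feeding each step's output to the next."""
    if task not in RECIPES:
        raise ValueError(f"no recipe for {task!r}; available: {sorted(RECIPES)}")
    for step in RECIPES[task]:
        text = step(text)
    return text


print(run_task("pos tagging", "De  kat   slaapt"))
# De/TAG kat/TAG slaapt/TAG
```

The user only names the end result; which tools run, and in what order, is fixed by the recipe, which is what lets users without technical expertise run multi-step analyses.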
CLARIN National Project
Kemps-Snijders, M, Schuurman, I, Daelemans, W, Demuynck, K, Desplanques, B, Hoste, V, Huijbregts, M, Martens, J-P, Paulussen, H, Pelemans, J, Reynaert, M, Vandeghinste, V, van den Bosch, A, van den Heuvel, H, van Gompel, M, van Noord, G and Wambacq, P. 2017. TTNWW to the Rescue: No Need to Know How to Handle Tools and Resources. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, pp. 83–93. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.7. License: CC-BY 4.0