Corpora building and maintenance
Coordinator: Pier Marco Bertinetto
Staff involved in the project: Chiara Bertini, Chiara Celata, Luca Ciucci, Irene Ricci, Luigi Talamo
Valentina Bambini (IUSS, Pavia)
This should be regarded as a permanent LABLIN task rather than a research area in the proper sense. Over the years, the lab has taken part in a number of initiatives aimed at producing durable archives. Insofar as these archives reside (exclusively or in part) on our website, we feel obliged to ensure their maintenance and upgrading.
This is no longer the case for the speech archive AVIP (“Archivio delle Varietà di Italiano Parlato”), produced by a consortium led by P.M. Bertinetto. It has since migrated into the follow-up project API (Archivio del Parlato Italiano), which is currently accessible and downloadable from the Università Federico II di Napoli.
Among the most important corpora accessible from the lab’s website is CoLFIS (Corpus e Lessico di Frequenza dell’Italiano Scritto), the result of a collective effort involving scholars from Rome and Genoa. This corpus contains over 3 million carefully lemmatized words and is the standard tool for assessing the frequency of Italian words. LABLIN is currently working to extend the capabilities of this research tool (see the Wikimemo special project for more details).
LABLIN is also engaged in digitizing the Calambrone portion of the CHILDES archive, which documents the acquisition of Italian as a first language (L1). The archive consists of a fairly large collection of videocassettes recorded during the 1980s and owned by the IRCCS Fondazione Stella Maris (Calambrone, Pisa). In addition to these ongoing activities, LABLIN maintains the Corpus di lapsus della Scuola Normale Superiore (collected at the end of the 1980s), the only publicly available corpus of Italian speech errors.
A growing archive of audio recordings is also available.
In autumn 2010, the lab received funding from the PAR FAS 2007-13 programme, administered by Regione Toscana, for the special project GRA.FO. The aim was to locate, digitize and safeguard all kinds of sound recordings of linguistic and ethnographic interest made in Tuscany, in order to document the diversity of its dialects.
Gli archivi sonori: per un dialogo interdisciplinare (Sound archives: towards an interdisciplinary dialogue), 2011
Towards a synergistic European initiative for the long-term preservation of speech recordings, 2011
Last updated: January 2015