Research in natural language processing for under-resourced languages is currently an active area, within the broader perspective of cultural heritage preservation. Regional languages generally fall into this category, as electronic resources for them are scarce and sometimes non-existent. Providing such resources (including written corpora, lexicons and dictionaries) is a major asset for supporting the dissemination, teaching, preservation or standardization of these languages. Among other things, it is necessary to develop written corpora that are as representative as possible of language use, by collecting written works of various genres (literature, theater, poetry, storytelling, press, etc.) and, for some languages, by taking variation into account (dialectal, phonological or graphical). The logical next step is to enrich these corpora with annotations.

The development of annotated corpora for regional languages raises many methodological issues. Models designed for resource-rich languages cannot always be transposed directly, partly because of dialectal and phonological variation and the lack of writing standards. Corpora also serve as a basis for developing dictionaries, lexicons and glossaries, and they are necessary for describing how a language is actually used. Conversely, dictionaries and lexicons are needed to support the development of corpora (optical character recognition, lemmatization and morphosyntactic analysis). When such resources already exist for a language (dictionaries, lexicons, bilingual glossaries pairing a regional language with a national one), the question arises of how the information they contain can be shared and possibly enriched with additional annotations (phonetic, morphosyntactic, syntactic, etc.). Finally, corpora and lexicons are necessary for developing natural language processing tools (morphosyntactic analyzers, syntactic parsers, etc.).

Beyond the technical and methodological challenges, the more practical difficulties arising from the lack of financial and human resources for creating these resources should not be neglected. This workshop aims to bring together researchers involved in the creation of language resources and ‘basic’ NLP tools for French and European regional languages, so that they can share their views, methodologies and techniques.