Abigail Walsh

Postdoctoral Researcher

Language Technology for Nordic Languages


What Lessons Can the Celtic Languages Learn?


November 27, 2024

Dokkhuset, Trondheim, where the conference took place
From the 5th to the 6th of November, I had the pleasure of attending a Language Technology conference in Trondheim, Norway. The conference focused on Language Technology (LT) for Nordic Languages, with talks exploring Norwegian, Swedish, Finnish, Danish, Icelandic, Sami, Greenlandic, and more, in a vibrant, multilingual forum.

Speakers presented in Norwegian, Swedish, and Danish, while I tried to keep up, employing Google's suite of translation tools to decode as much content as I could. Despite some creative mistranslations (Google seemed very sure that one speaker was speaking at length about ice-cream), I picked up on several interesting through lines. These points resonated strongly with me as a low-resource language LT researcher, and echoed challenges I have observed in Irish LT.


1. Language technology is not developed in a vacuum
LT solutions should be grounded in the real-world needs of their users. Defining the problem space is a critical first step, ensuring that any tools developed are not only innovative but also useful and impactful. Language communities must lead the charge here, identifying their priorities to shape the evolution of LT development.

A good example of this is the community-driven approach to LT for Meänkieli [1], Faroese [2], and Sápmi [3], which ensures that solutions are tailored to the cultural, linguistic and social nuances of their language users. In an education setting, initiatives for Greenlandic [4] and Danish [5] speakers have demonstrated how integrating existing knowledge into their LT solutions benefits students with learning difficulties. Irish LT should similarly be developed in collaboration with linguists, educators, researchers, and members of the public, so that technological progress and community requirements stay aligned.


2. Regulation needs to support, not hinder, language technology
Effective regulation should balance protecting intellectual property with promoting innovation. Resource creators must be safeguarded, but overly restrictive policies can stifle the development of open-source tools and models, which are critical for advancing low-resource LT. Open access is particularly important in government and education domains, where LT solutions should serve the public good.

The National Library of Norway exemplifies this thinking, undertaking the enormous task of digitising and sharing its entire collection with the Norwegian public [6]. In doing so, it has further enabled the creation of Large Language Models and Generative AI models for the benefit of the public [7] [8]. Irish policymakers can similarly foster an environment of innovation by supporting open-source initiatives and promoting existing resources, tools, and research initiatives. Such an approach also avoids locking useful tools and resources behind paywalls and forcing language users to shell out for commercial products.


3. Resource management is as crucial as resource creation
Creating datasets for low-resource languages is just the beginning. Proper management, cataloguing, and sharing of these resources maximise their impact and utility. European initiatives such as the European Language Grid Catalogue [9] and the European Language Data Space [10] have the potential to support under-represented languages, offering tailored guidance and infrastructure to manage resources sustainably, while also communicating directly with the language community.

Conferences such as this one also help increase the visibility of existing research and tools: one attendee even showed me an open-source toolkit for minority language LT support, which included an Irish module! [11] Centralised platforms such as Borealium [12] make it easy for language communities and researchers to access a variety of tools supporting many languages, dialects and variations. Continued reporting, such as the report on Less-Resourced Nordic Languages [13], is essential for identifying gaps in the field and noting progress in key areas. Irish-focused initiatives such as eSTÓR [14] highlight existing resources, ensuring that valuable work is not duplicated or overlooked. A centralised repository for all Celtic LT resources would be a welcome addition... If anyone wants to get on that!


4. The virtuous circle of resource creation, expertise, and technological innovation
The creation of resources, training of LT experts, and development of technology are deeply interconnected, forming a symbiotic relationship that drives progress in LT. The availability of high-quality resources allows for the development of better tools and models, which in turn facilitates the expansion and enhancement of language resources. In parallel, experts trained in resource creation and evaluation techniques play a pivotal role in the development of more and better tools and resources.

For smaller language communities, multilingual tools and technologies can act as springboards for LT development, while adaptable, language-agnostic methodologies can be customised to meet specific language and community requirements. However, ensuring the quality and relevance of these bootstrap approaches requires a level of linguistic and cultural knowledge, and a deep understanding of the problem space, which LT experts alone may not be qualified to provide.

Once again, language communities must take an active role in evaluating and refining resources, working in close collaboration with linguistic and LT experts. By including the language community in the process, LT development encourages linguistic integrity, ethical practices, and authentic applications of developed resources. Initiatives such as the development of open models for Nordic languages [15] and the Icelandic Language Technology Programme [16] exemplify the positives of such an approach, demonstrating how collaborative efforts result in better solutions and foster an environment for continued LT work on the languages involved.


A shared vision for low-resource language technology
The conversations taking place at this conference highlight the importance of cross-lingual collaboration and communication for tackling the big issues facing "small" languages today. While it is beneficial for languages in the same language family to engage in LT solutions together, clearly there are common ideas that extend beyond the borders of linguistic families or geographic regions. I hope that the Celtic and Nordic languages can continue this conversation, sharing strategies, insights, and breakthroughs, and promoting opportunities for growth in this digital age. Together, we can tackle some of the challenges faced by all low-resource languages, and ensure that the future is a multilingual one.

Takk!

References:
[1] Language technology for Meänkieli and other national minority languages - Rickard Domeij, Jacob Larsson, Elina Kangas (ISOF)
[2] With both sail and motor: new opportunities and challenges for Faroese language technology - Iben Debess, PhD fellow and coordinator (University of the Faroe Islands)
[3] NRK Sápmi's experiments and experiences - Jo Raknes (NRK Sápmi)
[4] Language technology and reading difficulties in Greenland - Beatrine Heilmann (The Language Secretariat of Greenland)
[5] Language technology as a tool to reduce the risk of dyslexia - Stine Fuglsang Engmose (University College Absalon) & Peter Juel Henrichsen (The Danish Language Council)
[6] Access to data: The role of the National Library in building language technology, and challenges with copyrights - Magnus Breder Birkenes (National Library of Norway)
Website: https://www.nb.no/en/
[7] New Norwegian models 6 months on: evaluation and experiences - Jon Atle Gulla, Professor (NTNU) and Director, Norwegian Research Center for AI Innovation (NorwAI)
[8] NORA.LLM: https://www.nora.ai/nora-conferences/cuttingedgeai/cuttingedge-ai-april-24/index.html
[9] https://live.european-language-grid.eu/
[10] https://language-data-space.ec.europa.eu/index_en
[11] https://github.com/giellalt
[12] Borealium.org: a portal for language technology tools and aids for small, Nordic languages - Kristine Eide (Språkrådet) and Sjur Moshagen (Divvun, UiT)
Website: https://borealium.org/en/
[13] Report on Language Technology for Less-resourced Languages in the Nordics: Current State and Prospects - Steinþór Steingrímsson, The Árni Magnússon Institute for Icelandic Studies
[14] eSTÓR and more: Developing Irish language datasets to combat language inequality - Dr. Abigail Walsh, ADAPT Centre, Dublin City University
Website: https://estor.ie/ 
[15] Open models and the Nordic languages - Magnus Sahlgren, Head of Research, NLU (AI Sweden)
[16] An Insider's View: Looking Back at the Icelandic Language Technology Programme - Vésteinn Snæbjarnarson, PhD Fellow (University of Copenhagen)
A spectacular sunset at Havet Sauna, Trondheim
